Antonio Sanchez
d71c30c478
Fix docs build job.
2025-09-29 16:25:14 -07:00
Antonio Sánchez
79488684e1
Extend the range of supported CMake package config versions
...
Modified to be backward-compatible with Eigen 3.4.0, in that the following
will still accept 3.4.1:
```
find_package(Eigen3 3.3)
```
(cherry picked from commit 027dc5bc8d )
2025-09-29 10:47:01 -07:00
Antonio Sanchez
b66188b5df
Run smoketests on small runners.
2025-09-23 13:22:29 -07:00
Antonio Sanchez
5c81034fc1
Run pipeline on merge requests
2025-09-12 14:14:52 -07:00
Antonio Sanchez
c3f9707824
Move more jobs to gitlab runners.
...
(cherry picked from commit 5d4485e767 )
2025-08-29 11:34:35 -07:00
Antonio Sánchez
4fc1cfeda5
Move GPU ci jobs to gitlab-hosted runners.
...
(cherry picked from commit 52f570a409 )
2025-08-28 11:29:30 -07:00
Alexander Vieth
cd7263e7f6
Restore EIGEN_INCLUDE_DIR in CMake again (for 3.4.x)
2025-08-18 19:53:08 +00:00
Alexander Vieth
eb57d4bdf1
Fix compilation with clang and c++03 on ARM
2025-08-17 16:00:53 +00:00
Antonio Sanchez
4a2c4901ce
Update CI configuration from master.
2025-07-07 14:26:40 -07:00
Antonio Sanchez
68f4e58cfa
Don't expire docs pages job
2025-02-24 08:24:04 -08:00
Antonio Sanchez
0e607fd350
Fix c++03 build and tests
2025-02-18 10:41:23 -08:00
C. Antonio Sanchez
13507d1efd
Remove nightly tag deploy on non-default branches
2025-02-17 18:48:09 -08:00
C. Antonio Sanchez
85ffda9539
Fix arm32 packetmath tests
2025-02-17 17:49:08 -08:00
Charles Schlosser
72f77ccb3e
Fix arm32 float division and related bugs
...
(cherry picked from commit 81b48065ea )
2025-02-17 14:24:36 -08:00
Antonio Sanchez
526a6328e2
Default eigen_packet_wrapper constructor.
...
This makes it trivial, allowing use of `memcpy`.
Fixes #2326
(cherry picked from commit cb50730993 )
2025-02-17 13:00:15 -08:00
C. Antonio Sanchez
7b378c2d91
Fix cherry-pick bug for NEON make_packet
2025-02-17 12:59:57 -08:00
Antonio Sánchez
129e003cdf
Disable FP16 arithmetic for arm32.
...
(cherry picked from commit 7465b7651e )
2025-02-17 12:18:51 -08:00
Antonio Sánchez
6161ce5cde
Fix arm builds.
...
(cherry picked from commit 2c8011c2dd )
2025-02-17 12:18:02 -08:00
Antonio Sánchez
be62728876
More NEON packetmath fixes.
...
(cherry picked from commit 384269937f )
2025-02-17 12:16:09 -08:00
Antonio Sánchez
1426855b68
Fix NEON make_packet2f.
...
(cherry picked from commit 2dfbf1b251 )
2025-02-17 12:15:51 -08:00
Antonio Sánchez
b2deb94e4a
Fix MSVC arm build.
...
(cherry picked from commit 0a5392d606 )
2025-02-17 12:15:28 -08:00
Antonio Sánchez
c23abcf25c
Fix arm32 issues.
...
(cherry picked from commit a73970a864 )
2025-02-17 12:11:20 -08:00
Antonio Sánchez
f23b8c0d78
Fix more hard-coded magic bounds.
...
(cherry picked from commit ae5280aa8d )
2025-02-17 07:28:41 -08:00
Antonio Sánchez
d60c3a3341
Slightly adjust error bound for nonlinear tests.
...
(cherry picked from commit 42aa3d17cd )
2025-02-17 07:28:21 -08:00
C. Antonio Sanchez
57c8d7c93f
Fix failing builds and update CI on push.
...
Specifically:
- Fixed ctz on 32-bit arm (where `uint64_t` is `unsigned long long`)
- Fixed build of random_cpp11 snippet when C++11 is disabled
- Updated CI scripts to run windows on push, and added a no-c++11 test
2025-02-17 07:20:12 -08:00
C. Antonio Sanchez
ab92609cad
Add missing ci scripts
2025-02-16 15:06:16 -08:00
C. Antonio Sanchez
551e95a409
Run pipelines on push
2025-02-16 14:50:58 -08:00
C. Antonio Sanchez
2924f58188
Remove deprecated check in meta test
2025-02-16 14:42:15 -08:00
C. Antonio Sanchez
f1922b6dac
Update cmake configuration from master
2025-02-16 14:41:46 -08:00
C. Antonio Sanchez
052d91349a
Split bdcsvd tests
2025-02-16 14:41:42 -08:00
Antonio Sánchez
72e38684c1
Disable deprecated warnings for SVD tests on MSVC.
...
(cherry picked from commit d58e629130 )
2025-02-16 14:41:34 -08:00
Antonio Sánchez
bb1dbb4df6
Disable deprecated warnings in SVD tests.
...
(cherry picked from commit f0b81fefb7 )
2025-02-16 14:41:32 -08:00
C. Antonio Sanchez
0a5abc042e
Copy CI configuration from master
...
(minus loongarch)
2025-02-16 08:00:49 -08:00
C. Antonio Sanchez
42d9cc0b1d
Fix Tensor docs
2025-02-15 22:46:57 -08:00
C. Antonio Sanchez
7312765992
Fix all doxygen warnings.
2025-02-15 21:10:48 -08:00
C. Antonio Sanchez
88cd53774e
Fix altivec and vsx builds
2025-02-15 13:18:14 -08:00
Chip Kerchner
c0378fedd8
Fix non-VSX PowerPC build
...
(cherry picked from commit 9e0afe0f02 )
2025-02-15 10:14:40 -08:00
Chip Kerchner
414f0a1756
Fix pre-POWER8_VECTOR bugs in pcmp_lt and pnegate and reactivate psqrt.
...
(cherry picked from commit 4a58f30aa0 )
2025-02-15 09:17:56 -08:00
Sergey Fedorov
3adc78e39c
Altivec fixes for Darwin: do not use unsupported VSX insns
...
(cherry picked from commit 4d05765345 )
2025-02-11 20:55:29 -08:00
Morris Hafner
e67c494cba
Use old syntax for CMake's separate_arguments() to restore compatiblity with old CMake versions.
2024-11-13 17:01:13 +00:00
Morris Hafner
3e7bcf54f7
cherry-pick !1682 Add nvc++ support into 3.4
2024-11-04 17:55:47 +00:00
Antonio Sánchez
9df21dc8b4
Work around VS2015 compile bug.
2024-03-15 18:07:02 +00:00
Antonio Sánchez
157756130a
Restore C++03 compatibility.
2024-03-15 17:55:04 +00:00
Antonio Sanchez
7893285e59
Fix tridiagonalize snippet for 3.4.
...
Fixes #2770 .
2024-03-12 22:04:46 -07:00
Antonio Sanchez
3ee06ec52f
Fix real schur and polynomial solver.
...
For certain inputs, the real schur decomposition might get stuck in a cycle.
Exceptional shifts are supposed to knock us out of that - but previously
they were only ever applied at iteration 10 and 30, which doesn't help if
the cycle starts after cycle 30. Modified to apply a shift every 16 iterations
(for reference, LAPACK seems to do it every 6 iterations).
Also added an assert in polynomial solver to verify that the schur decomposition
was successful.
Fixes #2633 .
2024-02-16 13:11:54 -08:00
Antonio Sánchez
287c801780
Use stableNorm in ComplexEigenSolver.
...
(cherry picked from commit 0f0c76dc29 )
2024-01-30 08:39:35 -08:00
Antonio Sanchez
42b04a08c4
Fix preshear transformation.
...
Fixes #2777 . The `preshear` function seems to have always used an invalid constructor
internally, and has been broken for a while. Fixed the implementation and added a test.
(cherry picked from commit 45da84e21570bf70238cf489ad862b2f09242c5f)
2024-01-29 12:30:06 -08:00
Rasmus Munk Larsen
b86ac5f1e7
Use padd instead of +.
...
(cherry picked from commit bbfc4d54cd )
2024-01-29 10:50:55 -08:00
Rasmus Munk Larsen
380a9483e0
Implement a generic vectorized version of Smith's algorithms for complex division.
...
(cherry picked from commit 9312a5bf5c )
2024-01-26 20:42:52 -08:00
Charles Schlosser
25270e35db
Fix compiler warnings in 3.4
2023-12-21 00:57:21 +00:00
Antonio Sanchez
ebf968b272
Remove c++11 from ctz/clz
2023-12-20 14:18:48 -08:00
Charles Schlosser
bd57b99f44
fix msvc clz
...
(cherry picked from commit 2c4541f735 )
2023-12-14 13:38:18 -08:00
Antonio Sánchez
b8f894947a
Add internal ctz/clz implementation.
...
(cherry picked from commit 75e273afcc )
2023-12-14 13:37:03 -08:00
Antonio Sanchez
4be2870267
Only apply ASM work-around for min/max on GNUC strict.
...
Fixes #2742 .
2023-11-27 10:08:18 -08:00
Charles Schlosser
f7085a1096
replace using with typedef
2023-11-24 19:42:54 +00:00
Charles Schlosser
63291e34bf
Update file GeneralMatrixVector.h
...
(cherry picked from commit 283dec7f25 )
2023-11-24 19:07:33 +00:00
Charles Schlosser
23886fd7db
Gemv microoptimization
...
(cherry picked from commit d1b03fb5c9 )
2023-11-24 19:07:17 +00:00
Charles Schlosser
7c6020e424
Fix -Waggressive-loop-optimizations
...
(cherry picked from commit 4e9e493b4a )
2023-11-24 19:06:40 +00:00
arthurfeeney
2e3f1d8044
Fix implicit conversion warning in GEBP kernel's packing
...
(cherry picked from commit 937c3d73cb )
2023-11-18 18:17:21 +00:00
Silvio Traversaro
fc5575264f
Backport "disambiguate overloads for empty index list" to 3.4 branch
2023-11-10 04:03:11 +00:00
Antonio Sanchez
bae907b8f6
Update version to 3.4.1
...
Tests all pass: https://gitlab.com/libeigen/eigen_ci_cross_testing/-/pipelines/1060764169
2023-11-06 13:53:54 -08:00
Charles Schlosser
cf207eacd5
Patch SparseLU
...
(cherry picked from commit a8bab0d8ae )
2023-11-02 21:17:17 -07:00
Chip Kerchner
e734787bb7
Fix pre-POWER8_VECTOR bugs in pcmp_lt and pnegate and reactivate psqrt.
...
(cherry picked from commit 4a58f30aa0 )
2023-10-25 15:19:37 -07:00
Antonio Sanchez
1217390db4
Fix windows+CUDA builds
2023-10-25 20:55:59 +00:00
Antonio Sanchez
7176ae1623
Make 3.4.1 compatible with c++03
2023-10-16 15:38:25 -07:00
Antonio Sánchez
0db5928f00
Eliminate use of _res.
...
(cherry picked from commit 5bdf58b8df )
2023-10-16 13:38:17 -07:00
Erik Schultheis
764b132a79
ensure that eigen::internal::size is not found by ADL, rename to ssize and...
...
(cherry picked from commit 9210e71fb3 )
2023-08-24 12:42:34 -07:00
Fabian Keßler
d0bfdc1658
optimize cmake scripts for subproject use
...
(cherry picked from commit 19cacd3ecb )
2023-07-26 12:01:28 -07:00
Antonio Sánchez
75ebef26b6
Adds new CMake Options for controlling build components.
...
(cherry picked from commit cf82186416 )
2023-07-26 11:52:47 -07:00
Charles Schlosser
208e44c979
fix warnings in tensorreduction and memory
2023-07-19 16:48:07 +00:00
Antonio Sánchez
17d57fb168
Fix up PowerPC MMA flags so it builds by default.
...
(cherry picked from commit 591906477b )
2023-07-11 16:27:32 -07:00
Antonio Sánchez
6973687c70
Fix up PowerPC MMA flags so it builds by default.
...
(cherry picked from commit 65eeedf964 )
2023-07-11 16:20:57 -07:00
Antonio Sanchez
ac561cd038
Reduce tensor_contract_gpu test.
...
The original test times out after 60 minutes on Windows, even when
setting flags to optimize for speed. Reducing the number of
contractions performed from 3600->27 for subtests 8,9 allow the
two to run in just over a minute each.
(cherry picked from commit be9e7d205f )
2023-07-11 11:27:31 -07:00
Antonio Sanchez
554982beef
Disable Tree reduction for GPU.
...
For moderately sized inputs, running the Tree reduction quickly
fills/overflows the GPU thread stack space, leading to memory errors.
This was happening in the `cxx11_tensor_complex_gpu` test, for example.
Disabling tree reduction on GPU fixes this.
(cherry picked from commit 24ebb37f38 )
2023-07-10 16:09:30 -07:00
Antonio Sanchez
89a71f3126
Fix gpu special function tests.
...
Some checks used incorrect values, partly from copy-paste errors,
partly from the change in behaviour introduced in !398 .
Modified results to match scipy, simplified tests by updating
`VERIFY_IS_CWISE_APPROX` to work for scalars.
(cherry picked from commit 701f5d1c91 )
2023-07-10 15:57:08 -07:00
Antonio Sanchez
a605d6b996
Rename EIGEN_CUDA_FLAGS to EIGEN_CUDA_CXX_FLAGS
...
Also add a missing space for clang.
(cherry picked from commit 846d34384a )
2023-07-10 15:30:41 -07:00
Antonio Sanchez
dfcd6de20a
Clean up CUDA CMake files.
...
- Unify test/CMakeLists.txt and unsupported/test/CMakeLists.txt
- Added `EIGEN_CUDA_FLAGS` that are appended to the set of flags passed
to the cuda compiler (nvcc or clang).
The latter is to support passing custom flags (e.g. `-arch=` to nvcc,
or to disable cuda-specific warnings).
(cherry picked from commit 7b00e8b186 )
2023-07-10 15:30:41 -07:00
Antonio Sanchez
1ec1b16d36
Add buildtests_gpu and check_gpu to simplify GPU testing.
...
This is in preparation of adding GPU tests to the CI, allowing
us to limit building/testing of GPU-specific tests for a given
GPU-capable runner.
GPU tests are tagged with the label "gpu". The new targets
```
make buildtests_gpu
make check_gpu
```
allow building and running only the gpu tests.
(cherry picked from commit 16f9a20a6f )
2023-07-10 15:30:41 -07:00
Antonio Sánchez
0f39c851a5
Fix use of arg function in CUDA.
...
(cherry picked from commit 63dcb429cd )
2023-07-10 15:30:41 -07:00
Kevin Leonardic
daa0b70a65
Fix argument for _mm256_cvtps_ph imm parameter
...
(cherry picked from commit d4b05454a7 )
2023-07-10 15:30:41 -07:00
Antonio Sánchez
33ba98b641
Ensure EIGEN_HAS_ARM64_FP16_VECTOR_ARITHMETIC is always defined on arm.
...
(cherry picked from commit 31cd2ad371 )
2023-07-10 15:30:41 -07:00
Antonio Sánchez
e6e921f0e3
Disable FP16 arithmetic for arm32.
...
(cherry picked from commit 7465b7651e )
2023-07-10 15:30:41 -07:00
Alexander Shaposhnikov
ebfdd6bdea
Do not set EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC for cuda compilation
...
(cherry picked from commit 316eab8deb )
2023-07-10 15:30:41 -07:00
Alejandro Acosta
357bb11066
Replace usage of CudaStreamDevice with GpuStreamDevice in tensor benchmarks GPU
...
(cherry picked from commit 07e4604b19 )
2023-07-10 15:30:40 -07:00
Rasmus Munk Larsen
9b3d104c02
Add missing braces in Umeyama.h
...
(cherry picked from commit 1321821e86 )
2023-07-10 14:52:08 -07:00
Rasmus Munk Larsen
af3ca50f0b
Work around compiler bug in Umeyama.h.
...
(cherry picked from commit 524c329ab2 )
2023-07-10 14:52:08 -07:00
Antonio Sánchez
26b8fabd80
Return NaN in ndtri for values outside valid input range.
...
(cherry picked from commit 1f79a6078f )
2023-07-10 14:52:08 -07:00
Charles Schlosser
385a0b38f8
JacobiSVD: set m_nonzeroSingularValues to zero if not finite
...
(cherry picked from commit fdc749de2a )
2023-07-10 14:52:08 -07:00
Antonio Sanchez
a4ecfd8ead
Fix boolean bitwise and warning.
...
(cherry picked from commit 70410310a4 )
2023-07-10 14:52:08 -07:00
Rasmus Munk Larsen
f296720d7d
Make sure we return +/-1 above the clamping point for Erf().
...
(cherry picked from commit b378014fef )
2023-07-10 14:52:08 -07:00
Rob Conde
f04d02dbf6
exclude Eigen/Core and Eigen/src/Core from being ignored due to core ignore rule
...
(cherry picked from commit 990a282fc4 )
2023-07-10 14:52:08 -07:00
Rohit Goswami
6f9bffe8dd
DOC: Update documentation for 3.4.x
...
(cherry picked from commit b0eded878d )
2023-07-10 14:52:08 -07:00
Rasmus Munk Larsen
d4c24eca96
Don't crash on empty tensor contraction.
...
(cherry picked from commit b0f877f8e0 )
2023-07-10 14:52:08 -07:00
Antonio Sánchez
72b0759451
Fix arm builds.
...
(cherry picked from commit 2c8011c2dd )
2023-07-10 14:52:08 -07:00
Jonas Schulze
34d0d83278
Fix some typos
...
(cherry picked from commit 81cb6a51d0 )
2023-07-10 14:52:08 -07:00
Antonio Sánchez
63e8b31c94
Fix parsing of command-line arguments when already specified as a cmake list.
...
(cherry picked from commit 555cec17ed )
2023-07-10 14:52:08 -07:00
Antonio Sánchez
99473f255b
Fix failing MSVC tests due to compiler bugs.
...
(cherry picked from commit 394aabb0a3 )
2023-07-10 14:52:08 -07:00
Antonio Sánchez
2ce5dc428f
Guard use of long double on GPU device.
...
(cherry picked from commit bc5cdc7a67 )
2023-07-10 14:52:08 -07:00
Chip Kerchner
8f1b6198c2
Fix epsilon and dummy_precision values in long double for double doubles. Prevented some algorithms from converging on PPC.
...
(cherry picked from commit 54459214a1 )
2023-07-10 14:52:08 -07:00
Antonio Sánchez
dae8c6d7ad
Guard complex sqrt on old MSVC compilers.
...
(cherry picked from commit a16fb889dd )
2023-07-10 14:52:07 -07:00
Antonio Sánchez
2dfdaa2abf
More NEON packetmath fixes.
...
(cherry picked from commit 384269937f )
2023-07-10 14:52:03 -07:00
Antonio Sánchez
a659b5dbb2
Fix NEON make_packet2f.
...
(cherry picked from commit 2dfbf1b251 )
2023-07-10 14:34:09 -07:00
Antonio Sánchez
879854382c
Fix MSVC arm build.
...
(cherry picked from commit 0a5392d606 )
2023-07-10 14:34:09 -07:00
Jeremy Nimmer
90dce8dfa3
Fix undefined behavior in Block access
...
(cherry picked from commit a1cdcdb038 )
2023-07-10 14:34:09 -07:00
Martin Burchell
b26ada1e03
Fix error: unused parameter 'tmp' [-Werror,-Wunused-parameter] on clang/32-bit arm
...
(cherry picked from commit c54785b071 )
2023-07-10 14:34:09 -07:00
Antonio Sánchez
f5593b4baa
Fix reshape strides when input has non-zero inner stride.
...
(cherry picked from commit 2260e11eb0 )
2023-07-10 14:34:09 -07:00
Alexandre Hoffmann
3eb0c8b69e
Changing BiCGSTAB parameters initialization so that it works with custom types
...
(cherry picked from commit 23524ab6fc )
2023-07-10 14:34:09 -07:00
Antonio Sánchez
26adb0e5af
Fix sparseLU solver when destination has a non-unit stride.
...
(cherry picked from commit ab2b26fbc2 )
2023-07-10 14:34:09 -07:00
Antonio Sánchez
5547205092
Correct pnegate for floating-point zero.
...
(cherry picked from commit 8588d8c74b )
2023-07-10 14:34:04 -07:00
Antonio Sánchez
771e91860b
Fix typo in CholmodSupport
...
(cherry picked from commit 7dc6db75d4 )
2023-07-10 12:26:39 -07:00
Antonio Sánchez
4786edba26
Fix pragma check for disabling fastmath.
...
(cherry picked from commit c27d1abe46 )
2023-07-10 10:09:09 -07:00
Antonio Sánchez
15e23ab849
Explicitly state that indices must be sorted.
...
(cherry picked from commit bf48d46338 )
2023-07-10 10:09:09 -07:00
Laurent Rineau
af6e7cc66a
Eigen/Sparse: fix warnings -Wunused-but-set-variable
...
(cherry picked from commit 7846c7387c )
2023-07-10 10:09:09 -07:00
Rasmus Munk Larsen
3fbb1c1b48
Guard GCC-specific pragmas with "#ifdef EIGEN_COMP_GNUC"
...
(cherry picked from commit 5ceed0d57f )
2023-07-10 10:09:09 -07:00
Antonio Sánchez
28cd280726
Fix 4x4 inverse when compiling with -Ofast.
...
(cherry picked from commit 7d6a9925cc )
2023-07-10 10:09:09 -07:00
Antonio Sánchez
8cc3ec8e47
Fix realloc for non-trivial types.
...
(cherry picked from commit 311ba66f7c )
2023-07-10 10:09:02 -07:00
Gilles Aouizerate
d641062a05
fix typo in doc/TutorialSparse.dox
...
(cherry picked from commit 6e83e906c2 )
2023-07-07 15:21:18 -07:00
Michael Palomas
1000cf9fbc
fixed msvc compilation error in GeneralizedEigenSolver.h
...
(cherry picked from commit 525f066671 )
2023-07-07 15:21:18 -07:00
Antonio Sánchez
fd2817e3d6
Add asserts for index-out-of-bounds in IndexedView.
...
(cherry picked from commit f241a2c18a )
2023-07-07 15:21:18 -07:00
Antonio Sánchez
11dacc4802
Fix some cmake issues.
...
(cherry picked from commit f5364331eb )
2023-07-07 15:21:18 -07:00
Antonio Sánchez
ab6f39e1e3
Fix mixingtypes tests.
...
(cherry picked from commit d816044b6e )
2023-07-07 15:21:18 -07:00
Gilles Aouizerate
6576ee4fb1
2 typos fix in the 3rd table.
...
(cherry picked from commit 94cc83faa1 )
2023-07-07 15:21:18 -07:00
Arthur
68f35d76b8
Fix GeneralizedEigenSolver::info() and Asserts
...
(cherry picked from commit a7c1cac18b )
2023-07-07 15:21:18 -07:00
Matthew Sterrett
d0e2b3e58d
Removed unnecessary checks for FP16C
...
(cherry picked from commit 39fcc89798 )
2023-07-07 15:21:17 -07:00
Antonio Sánchez
669dc8fadf
Eliminate bool bitwise warnings.
...
(cherry picked from commit b8e93bf589 )
2023-07-07 15:21:17 -07:00
Lexi Bromfield
33a602eb37
Don't double-define Half functions on aarch64
...
(cherry picked from commit 66ea0c09fd )
2023-07-07 15:21:17 -07:00
Rasmus Munk Larsen
a9490cd3c5
Fix code and unit test for a few corner cases in vectorized pow()
...
(cherry picked from commit 7a87ed1b6a )
2023-07-07 15:21:17 -07:00
Antonio Sánchez
61efca2e90
Use numext::sqrt in ConjugateGradient.
...
(cherry picked from commit 7896c7dc6b )
2023-07-07 15:21:17 -07:00
Alexander Richardson
a5469a6f0f
Avoid including <sstream> with EIGEN_NO_IO
...
(cherry picked from commit b7668c0371 )
2023-07-07 15:21:17 -07:00
Antonio Sánchez
6aaa45db5f
Include immintrin.h header for enscripten.
...
(cherry picked from commit 34780d8bd1 )
2023-07-07 15:21:17 -07:00
Antonio Sánchez
ea57f9b78f
AutoDiff depends on Core, so include appropriate header.
...
(cherry picked from commit e1165dbf9a )
2023-07-07 15:21:17 -07:00
Antonio Sánchez
f55a112cb1
Fix ODR violations.
...
(cherry picked from commit bb51d9f4fa )
2023-07-07 15:21:17 -07:00
Antonio Sánchez
a11bdf3965
Skip f16/bf16 bessel specializations on AVX512 if unavailable.
...
(cherry picked from commit 8ed3b9dcd6 )
2023-07-07 15:21:17 -07:00
Antonio Sánchez
b9ac284e52
Use numext::sqrt in Householder.h.
...
(cherry picked from commit 0e083b172e )
2023-07-07 15:21:17 -07:00
sfalmo
3df7d7fec9
Fix row vs column vector typo in Matrix class tutorial
...
(cherry picked from commit 9960a30422 )
2023-07-07 15:21:17 -07:00
Antonio Sánchez
80c5b8b3c3
Fix ambiguous comparisons for c++20 (again again)
...
(cherry picked from commit 8c2e0e3cb8 )
2023-07-07 15:21:17 -07:00
Antonio Sanchez
848db4ed2d
Fix BDCSVD condition for failing with numerical issue.
...
(cherry picked from commit 481a4a8c31 )
2023-07-07 15:21:17 -07:00
Rohan Ghige
5cb7505a44
Fix 'Incorrect reference code in STL_interface.hh for ata_product' eigen/isses/2425
...
(cherry picked from commit 798fc1c577 )
2023-07-07 15:21:17 -07:00
Antonio Sánchez
af912a7b5c
Fix MSVC+CUDA issues.
...
(cherry picked from commit 5ed7a86ae9 )
2023-07-07 15:21:17 -07:00
Antonio Sánchez
86d958e8f2
Consider inf/nan in scalar test_isApprox.
...
(cherry picked from commit 0c859cf35d )
2023-07-07 15:21:17 -07:00
Erik Schultheis
8e7bd5397f
fixed order of arguments in blas syrk
...
(cherry picked from commit 1ddd3e29cb )
2023-07-07 15:21:17 -07:00
Antonio Sánchez
8a21df2d9c
Disable f16c scalar conversions for MSVC.
...
(cherry picked from commit 73b2c13bf2 )
2023-07-07 15:21:12 -07:00
Antonio Sanchez
ac78f84b72
Eliminate trace unused warning.
...
(cherry picked from commit 9bc9992dd3 )
2023-07-07 15:06:18 -07:00
Antonio Sánchez
973b04f3e1
Fix AVX512 builds with MSVC.
...
(cherry picked from commit 9a14d91a99 )
2023-07-07 15:06:18 -07:00
Antonio Sánchez
16844d7529
Work around MSVC compiler bug dropping const.
...
(cherry picked from commit 3ca1228d45 )
2023-07-07 15:06:18 -07:00
Tobias Schlüter
5cb2dfec1d
Fix RowMajorBit <-> RowMajor mixup.
...
(cherry picked from commit 40eb34bc5d )
2023-07-07 15:06:18 -07:00
Øystein Sørensen
c473d69d22
Completed a missing parenthesis in tutorial.
...
(cherry picked from commit c062983464 )
2023-07-07 15:06:18 -07:00
Antonio Sánchez
6469fbf93a
Work around g++-10 docker issue for geo_orthomethods_4.
...
(cherry picked from commit 9deaa19121 )
2023-07-07 15:06:18 -07:00
Antonio Sánchez
3a4a4e9fde
Disable schur non-convergence test.
...
(cherry picked from commit 01b5bc48cc )
2023-07-07 15:06:18 -07:00
Arthur
fab848d4f7
Remove workarounds for bad GCC-4 warnings
...
(cherry picked from commit 514f90c9ff )
2023-07-07 15:06:18 -07:00
Antonio Sánchez
b158fcaa74
Fix edge-case in zeta for large inputs.
...
(cherry picked from commit 9296bb4b93 )
2023-07-07 15:06:18 -07:00
Antonio Sánchez
b6d9b6f48d
Remove duplicate IsRowMajor declaration.
...
(cherry picked from commit 0ae94456a0 )
2023-07-07 15:06:18 -07:00
Antonio Sánchez
b1f06aac61
Update vectorization_logic tests for all platforms.
...
(cherry picked from commit 27d8f29be3 )
2023-07-07 15:06:08 -07:00
Antonio Sanchez
f6954e4485
Fix enum conversion warnings in BooleanRedux.
...
(cherry picked from commit 55c7400db5 )
2023-07-07 11:51:10 -07:00
Antonio Sánchez
b30a2a527e
Remove poor non-convergence checks in NonLinearOptimization.
...
(cherry picked from commit d819a33bf6 )
2023-07-07 11:50:25 -07:00
Antonio Sanchez
bc1b354b32
Adjust tolerance of matrix_power test for MSVC.
...
(cherry picked from commit 1c2690ed24 )
2023-07-07 11:50:02 -07:00
Yury Gitman
bd0d873b16
Fix any/all reduction in the case of row-major layout
...
(cherry picked from commit bf6726a0c6 )
2023-07-07 11:48:49 -07:00
Antonio Sánchez
e0fe006915
Fix mixingtypes for g++-11.
...
(cherry picked from commit 19c39bea29 )
2023-07-07 11:47:23 -07:00
Antonio Sánchez
d259104c8d
Fix frexp packetmath tests for MSVC.
...
(cherry picked from commit 2ed4bee78f )
2023-07-07 11:47:09 -07:00
Antonio Sánchez
cd543434bf
Fix gcc-5 packetmath_12 bug.
...
(cherry picked from commit 8970719771 )
2023-07-07 11:46:15 -07:00
Antonio Sánchez
36be6747e0
Modify test expression to avoid numerical differences ( #2402 ).
...
(cherry picked from commit ae86a146b1 )
2023-07-07 11:45:56 -07:00
Martin Heistermann
d1ed3fe5c9
Fix for crash bug in SPQRSupport: Initialize pointers to nullptr to avoid free() calls of invalid pointers.
...
(cherry picked from commit 550af3938c )
2023-07-07 11:44:55 -07:00
Antonio Sanchez
21e0ad056e
Fix ODR failures in TensorRandom.
...
(cherry picked from commit bded5028a5 )
2023-07-07 11:43:03 -07:00
Antonio Sánchez
709d704819
Fix collision with resolve.h.
...
(cherry picked from commit 94bed2b80c )
2023-07-07 11:40:44 -07:00
Antonio Sánchez
995714142d
Restrict GCC<6.3 maxpd workaround to only gcc.
...
(cherry picked from commit 4bffbe84f9 )
2023-07-07 11:39:27 -07:00
Antonio Sánchez
730a781221
Define EIGEN_HAS_AVX512_MATH in PacketMath.
...
(cherry picked from commit e7f4a901ee )
2023-07-07 11:39:13 -07:00
Antonio Sánchez
77b2807322
Fix AVX512 math function consistency, enable for ICC.
...
(cherry picked from commit 96da541cba )
2023-07-07 11:37:49 -07:00
Antonio Sánchez
52e545324e
Fix ODR violations.
...
(cherry picked from commit cafeadffef )
2023-07-07 11:37:31 -07:00
Stephen Pierce
0cd4719f3e
Silence some MSVC warnings
...
(cherry picked from commit 81c928ba55 )
2023-07-07 11:30:40 -07:00
Erik Schultheis
770ed0794e
fix broken asserts
...
(cherry picked from commit 5a0a165c09 )
2023-07-07 11:25:03 -07:00
Antonio Sánchez
e7248b26a1
Prevent BDCSVD crash caused by index out of bounds.
...
(cherry picked from commit 028ab12586 )
2022-05-19 22:30:33 +00:00
Antonio Sanchez
a1e1612c28
Fix cwise NaN propagation for scalar input.
...
Was missing a template parameter. Updated tests.
Fixes #2474 .
2022-04-15 22:11:22 -07:00
Antonio Sánchez
f3aaba8705
Revert "Replace call to FixedDimensions() with a singleton instance of"
...
This reverts commit 19e6496ce0
(cherry picked from commit f7b31f864c )
2022-04-10 15:34:11 +00:00
Antonio Sánchez
34e5f34b39
Update warning suppression to latest.
2022-03-21 15:56:03 +00:00
Antonio Sánchez
4612627355
Revert "ensure that eigen::internal::size is not found by ADL, rename to ssize and..."
...
This reverts commit bd72e4a8c4
2022-01-18 16:08:59 +00:00
Antonio Sánchez
3e71c621c9
Revert "fix compilation issue with gcc < 10 and -std=c++2a"
...
This reverts commit b5d218d857
2022-01-18 16:08:37 +00:00
Jörg Buchwald
b5d218d857
fix compilation issue with gcc < 10 and -std=c++2a
...
(cherry picked from commit d1bf056394 )
2022-01-13 01:43:43 +00:00
Erik Schultheis
bd72e4a8c4
ensure that eigen::internal::size is not found by ADL, rename to ssize and...
...
(cherry picked from commit 9210e71fb3 )
2022-01-11 16:43:21 +00:00
David Tellenbach
3af8c262ac
Include immintrin.h if F16C is available and vectorization is disabled
...
If EIGEN_DONT_VECTORIZE is defined, immintrin.h is not included even if F16C is available. Trying to use F16C intrinsics thus fails.
This fixes issue #2395 .
(cherry picked from commit c06c3e52a0 )
2021-12-25 22:53:23 +01:00
Antonio Sanchez
7e3bc4177e
Fix tensor broadcast off-by-one error.
...
Caught by JAX unit tests. Triggered if broadcast is smaller than packet
size.
(cherry picked from commit ffb78e23a1 )
2021-11-16 18:41:25 +00:00
Minh Quan HO
c379a21191
nestbyvalue test: fix uninitialized matrix
...
- Doing computation with uninitialized (zero-ed ? but thanks Linux) matrix, or
worse NaN on other non-linux systems.
- This commit fixes it by initializing to Random().
(cherry picked from commit 4284c68fbb )
2021-11-05 18:19:27 +00:00
Gengxin Xie
6f57470bcc
Bug Fix: correct the bug that won't define EIGEN_HAS_FP16_C
...
if the compiler isn't clang
(cherry picked from commit 5c642950a5 )
2021-11-04 22:54:01 +00:00
Lennart Steffen
df53e28179
Included note on inner stride for compile-time vectors. See https://gitlab.com/libeigen/eigen/-/issues/2355#note_711078126
...
(cherry picked from commit 163f11e24a )
2021-11-03 23:35:40 +00:00
Chip Kerchner
fbdaff81bd
Invert rows and depth in non-vectorized portion of packing (PowerPC).
...
(cherry picked from commit 9cf34ee0ae )
2021-11-03 23:34:47 +00:00
Nico
71320af66a
Fix -Wbitwise-instead-of-logical clang warning
...
& and | short-circuit, && and || don't. When both arguments to those
are boolean, the short-circuiting version is usually the desired one, so
clang warns on this.
Here, it is inconsequential, so switch to && and || to suppress the warning.
(cherry picked from commit b17bcddbca )
2021-11-03 23:32:57 +00:00
Maxiwell S. Garcia
962a596d21
test: fix boostmutiprec test to compile with older Boost versions
...
Eigen boostmultiprec test redefines a symbol that is already defined
inside Boot Math [1]. Boost has fixed it recently [2], but this
patch avoids errors if Boost version was less than 1.77.
https://github.com/boostorg/math/blob/boost-1.76.0/include/boost/math/policies/policy.hpp#L18
6830712302 (diff-c7a8e5911c2e6be4138e1a966d762200f147792ac16ad96fdcc724313d11f839)
(cherry picked from commit 99600bd1a6 )
2021-11-03 23:31:48 +00:00
Antonio Sanchez
0ab1f8ec03
Fix broadcasting oob error.
...
For vectorized 1-dimensional inputs that do not take the special
blocking path (e.g. `std::complex<...>`), there was an
index-out-of-bounds error causing the broadcast size to be
computed incorrectly. Here we fix this, and make other minor
cleanup changes.
Fixes #2351 .
(cherry picked from commit a500da1dc0 )
2021-11-03 23:30:47 +00:00
Alex Druinsky
b0fe14213e
Fix vectorized reductions for Eigen::half
...
Fixes compiler errors in expressions that look like
Eigen::Matrix<Eigen::half, 3, 1>::Random().maxCoeff()
The error comes from the code that creates the initial value for
vectorized reductions. The fix is to specify the scalar type of the
reduction's initial value.
The cahnge is necessary for Eigen::half because unlike other types,
Eigen::half scalars cannot be implicitly created from integers.
(cherry picked from commit d0e3791b1a )
2021-11-03 23:29:55 +00:00
Andreas Krebbel
23469c3cda
ZVector: Move alignas qualifier to come first
...
We currently have plenty of type definitions with the alignment
qualifier coming after the type. The compiler warns about ignoring
them:
int EIGEN_ALIGN16 ai[4];
Turn this into:
EIGEN_ALIGN16 int ai[4];
(cherry picked from commit 8faafc3aaa )
2021-11-03 23:29:10 +00:00
Antonio Sanchez
18824d10ea
Fix ZVector build.
...
Cross-compiled via `s390x-linux-gnu-g++`, run via qemu. This allows the
packetmath tests to pass.
(cherry picked from commit 40bbe8a4d0 )
2021-11-03 23:28:26 +00:00
Antonio Sanchez
f9b2e92040
Remove bad "take" impl that causes g++-11 crash.
...
For some reason, having `take<n, numeric_list<T>>` for `n > 0` causes
g++-11 to ICE with
```
sorry, unimplemented: unexpected AST of kind nontype_argument_pack
```
It does work with other versions of gcc, and with clang.
I filed a GCC bug
[here](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102999 ).
Technically we should never actually run into this case, since you
can't take n > 0 elements from an empty list. Commenting it out
allows our Eigen tests to pass.
(cherry picked from commit 8f8c2ba2fe )
2021-11-03 23:26:34 +00:00
Xinle Liu
9c193db5c7
Fix BDCSVD's total deflation in branch 3.4, similar to that of master in MR 707.
...
(cherry picked from commit 4d045eba53f9a32d052eb942448ba62def066529)
2021-11-03 17:58:57 +00:00
Antonio Sanchez
6b6ba41269
Fix min/max nan-propagation for scalar "other".
...
Copied input type from `EIGEN_MAKE_CWISE_BINARY_OP`.
Fixes #2362 .
(cherry picked from commit 03d4cbb307 )
2021-10-28 17:16:49 +00:00
Rasmus Munk Larsen
96007cae8c
Remove license column in tables for builtin sparse solvers since all are MPL2 now.
...
(cherry picked from commit 68e0d023c0 )
2021-10-26 18:11:02 +00:00
Rasmus Munk Larsen
5d918b82a8
Add nan-propagation options to matrix and array plugins.
2021-10-21 13:48:50 -07:00
Antonio Sanchez
05c9d7ce20
Disable MSVC constant condition warning.
...
We use extensive use of `if (CONSTANT)`, and cannot use c++17's `if
constexpr`.
(cherry picked from commit 5bf35383e0 )
2021-10-11 10:00:29 -07:00
Antonio Sanchez
943ef50a2d
Disable testing of complex compound assignment operators for MSVC.
...
MSVC does not support specializing compound assignments for
`std::complex`, since it already specializes them (contrary to the
standard).
Trying to use one of these on device will currently lead to a
duplicate definition error. This is still probably preferable
to no error though. If we remove the definitions for MSVC, then
it will compile, but the kernel will fail silently.
The only proper solution would be to define our own custom `Complex`
type.
(cherry picked from commit f0f1d7938b )
2021-10-11 10:00:29 -07:00
Antonio Sanchez
7ea4adb5f0
Disable another device warning
...
(cherry picked from commit e9e90892fe )
2021-10-11 10:00:29 -07:00
Antonio Sanchez
71498b32c9
Disable more NVCC warnings.
...
The 2979 warning is yet another "calling a __host__ function from a
__host__ device__ function. Although we probably should eventually
address these, they are flooding the logs. Most of these are
harmless since we only call the original from the host.
In cases where these are actually called from device, an error is generated
instead anyways.
The 2977 warning is a bit strange - although the warning suggests the
`__device__` annotation is ignored, this doesn't actually seem to be
the case. Without the `__device__` declarations, the kernel actually
fails to run when attempting to construct such objects. Again,
these warnings are flooding the logs, so disabling for now.
(cherry picked from commit 86c0decc48 )
2021-10-11 10:00:29 -07:00
Antonio Sanchez
ebd5c6d44b
Add -mfma for AVX512DQ tests.
...
(cherry picked from commit 76bb29c0c2 )
2021-10-11 10:00:29 -07:00
Rasmus Munk Larsen
a8eb797a43
Remove -fabi-version=6 flag from AVX512 builds. It was added to fix builds with gcc 4.9, but these don't even work today, and the flag breaks compilation with newer versions of gcc.
...
(cherry picked from commit 1239adfcab )
2021-10-11 10:00:29 -07:00
Alexander Grund
929bc0e191
Fix alias violation in BFloat16
...
reinterpret_cast between unrelated types is undefined behavior and leads
to misoptimizations on some platforms.
Use the safer (and faster) version via bit_cast
(cherry picked from commit b5eaa42695 )
2021-09-20 14:25:58 +00:00
Antonio Sanchez
f046e326d9
Fix strict aliasing bug causing product_small failure.
...
Packet loading is skipped due to aliasing violation, leading to nullopt matrix
multiplication.
Fixes #2327 .
(cherry picked from commit 3c724c44cf )
2021-09-19 18:06:17 +00:00
Ryan Pavlik
3335e0767c
Fix typos in copyright dates
2021-09-15 13:26:50 -05:00
Antonio Sanchez
3395f4e604
Fix tridiagonalization_inplace_selector.
...
The `Options` of the new `hCoeffs` vector do not necessarily match
those of the `MatrixType`, leading to build errors. Having the
`CoeffVectorType` be a template parameter relieves this restriction.
(cherry picked from commit ebd4b17d2f )
2021-09-08 15:47:39 +00:00
Antonio Sanchez
f03d3e7072
Missing EIGEN_DEVICE_FUNCs to get gpu_basic passing with CUDA 9.
...
CUDA 9 seems to require labelling defaulted constructors as
`EIGEN_DEVICE_FUNC`, despite giving warnings that such labels are
ignored. Without these labels, the `gpu_basic` test fails to
compile, with errors about calling `__host__` functions from
`__host__ __device__` functions.
(cherry picked from commit 998bab4b04 )
2021-09-02 03:21:43 +00:00
Maxiwell S. Garcia
b8cf1ed753
Rename 'vec_all_nan' of cxx11_tensor_expr test because this symbol is used by altivec.h
...
(cherry picked from commit 09fc0f97b5 )
2021-09-01 17:26:59 +00:00
Rasmus Munk Larsen
9263475740
Add missing dependency on LAPACK test suite binaries to target buildtests, so make check will work correctly when EIGEN_ENABLE_LAPACK_TESTS is ON.
...
(cherry picked from commit 6f429a202d )
2021-09-01 16:41:47 +00:00
Rasmus Munk Larsen
0fdc99c65e
Allow old Fortran code for LAPACK tests to compile despite argument mismatch errors (REAL passed to COMPLEX workspace argument) with GNU Fortran 10.
...
(cherry picked from commit 7e096ddcb0 )
2021-09-01 16:41:28 +00:00
Antonio Sanchez
07cc362238
Fix EIGEN_OPTIMIZATION_BARRIER for arm-clang.
...
Clang doesn't like !621 , needs the "g" constraint back.
The "g" constraint also works for GCC >= 5.
This fixes our gitlab CI.
(cherry picked from commit 3a6296d4f1 )
2021-09-01 16:40:08 +00:00
Antonio Sanchez
4ef67cbfb2
GCC 4.8 arm EIGEN_OPTIMIZATION_BARRIER fix ( #2315 ).
...
GCC 4.8 doesn't seem to like the `g` register constraint, failing to
compile with "error: 'asm' operand requires impossible reload".
Tested `r` instead, and that seems to work, even with latest compilers.
Also fixed some minor macro issues to eliminate warnings on armv7.
Fixes #2315 .
(cherry picked from commit ff07a8a639 )
2021-08-31 21:23:28 +00:00
Antonio Sanchez
c2b6df6e60
Disable cuda Eigen::half vectorization on host.
...
All cuda `__half` functions are device-only in CUDA 9, including
conversions. Host-side conversions were added in CUDA 10.
The existing code doesn't build prior to 10.0.
All arithmetic functions are always device-only, so there's
therefore no reason to use vectorization on the host at all.
Modified the code to disable vectorization for `__half` on host,
which required also updating the `TensorReductionGpu` implementation
which previously made assumptions about available packets.
(cherry picked from commit cc3573ab44 )
2021-08-31 21:23:11 +00:00
Adam Kallai
277d369060
win: include intrin header in Windows on ARM
...
intrin header is needed for _BitScanReverse and
_BitScanReverse64
(cherry picked from commit 1415817d8d )
2021-08-31 21:22:37 +00:00
Antonio Sanchez
7aee90b8d3
Fix fix<N> when variable templates are not supported.
...
There were some typos that checked `EIGEN_HAS_CXX14` that should have
checked `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES`, causing a mismatch
in some of the `Eigen::fix<N>` assumptions.
Also fixed the `symbolic_index` test when
`EIGEN_HAS_CXX14_VARIABLE_TEMPLATES` is 0.
Fixes #2308
(cherry picked from commit 5db9e5c779 )
2021-08-30 16:23:35 +00:00
Rasmus Munk Larsen
3147391d94
Change version to 3.4.0.
2021-08-18 13:41:58 -07:00
Antonio Sanchez
115591b9e3
Workaround VS 2017 arg bug.
...
In VS 2017, `std::arg` for real inputs always returns 0, even for
negative inputs. It should return `PI` for negative real values.
This seems to be fixed in VS 2019 (MSVC 1920).
(cherry picked from commit 2b410ecbef )
2021-08-18 19:04:50 +00:00
Antonio Sanchez
fd100138dd
Remove unaligned assert tests.
...
Manually constructing an unaligned object declared as aligned
invokes UB, so we cannot technically check for alignment from
within the constructor. Newer versions of clang optimize away
this check.
Removing the affected tests.
(cherry picked from commit 0c4ae56e37 )
2021-08-18 18:39:04 +00:00
Jakob Struye
1ec173b54e
Clearer doc for squaredNorm
...
(cherry picked from commit 53a29c7e35 )
2021-08-18 15:12:36 +00:00
Antonio Sanchez
aef926abf6
Renamed shift_left/shift_right to shiftLeft/shiftRight.
...
For naming consistency. Also moved to ArrayCwiseUnaryOps, and added
test.
(cherry picked from commit fc9d352432 )
2021-08-18 14:44:31 +00:00
Antonio Sanchez
f1032255d3
Add missing PPC packet comparisons.
...
This is to fix the packetmath tests on the ppc pipeline.
(cherry picked from commit 2cc6ee0d2e )
2021-08-17 15:33:55 +00:00
Chip-Kerchner
f57dec64ef
Fix unaligned loads in ploadLhs & ploadRhs for P8.
...
(cherry picked from commit 8dcf3e38ba )
2021-08-17 12:48:36 +00:00
Rasmus Munk Larsen
926e1a8226
Update documentation for matrix decompositions and least squares solvers.
...
(cherry picked from commit 7e6f94961c )
2021-08-16 22:11:38 +00:00
andiwand
cd474d4cd0
minor doc fix in Map.h
...
(cherry picked from commit 5c6b3efead )
2021-08-16 14:26:39 +00:00
Chip-Kerchner
0b56b62f30
Reverse compare logic in F32ToBf16 since vec_cmpne is not available in Power8 - now compiles for clang10 default (P8).
...
(cherry picked from commit e07227c411 )
2021-08-13 18:01:15 +00:00
Chip Kerchner
44cc96e1a1
Get rid of used uninitialized warnings for EIGEN_UNUSED_VARIABLE in gcc11+
...
(cherry picked from commit 66499f0f17 )
2021-08-12 21:39:17 +00:00
Rasmus Munk Larsen
576e451b10
Add CompleteOrthogonalDecomposition to the table of linear algeba decompositions.
...
(cherry picked from commit 96e3b4fc95 )
2021-08-12 16:49:40 +00:00
Antonio Sanchez
0d89012708
Update code snippet for tridiagonalize_inplace.
...
(cherry picked from commit fb1718ad14 )
2021-08-12 15:37:32 +00:00
Rasmus Munk Larsen
6d2506040c
* revise the meta_least_common_multiple function template, add a bool variable to check whether the A is larger than B.
...
* This can make less compile_time if A is smaller than B. and avoid failure in compile if we get a little A and a great B.
Authored by @awoniu.
(cherry picked from commit 8ce341caf2 )
2021-08-11 18:11:26 +00:00
Nikolay Tverdokhleb
cb44a003de
Do not set AnnoyingScalar::dont_throw if not defined EIGEN_TEST_ANNOYING_SCALAR_DONT_THROW.
...
- Because that member is not declared if the macro is defined.
(cherry picked from commit f1b899eef7 )
2021-08-11 16:39:44 +00:00
ChipKerchner
13d7658c5d
Fix errors on older compilers (gcc 7.5 - lack of vec_neg, clang10 - can not use const pointers with vec_xl).
...
(cherry picked from commit 413bc491f1 )
2021-08-10 20:40:54 +00:00
jenswehner
338924602d
added includes for unordered_map
...
(cherry picked from commit e3e74001f7 )
2021-08-10 16:10:03 +00:00
Gauri Deshpande
93bff85a42
remove denormal flushing in fp32tobf16 for avx & avx512
...
(cherry picked from commit e6a5a594a7 )
2021-08-09 22:15:42 +00:00
Rasmus Munk Larsen
4e0357c6dd
Avoid memory allocation in tridiagonalization_inplace_selector::run.
...
(cherry picked from commit a5a7faeb45 )
2021-08-06 21:48:00 +00:00
Daniel N. Miller (APD)
1e9f623f3e
Do not build shared libs if not supported
...
(cherry picked from commit 09d7122468 )
2021-08-06 21:47:37 +00:00
Jens Wehner
4240b480e0
updated documentation for middleCol and middleRow
...
(cherry picked from commit 4d870c49b7 )
2021-08-05 17:53:36 +00:00
Antonio Sanchez
5b83d3c4bc
Make inverse 3x3 faster and avoid gcc bug.
...
There seems to be a gcc 4.7 bug that incorrectly flags the current
3x3 inverse as using uninitialized memory. I'm *pretty* sure it's
a false positive, but it's hard to trigger. The same warning
does not trigger with clang or later compiler versions.
In trying to find a work-around, this implementation turns out to be
faster anyways for static-sized matrices.
```
name old cpu/op new cpu/op delta
BM_Inverse3x3<DynamicMatrix3T<float>> 423ns ± 2% 433ns ± 3% +2.32% (p=0.000 n=98+96)
BM_Inverse3x3<DynamicMatrix3T<double>> 425ns ± 2% 427ns ± 3% +0.48% (p=0.003 n=99+96)
BM_Inverse3x3<StaticMatrix3T<float>> 7.10ns ± 2% 0.80ns ± 1% -88.67% (p=0.000 n=114+112)
BM_Inverse3x3<StaticMatrix3T<double>> 7.45ns ± 2% 1.34ns ± 1% -82.01% (p=0.000 n=105+111)
BM_AliasedInverse3x3<DynamicMatrix3T<float>> 409ns ± 3% 419ns ± 3% +2.40% (p=0.000 n=100+98)
BM_AliasedInverse3x3<DynamicMatrix3T<double>> 414ns ± 3% 413ns ± 2% ~ (p=0.322 n=98+98)
BM_AliasedInverse3x3<StaticMatrix3T<float>> 7.57ns ± 1% 0.80ns ± 1% -89.37% (p=0.000 n=111+114)
BM_AliasedInverse3x3<StaticMatrix3T<double>> 9.09ns ± 1% 2.58ns ±41% -71.60% (p=0.000 n=113+116)
```
(cherry picked from commit 5ad8b9bfe2 )
2021-08-04 22:06:52 +00:00
Antonio Sanchez
46ecdcd745
Fix MPReal detection and support.
...
The latest version of `mpreal` has a bug that breaks `min`/`max`.
It also breaks with the latest dev version of `mpfr`. Here we
add `FindMPREAL.cmake` which searches for the library and tests if
compilation works.
Removed our internal copy of `mpreal.h` under `unsupported/test`, as
it is out-of-sync with the latest, and similarly breaks with
the latest `mpfr`. It would be best to use the installed version
of `mpreal` anyways, since that's what we actually want to test.
Fixes #2282 .
(cherry picked from commit 31f796ebef )
2021-08-03 18:13:12 +00:00
Antonio Sanchez
9a1691a14e
Fix cmake warnings, FindPASTIX/FindPTSCOTCH.
...
We were getting a lot of warnings due to nested `find_package` calls
within `Find***.cmake` files. The recommended approach is to use
[`find_dependency`](https://cmake.org/cmake/help/latest/module/CMakeFindDependencyMacro.html )
in package configuration files. I made this change for all instances.
Case mismatches between `Find<Package>.cmake` and calling
`find_package(<PACKAGE>`) also lead to warnings. Fixed for
`FindPASTIX.cmake` and `FindSCOTCH.cmake`.
`FindBLASEXT.cmake` was broken due to calling `find_package_handle_standard_args(BLAS ...)`.
The package name must match, otherwise the `find_package(BLASEXT)` falsely thinks
the package wasn't found. I changed to `BLASEXT`, but then also copied that value
to `BLAS_FOUND` for compatibility.
`FindPastix.cmake` had a typo that incorrectly added `PTSCOTCH` when looking for
the `SCOTCH` component.
`FindPTSCOTCH` incorrectly added `***-NOTFOUND` to include/library lists,
corrupting them. This led to cmake errors down-the-line.
Fixes #2288 .
(cherry picked from commit 1cdec38653 )
2021-08-03 17:48:20 +00:00
Antonio Sanchez
bb33880e57
Fix TriSycl CMake files.
...
This is to enable compiling with the latest trisycl. `FindTriSYCL.cmake` was
broken by commit 00f32752 , which modified `add_sycl_to_target` for ComputeCPP.
This makes the corresponding modifications for trisycl to make them consistent.
Also, trisycl now requires c++17.
(cherry picked from commit 8cf6cb27ba )
2021-08-03 17:25:17 +00:00
Antonio Sanchez
237c59a2aa
Modify scalar pzero, ptrue, pselect, and p<binary> operations to avoid memset.
...
The `memset` function and bitwise manipulation only apply to POD types
that do not require initialization, otherwise resulting in UB. We currently
violate this in `ptrue` and `pzero`, we assume bitmasks for `pselect`, and
bitwise operations are applied byte-by-byte in the generic implementations.
This is causing issues for scalar types that do require initialization
or that contain non-POD info such as pointers (#2201 ). We either break
them, or force specializations of these functions for custom scalars,
even if they are not vectorized.
Here we modify these functions for scalars only - instead using only
scalar operations:
- `pzero`: `Scalar(0)` for all scalars.
- `ptrue`: `Scalar(1)` for non-trivial scalars, bitset to one bits for trivial scalars.
- `pselect`: ternary select comparing mask to `Scalar(0)` for all scalars
- `pand`, `por`, `pxor`, `pnot`: use operators `&`, `|`, `^`, `~` for all integer or non-trivial scalars, otherwise apply bytewise.
For non-scalar types, the original implementations are used to maintain
compatibility and minimize the number of changes.
Fixes #2201 .
(cherry picked from commit 3d98a6ef5c )
2021-08-03 16:32:59 +00:00
Antonio Sanchez
3dc42eeaec
Enable equality comparisons on GPU.
...
Since `std::equal_to::operator()` is not a device function, it
fails on GPU. On my device, I seem to get a silent crash in the
kernel (no reported error, but the kernel does not complete).
Replacing this with a portable version enables comparisons on device.
Addresses #2292 - would need to be cherry-picked. The 3.3 branch
also requires adding `EIGEN_DEVICE_FUNC` in `BooleanRedux.h` to get
fully working.
(cherry picked from commit 7880f10526 )
2021-08-03 16:15:44 +00:00
hyunggi-sv
7adc1545b4
fix:typo in dox (has->have)
...
(cherry picked from commit 02a0e79c70 )
2021-08-03 00:54:41 +00:00
Antonio Sanchez
c0c7b695cd
Fix assignment operator issue for latest MSVC+NVCC.
...
Details are scattered across #920 , #1000 , #1324 , #2291 .
Summary: some MSVC versions have a bug that requires omitting explicit
`operator=` definitions (leads to duplicate definition errors), and
some MSVC versions require adding explicit `operator=` definitions
(otherwise implicitly deleted errors). This mess tries to cover
all the cases encountered.
Fixes #2291 .
(cherry picked from commit 9816fe59b4 )
2021-08-03 00:52:21 +00:00
Alexander Karatarakis
c334eece44
_DerType -> DerivativeType as underscore-followed-by-caps is a reserved identifier
...
(cherry picked from commit f357283d31 )
2021-07-29 18:18:47 +00:00
Jonas Harsch
5ccb72b2e4
Fixed typo in TutorialSparse.dox
...
(cherry picked from commit 5b81764c0f )
2021-07-26 14:33:10 +00:00
arthurfeeney
9c90d5d832
Fixes #1387 for compilation error in JacobiSVD with HouseholderQRPreconditioner that occurs when input is a compile-time row vector.
...
(cherry picked from commit a77638387d )
2021-07-22 18:01:55 +00:00
Antonio Sanchez
5d37114fc0
Fix explicit default cache size typo.
...
(cherry picked from commit 297f0f563d )
2021-07-20 18:42:25 +00:00
Rohit Santhanam
930696fc53
Enable extract et. al. for HIP GPU.
...
(cherry picked from commit beea14a18f )
2021-07-09 16:14:19 +00:00
Rasmus Munk Larsen
56966fd2e6
Defer to std::fill_n when filling a dense object with a constant value.
...
(cherry picked from commit 0c361c4899 )
2021-07-09 03:59:56 +00:00
Jonas Harsch
5a3c9eddb4
Removed superfluous boolean degenerate in TensorMorphing.h.
...
(cherry picked from commit e9c9a3130b )
2021-07-08 18:34:10 +00:00
Guoqiang QI
69ec4907da
Make a copy of input matrix when try to do the inverse in place, this fixes #2285 .
...
(cherry picked from commit 4bcd42c271 )
2021-07-08 17:07:54 +00:00
Antonio Sanchez
7571704a43
Fix CMake directory issues.
...
Allows absolute and relative paths for
- `INCLUDE_INSTALL_DIR`
- `CMAKEPACKAGE_INSTALL_DIR`
- `PKGCONFIG_INSTALL_DIR`
Type should be `PATH` not `STRING`. Contrary to !211 , these don't
seem to be made absolute if user-defined - according to the doc any
directories should use `PATH` type, which allows a file dialog
to be used via the GUI. It also better handles file separators.
If user provides an absolute path, it will be made relative to
`CMAKE_INSTALL_PREFIX` so that the `configure_packet_config_file` will
work.
Fixes #2155 and #2269 .
(cherry picked from commit f44f05532d )
2021-07-07 17:44:00 +00:00
Antonio Sanchez
84955d109f
Fix Tensor documentation page.
...
The extra [TOC] tag is generating a huge floating duplicated
table-of-contents, which obscures the majority of the page
(see bottom of https://eigen.tuxfamily.org/dox/unsupported/eigen_tensors.html ).
Remove it.
Also, headers do not support markup (see
[doxygen bug](https://github.com/doxygen/doxygen/issues/7467 )), so
backticks like
```
```
end up generating titles that looks like
```
Constructor <tt>Tensor<double,2></tt>
```
Removing backticks for now. To generate proper formatted headers, we
must directly use html instead of markdown, i.e.
```
<h2>Constructor <code>Tensor<double,2></code></h2>
```
which is ugly.
Fixes #2254 .
(cherry picked from commit f5a9873bbb )
2021-07-07 17:18:20 +00:00
Jonas Harsch
601814b575
Don't crash when attempting to shuffle an empty tensor.
...
(cherry picked from commit aab747021b )
2021-07-02 21:08:38 +00:00
Rasmus Munk Larsen
05bab8139a
Fix breakage of conj_helper in conjunction with custom types introduced in !537 .
...
(cherry picked from commit 7b35638ddb )
2021-07-02 20:59:50 +00:00
Chip Kerchner
eebde572d9
Create the ability to disable the specialized gemm_pack_rhs in Eigen (only PPC) for TensorFlow
...
(cherry picked from commit 91e99ec1e0 )
2021-07-01 23:32:38 +00:00
Antonio Sanchez
8190739f12
Fix compile issues for gcc 4.8.
...
- Move constructors can only be defaulted as NOEXCEPT if all members
have NOEXCEPT move constructors.
- gcc 4.8 has some funny parsing bug in `a < b->c`, thinking `b-` is a template parameter.
(cherry picked from commit 6035da5283 )
2021-07-01 23:18:10 +00:00
Antonio Sanchez
b6db013435
Fix inverse nullptr/asan errors for LU.
...
For empty or single-column matrices, the current `PartialPivLU`
currently dereferences a `nullptr` or accesses memory out-of-bounds.
Here we adjust the checks to avoid this.
(cherry picked from commit 154f00e9ea )
2021-07-01 22:57:25 +00:00
Dan Miller
1f6b1c1a1f
Fix duplicate definitions on Mac
...
(cherry picked from commit eb04775903 )
2021-07-01 20:49:05 +00:00
Alexander Karatarakis
517294d6e1
Make DenseStorage<> trivially_copyable
...
(cherry picked from commit 60400334a9 )
2021-07-01 20:48:47 +00:00
大河メタル
94e2250b36
Correct declarations for aarch64-pc-windows-msvc
...
(cherry picked from commit c81da59a25 )
2021-06-30 04:10:04 +00:00
Antonio Sanchez
d82d915047
Modify tensor argmin/argmax to always return first occurence.
...
As written, depending on multithreading/gpu, the returned index from
`argmin`/`argmax` is not currently stable. Here we modify the functors
to always keep the first occurence (i.e. if the value is equal to the
current min/max, then keep the one with the smallest index).
This is otherwise causing unpredictable results in some TF tests.
(cherry picked from commit 3a087ccb99 )
2021-06-29 23:28:37 +00:00
Rasmus Munk Larsen
380d0e4916
Get rid of redundant pabs instruction in complex square root.
...
(cherry picked from commit 5aebbe9098 )
2021-06-29 23:27:09 +00:00
Rohit Santhanam
e83af2cc24
Commit 52a5f982 broke conjhelper functionality for HIP GPUs.
...
This commit addresses this.
(cherry picked from commit 2d132d1736 )
2021-06-25 19:56:18 +00:00
Rasmus Munk Larsen
413ff2b531
Small cleanup: Get rid of the macros EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD and CJMADD, which were effectively unused, apart from on x86, where the change results in identically performing code.
...
(cherry picked from commit bffd267d17 )
2021-06-25 17:13:12 +00:00
Rasmus Munk Larsen
a235ddef39
Get rid of code duplication for conj_helper. For packets where LhsType=RhsType a single generic implementation suffices. For scalars, the generic implementation of pconj automatically forwards to numext::conj, so much of the existing specialization can be avoided. For mixed types we still need specializations.
...
(cherry picked from commit 52a5f98212 )
2021-06-24 23:30:42 +00:00
Rasmus Munk Larsen
4780d8dfb2
Fix typo in SelfAdjointEigenSolver_eigenvectors.cpp
...
(cherry picked from commit c8a2b4d20a )
2021-06-21 19:07:17 +00:00
Rasmus Munk Larsen
fd5d23fdf3
Update ComplexEigenSolver_eigenvectors.cpp
...
(cherry picked from commit ea62c937ed )
2021-06-21 19:06:54 +00:00
Antonio Sanchez
a2040ef796
Rewrite balancer to avoid overflows.
...
The previous balancer overflowed for large row/column norms.
Modified to prevent that.
Fixes #2273 .
(cherry picked from commit e9ab4278b7 )
2021-06-21 18:14:53 +00:00
Antonio Sanchez
c2c0f6f64b
Fix fix<> for gcc-4.9.3.
...
There's a missing `EIGEN_HAS_CXX14` -> `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES`
replacement.
Fixes ##2267
(cherry picked from commit 35a367d557 )
2021-06-21 17:26:07 +00:00
Antonio Sanchez
ee4e099aa2
Remove pset, replace with ploadu.
...
We can't make guarantees on alignment for existing calls to `pset`,
so we should default to loading unaligned. But in that case, we should
just use `ploadu` directly. For loading constants, this load should hopefully
get optimized away.
This is causing segfaults in Google Maps.
(cherry picked from commit 12e8d57108 )
2021-06-17 17:11:08 +00:00
Chip-Kerchner
9fc93ce31a
EIGEN_STRONG_INLINE was NOT inlining in some critical needed areas (6.6X slowdown) when used with Tensorflow. Changing to EIGEN_ALWAYS_INLINE where appropiate.
...
(cherry picked from commit ef1fd341a8 )
2021-06-16 22:14:17 +00:00
Antonio Sanchez
1374f49f28
Add missing ppc pcmp_lt_or_nan<Packet8bf>
...
(cherry picked from commit 9e94c59570 )
2021-06-15 22:12:22 +00:00
Antonio Sanchez
2d6eaaf687
Fix placement of permanent GPU defines.
...
(cherry picked from commit 954879183b )
2021-06-15 19:18:20 +00:00
Rasmus Munk Larsen
47722a66f2
Fix more enum arithmetic.
...
(cherry picked from commit 13fb5ab92c )
2021-06-15 16:40:35 +00:00
Antonio Sanchez
5e75331b9f
Fix checking of version number for mingw.
...
MinGW spits out version strings like: `x86_64-w64-mingw32-g++ (GCC)
10-win32 20210110`, which causes the version extraction to fail.
Added support for this with tests.
Also added `make_unsigned` for `long long`, since mingw seems to
use that for `uint64_t`.
Related to #2268 . CMake and build passes for me after this.
(cherry picked from commit ad82d20cf6 )
2021-06-12 00:02:26 +00:00
Antonio Sanchez
b5fc69bdd8
Add ability to permanently enable HIP/CUDA gpu* defines.
...
When using Eigen for gpu, these simplify portability. If
`EIGEN_PERMANENTLY_ENABLE_GPU_HIP_CUDA_DEFINES` is set, then
we do not undefine them.
(cherry picked from commit 514977f31b )
2021-06-11 17:48:37 +00:00
Antonio Sanchez
4b683b65df
Allow custom TENSOR_CONTRACTION_DISPATCH macro.
...
Currently TF lite needs to hack around with the Tensor headers in order
to customize the contraction dispatch method. Here we add simple `#ifndef`
guards to allow them to provide their own dispatch prior to inclusion.
(cherry picked from commit 6aec83263d )
2021-06-11 17:19:29 +00:00
Rasmus Munk Larsen
1cb1ffd5b2
Use bit_cast to create -0.0 for floating point types to avoid compiler optimization changing sign with --ffast-math enabled.
...
(cherry picked from commit fc87e2cbaa )
2021-06-11 02:57:02 +00:00
Rasmus Munk Larsen
4b502a7215
Fix c++20 warnings about using enums in arithmetic expressions.
...
(cherry picked from commit f64b2954c7 )
2021-06-11 02:35:19 +00:00
Nicolas Cornu
85868564df
Fix parsing of version for nvhpc
...
As the first line of the version is empty it crashes,
so delete first line if it is empty
(cherry picked from commit 001a57519a )
2021-06-10 18:50:22 +00:00
Rohit Santhanam
cbb6ae6296
Removed dead code from GPU float16 unit test.
...
(cherry picked from commit c8d40a7bf1 )
2021-06-10 17:16:47 +00:00
Cyril Kaiser
573570b6c9
Remove EIGEN_DEVICE_FUNC from CwiseBinaryOp's default copy constructor.
...
(cherry picked from commit 91cd67f057 )
2021-05-26 19:45:25 +00:00
Antonio Sanchez
98cf1e076f
Add missing NEON ptranspose implementations.
...
Unified implementation using only `vzip`.
(cherry picked from commit dba753a986 )
2021-05-25 19:09:50 +00:00
Antonio Sanchez
ee2a8f7139
Modify Unary/Binary/TernaryOp evaluators to work for non-class types.
...
This used to work for non-class types (e.g. raw function pointers) in
Eigen 3.3. This was changed in commit 11f55b29 to optimize the
evaluator:
> `sizeof((A-B).cwiseAbs2())` with A,B Vector4f is now 16 bytes, instead of 48 before this optimization.
though I cannot reproduce the 16 byte result. Both before the change
and after, with multiple compilers/versions, I always get a result of 40 bytes.
https://godbolt.org/z/MsjTc1PGe
This change modifies the code slightly to allow non-class types. The
final generated code is identical, and the expression remains 40 bytes
for the `abs2` sample case.
Fixes #2251
(cherry picked from commit ebb300d0b4 )
2021-05-25 18:19:53 +00:00
Jakub Lichman
3835046309
predux_half_dowto4 test extended to all applicable packets
...
(cherry picked from commit 12471fcb5d )
2021-05-21 16:58:16 +00:00
Steve Bronder
4fbd01cd4b
Adds macro for checking if C++14 variable templates are supported
...
(cherry picked from commit 1720057023 )
2021-05-21 16:43:30 +00:00
Niall Murphy
a883a8797c
Use derived object type in conservative_resize_like_impl
...
When calling conservativeResize() on a matrix with DontAlign flag, the
temporary variable used to perform the resize should have the same
Options as the original matrix to ensure that the correct override of
swap is called (i.e. PlainObjectBase::swap(DenseBase<OtherDerived> &
other). Calling the base class swap (i.e in DenseBase) results in
assertions errors or memory corruption.
(cherry picked from commit 391094c507 )
2021-05-20 23:43:57 +00:00
Jakub Lichman
0bd9e9bc45
ptranpose test for non-square kernels added
...
(cherry picked from commit 8877f8d9b2 )
2021-05-20 19:27:20 +00:00
Guoqiang QI
77c66e368c
Ensure all generated matrices for inverse_4x4 testes are invertible, this fix #2248 .
...
(cherry picked from commit 3e006bfd31 )
2021-05-13 15:03:47 +00:00
guoqiangqi
2f908f8255
Changing the storage of the SSE complex packets to that of the wrapper. This should fix #2242 .
...
(cherry picked from commit 3d9051ea84 )
2021-05-12 17:02:19 +00:00
Nathan Luehr
82f13830e6
Fix calls to device functions from host code
...
(cherry picked from commit 972cf0c28a )
2021-05-12 17:01:45 +00:00
Nathan Luehr
d1825cbb68
Device implementation of log for std::complex types.
...
(cherry picked from commit 7e6a1c129c )
2021-05-11 22:31:53 +00:00
Nathan Luehr
d9288f078d
Fix ambiguity due to argument dependent lookup.
...
(cherry picked from commit 6753f0f197 )
2021-05-11 22:00:36 +00:00
Rohit Santhanam
85ebd6aff8
Fix for issue where numext::imag and numext::real are used before they are defined.
...
(cherry picked from commit 39ec31c0ad )
2021-05-10 20:14:10 +00:00
Antonio Sanchez
2947c0cc84
Restore ABI compatibility for conj with 3.3, fix conflict with boost.
...
The boost library unfortunately specializes `conj` for various types and
assumes the original two-template-parameter version. This changes
restores the second parameter. This also restores ABI compatibility.
The specialization for `std::complex` is because `std::conj` is not
a device function. For custom complex scalar types, users should provide
their own `conj` implementation.
We may consider removing the unnecessary second parameter in the future - but
this will require modifying boost as well.
Fixes #2112 .
(cherry picked from commit c0eb5f89a4 )
2021-05-07 18:38:23 +00:00
Antonio Sanchez
25424f4cf1
Clean up gpu device properties.
...
Made a class and singleton to encapsulate initialization and retrieval of
device properties.
Related to !481 , which already changed the API to address a static
linkage issue.
(cherry picked from commit 0eba8a1fe3 )
2021-05-07 18:13:40 +00:00
Antonio Sanchez
42acbd5700
Fix numext::arg return type.
...
The cxx11 path for `numext::arg` incorrectly returned the complex type
instead of the real type, leading to compile errors. Fixed this and
added tests.
Related to !477 , which uncovered the issue.
(cherry picked from commit 90e9a33e1c )
2021-05-07 17:52:07 +00:00
Christoph Hertzberg
9e0dc8f09b
Revert addition of unused paddsub<Packet2cf>. This fixes #2242
...
(cherry picked from commit 722ca0b665 )
2021-05-07 16:23:03 +00:00
Antonio Sanchez
da19f7a910
Simplify TensorRandom and remove time-dependence.
...
Time-dependence prevents tests from being repeatable. This has long
been an issue with debugging the tensor tests. Removing this will allow
future tests to be repeatable in the usual way.
Also, the recently added macros in !476 are causing headaches across different
platforms. For example, checking `_XOPEN_SOURCE` is leading to multiple
ambiguous macro errors across Google, and `_DEFAULT_SOURCE`/`_SVID_SOURCE`/`_BSD_SOURCE`
are sometimes defined with values, sometimes defined as empty, and sometimes
not defined at all when they probably should be. This is leading to
multiple build breakages.
The simplest approach is to generate a seed via
`Eigen::internal::random<uint64_t>()` if on CPU. For GPU, we use a
hash based on the current thread ID (since `rand()` isn't supported
on GPU).
Fixes #1602 .
(cherry picked from commit e3b7f59659 )
2021-05-05 23:37:48 +00:00
Antonio Sanchez
fc2cc10842
Better CUDA complex division.
...
The original produced NaNs when dividing 0/b for subnormal b.
The `complex_divide_stable` was changed to use the more common
Smith's algorithm.
(cherry picked from commit 1c013be2cc )
2021-04-29 17:58:45 +00:00
Antonio Sanchez
a33855f6ee
Add missing pcmp_lt_or_nan for NEON Packet4bf.
...
(cherry picked from commit 172db7bfc3 )
2021-04-27 21:15:08 +00:00
Theo Fletcher
83df5df61b
Added complex matrix unit tests for SelfAdjointEigenSolve
...
(cherry picked from commit 2ced0cc233 )
2021-04-26 19:18:53 +00:00
Jakub Lichman
ac3c5aad31
Tests added and AVX512 bug fixed for pcmp_lt_or_nan
...
(cherry picked from commit d87648a6be )
2021-04-26 18:07:55 +00:00
Jakub Lichman
63abb10000
Tests for pcmp_lt and pcmp_le added
...
(cherry picked from commit 1115f5462e )
2021-04-23 19:52:23 +00:00
Turing Eret
baf601a0e3
Fix for issue with static global variables in TensorDeviceGpu.h
...
m_deviceProperties and m_devicePropInitialized are defined as global
statics which will define multiple copies which can cause issues if
initializeDeviceProp() is called in one translation unit and then
m_deviceProperties is used in a different translation unit. Added
inline functions getDeviceProperties() and getDevicePropInitialized()
which defines those variables as static locals. As per the C++ standard
7.1.2/4, a static local declared in an inline function always refers
to the same object, so this should be safer. Credit to Sun Chenggen
for this fix.
This fixes issue #1475 .
(cherry picked from commit 3804ca0d90 )
2021-04-23 19:06:16 +00:00
Antonio Sanchez
587a691516
Check existence of BSD random before use.
...
`TensorRandom` currently relies on BSD `random()`, which is not always
available. The [linux manpage](https://man7.org/linux/man-pages/man3/srandom.3.html )
gives the glibc condition:
```
_XOPEN_SOURCE >= 500
|| /* Glibc since 2.19: */ _DEFAULT_SOURCE
|| /* Glibc <= 2.19: */ _SVID_SOURCE || _BSD_SOURCE
```
In particular, this was failing to compile for MinGW via msys2. If not
available, we fall back to using `rand()`.
(cherry picked from commit 045c0609b5 )
2021-04-23 00:35:05 +00:00
Antonio Sanchez
8830d66c02
DenseStorage safely copy/swap.
...
Fixes #2229 .
For dynamic matrices with fixed-sized storage, only copy/swap
elements that have been set. Otherwise, this leads to inefficient
copying, and potential UB for non-initialized elements.
(cherry picked from commit d213a0bcea )
2021-04-22 21:05:50 +00:00
Rasmus Munk Larsen
54425a39b2
Make vectorized compute_inverse_size4 compile with AVX.
...
(cherry picked from commit 85a76a16ea )
2021-04-22 17:25:25 +00:00
Jakub Lichman
34d0be9ec1
Compilation of basicbenchmark fixed
...
(cherry picked from commit d72c794ccd )
2021-04-21 12:09:42 +02:00
Jakub Lichman
42a8bdd4d7
HasExp added for AVX512 Packet8d
...
(cherry picked from commit 2b1dfd1ba0 )
2021-04-21 12:09:21 +02:00
Chip-Kerchner
28564957ac
Fix taking address of rvalue compiler issue with TensorFlow (plus other warnings).
...
(cherry picked from commit 06c2760bd1 )
2021-04-21 01:05:21 +00:00
Antonio Sanchez
ab7fe215f9
Fix ldexp for AVX512 ( #2215 )
...
Wrong shuffle was used. Need to interleave low/high halves with a
`permute` instruction.
Fixes #2215 .
(cherry picked from commit 1d79c68ba0 )
2021-04-20 20:52:26 +00:00
David Tellenbach
1f4c0311cd
Bump to 3.3.91 (3.4-rc1)
2021-04-18 23:43:12 +02:00
David Tellenbach
3e819d83bf
Before 3.4 branch
2021-04-18 23:36:14 +02:00
Antonio Sanchez
69adf26aa3
Modify googlehash use to account for namespace issues.
...
The namespace declaration for googlehash is a configurable macro that
can be disabled. In particular, it is disabled within google, causing
compile errors since `dense_hash_map`/`sparse_hash_map` are then in
the global namespace instead of in `::google`.
Here we play a bit of gynastics to allow for both `google::*_hash_map`
and `*_hash_map`, while limiting namespace polution. Symbols within
the `::google` namespace are imported into `Eigen::google`.
We also remove checks based on `_SPARSE_HASH_MAP_H_`, as this is
fragile, and instead require `EIGEN_GOOGLEHASH_SUPPORT` to be
defined.
2021-04-12 19:00:39 -07:00
Christoph Hertzberg
9357feedc7
Avoid using uninitialized inputs and if available, use slightly more efficient movsd instruction for pset1<Packet2cf>.
2021-04-13 01:36:59 +02:00
Rasmus Munk Larsen
a2c0542010
Fix typo in TensorDimensions.h
2021-04-12 18:59:56 +00:00
Rohit Santhanam
dfd6720d82
Fix for float16 GPU unit test.
2021-04-12 10:19:06 +00:00
Christoph Hertzberg
1e1c8a735c
Use EIGEN_HAS_CXX11 and EIGEN_COMP_CXXVER macros to detect C++ version for std::result_of and std::invoke_result.
...
Fixes #2209
2021-04-12 01:26:15 +00:00
Jens Wehner
f6fc66aa75
fixed doxygen for unsupported iterative solver module
2021-04-11 16:26:14 +00:00
Christoph Hertzberg
d58678069c
Make iterators default constructible and assignable, by making...
2021-04-09 17:03:28 +00:00
Rohit Santhanam
2859db0220
This fixes an issue where the compiler was not choosing the GPU specific specialization of ScanLauncher.
...
The issue was discovered when the GPU scan unit test was run and resulted in a segmentation fault.
The segmantation fault occurred because the unit test allocated GPU memory and passed a pointer to that memory to the computation that it presumed would execute on the GPU.
But because of the issue, the computation was scheduled to execute on the CPU so a situation was constructed where the CPU attempted to access a GPU memory location.
The fix expands the GPU specific ScanLauncher specialization to handle cases where vectorization is enabled.
Previously, the GPU specialization is chosen only if Vectorization is not used.
2021-04-08 15:14:48 +00:00
Antonio Sanchez
fcb5106c6e
Scaled epsilon the wrong way.
...
Should have been 0.5 to widen the bounds, since this is inverse
precision. Setting to 0.5, however, leads to many more failing
tests at Google, so reverting to 1 for now.
2021-04-07 15:08:39 -07:00
Christoph Hertzberg
6197ce1a35
Replace -2147483648 by -0.0f or -0.0 constants (this should fix #2189 ).
...
Also, remove unnecessary `pgather` operations.
2021-04-07 11:25:27 +00:00
Rasmus Munk Larsen
22edb46823
Align local arrays to Packet boundary.
2021-04-06 16:22:36 +00:00
Antonio Sanchez
ace7f132ed
Fix clang tidy warnings in AnnoyingScalar.
...
Clang-tidy complains that full specializations in headers can cause ODR
violations. Marked these as `inline` to fix.
It also complains about renaming arguments in specializations. Set the
argument names to match.
2021-04-05 12:49:38 -07:00
Antonio Sanchez
90187a33e1
Fix SelfAdjoingEigenSolver ( #2191 )
...
Adjust the relaxation step to use the condition
```
abs(subdiag[i]) <= epsilon * sqrt(abs(diag[i]) + abs(diag[i+1]))
```
for setting the subdiagonal entry to zero.
Also adjust Wilkinson shift for small `e = subdiag[end-1]` -
I couldn't find a reference for the original, and it was not
consistent with the Wilkinson definition.
Fixes #2191 .
2021-04-05 11:19:09 -07:00
Rasmus Munk Larsen
3ddc0974ce
Fix two bugs in commit
2021-04-02 22:06:27 +00:00
Chip Kerchner
c24bee6120
Fix address of temporary object errors in clang11.
...
This fixes the problem with taking the address of temporary objects which clang11 treats as errors.
2021-04-02 16:27:08 +00:00
David Tellenbach
e4233b6e3d
Add CI infrastructure for pre-merge smoke tests.
...
This patch adds pre-merge smoke tests for x86 Linux using gcc-10 and clang-10.
Closes #2188 .
2021-04-01 00:08:37 +00:00
David Tellenbach
ae95b74af9
Add CMake infrastructure for smoke testing
...
Necessary CMake changes to implement pre-merge smoke tests
running via CI.
2021-03-31 22:09:00 +00:00
Rasmus Munk Larsen
5bbc9cea93
Add an info() method to the SVDBase class to make it possible to tell the user that the computation failed, possibly due to invalid input.
...
Make Jacobi and divide-and-conquer fail fast and return info() == InvalidInput if the matrix contains NaN or +/-Inf.
2021-03-31 21:09:19 +00:00
Guoqiang QI
b5a926a0f6
Add GitLab templates for issues and merge requests
...
This patch adds GitLab templates for bug reports, feature and merge requests.
This closes #2117 .
2021-03-31 16:01:12 +00:00
Antonio Sanchez
78ee3d6261
Fix CUDA constexpr issues for numeric_limits.
...
Some CUDA/HIP constants fail on device with `constexpr` since they
internally rely on non-constexpr functions, e.g.
```
\#define CUDART_INF_F __int_as_float(0x7f800000)
```
This fails for cuda-clang (though passes with nvcc). These constants are
currently used by `device::numeric_limits`. For portability, we
need to remove `constexpr` from the affected functions.
For C++11 or higher, we should be able to rely on the `std::numeric_limits`
versions anyways, since the methods themselves are now `constexpr`, so
should be supported on device (clang/hipcc natively, nvcc with
`--expr-relaxed-constexpr`).
2021-03-30 18:01:27 +00:00
Antonio Sanchez
af1247fbc1
Use Index type in loop over coefficients.
...
Previously was `int`. Brought up by Kyle Snow (Polaris Geospatial
Services) on the mailing list.
2021-03-29 17:40:55 +00:00
Antonio Sanchez
87729ea39f
Eliminate round_impl double-promotion warnings for c++03.
2021-03-25 16:52:19 +00:00
Deven Desai
748489ef9c
Un-defining EIGEN_HAS_CONSTEXPR on the HIP platform
...
The Eigen unit-tests started failing on the HIP/ROCm platform, after the following commit
e7b8643d70
```
In file included from /home/rocm-user/eigen/test/main.h:360:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:162:
/home/rocm-user/eigen/Eigen/src/Core/util/Meta.h:300:17: error: constexpr function never produces a constant expression [-Winvalid-constexpr]
static float (max)() {
^
/home/rocm-user/eigen/Eigen/src/Core/util/Meta.h:304:12: note: non-constexpr function '__int_as_float' cannot be used in a constant expression
return HIPRT_MAX_NORMAL_F;
^
/home/rocm-user/eigen/Eigen/src/Core/arch/HIP/hcc/math_constants.h:14:28: note: expanded from macro 'HIPRT_MAX_NORMAL_F'
#define HIPRT_MAX_NORMAL_F __int_as_float(0x7f7fffff)
^
/opt/rocm/hip/include/hip/hcc_detail/device_functions.h:913:32: note: declared here
__device__ static inline float __int_as_float(int x) {
^
```
The problem seems to that some of the constants defined in the HIP `math_constants.h` have a call to `__int_as_float` routine which is not declared `constexpr` in the HIP runtime header file.
Working around this issue for now, be skipping the const_expr support (enabled via the above commit) on HIP
2021-03-25 13:45:52 +00:00
Chip Kerchner
d59ef212e1
Fixed performance issues for complex VSX and P10 MMA in gebp_kernel (level 3).
2021-03-25 11:08:19 +00:00
Steve Bronder
e7b8643d70
Revert "Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()""
...
This reverts commit 5f0b4a4010 .
2021-03-24 18:14:56 +00:00
Antonio Sanchez
5521c65afb
Eliminate mixingtypes_7 warning.
...
`g_called` is not used in subtest 7, so was generating a
`-Wunneeded-internal-declaration` warnings. Here we silence
it by initializing the static variable.
2021-03-24 11:05:41 -07:00
Christoph Hertzberg
69a4f70956
Revert "Uses _mm512_abs_pd for Packet8d pabs"
...
This reverts commit f019b97aca
2021-03-23 18:52:19 +00:00
David Tellenbach
824272cde8
Re-enable CI for Power
2021-03-22 19:28:25 +01:00
David Tellenbach
4811e81966
Remove yet another comma at end of enum
2021-03-18 23:30:00 +01:00
Steve Bronder
f019b97aca
Uses _mm512_abs_pd for Packet8d pabs
2021-03-18 15:47:52 +00:00
David Tellenbach
0cc9b5eb40
Split test commainitializer into two substests
2021-03-18 13:28:51 +01:00
Antonio Sanchez
c3fbc6cec7
Use singleton pattern for static registered tests.
...
The original fails with nvcc+msvc - there's a static order of initialization
issue leading to registered tests being cleared. The test then fails on
```
VERIFY(EigenTest::all().size()>0);
```
since `EigenTest` no longer contains any tests. The singleton pattern
fixes this.
2021-03-18 00:56:31 +00:00
Niek Bouman
ed964ba3f1
Proposed fix for issue #2187
2021-03-18 00:55:36 +00:00
Antonio Sanchez
8dfe1029a5
Augment NumTraits with min/max_exponent() again.
...
Replace usage of `std::numeric_limits<...>::min/max_exponent` in
codebase where possible. Also replaced some other `numeric_limits`
usages in affected tests with the `NumTraits` equivalent.
The previous MR !443 failed for c++03 due to lack of `constexpr`.
Because of this, we need to keep around the `std::numeric_limits`
version in enum expressions until the switch to c++11.
Fixes #2148
2021-03-16 20:12:46 -07:00
David Tellenbach
eb71e5db98
Fix another warning on missing commas
2021-03-17 03:07:04 +01:00
David Tellenbach
df4bc2731c
Revert "Augment NumTraits with min/max_exponent()."
...
This reverts commit 75ce9cd2a7 .
2021-03-17 03:06:08 +01:00
Antonio Sanchez
75ce9cd2a7
Augment NumTraits with min/max_exponent().
...
Replace usage of `std::numeric_limits<...>::min/max_exponent` in
codebase. Also replaced some other `numeric_limits` usages in
affected tests with the `NumTraits` equivalent.
Fixes #2148
2021-03-17 01:00:41 +00:00
David Tellenbach
9fb7062440
Silence warning on comma at end of enumerator list
2021-03-17 01:46:52 +01:00
Theo Fletcher
b8502a9dd6
Updated SelfAdjointEigenSolver documentation to include that the eigenvectors matrix is unitary.
2021-03-16 18:48:02 +00:00
Rasmus Munk Larsen
2e83cbbba9
Add NaN propagation options to minCoeff/maxCoeff visitors.
2021-03-16 17:02:50 +00:00
Jens Wehner
c0a889890f
Fixed output of complex matrices
2021-03-15 21:51:55 +00:00
Antonio Sanchez
f612df2736
Add fmod(half, half).
...
This is to support TensorFlow's `tf.math.floormod` for half.
2021-03-15 13:32:24 -07:00
Antonio Sanchez
14b7ebea11
Fix numext::round pre c++11 for large inputs.
...
This is to resolve an issue for large inputs when +0.5 can
actually lead to +1 if the input doesn't have enough precision
to resolve the addition - leading to an off-by-one error.
See discussion on 9a663973 .
2021-03-15 19:08:04 +00:00
Chip Kerchner
c9d4367fa4
Fix pround and add print
2021-03-15 19:07:43 +00:00
Antonio Sanchez
d24f9f9b55
Fix NVCC+ICC issues.
...
NVCC does not understand `__forceinline`, so we need to use `inline`
when compiling for GPU.
ICC specializes `std::complex` operators for `float` and `double`
by default, which cannot be used on device and conflict with Eigen's
workaround in CUDA/Complex.h. This can be prevented by defining
`_OVERRIDE_COMPLEX_SPECIALIZATION_` before including `<complex>`.
Added this define to the tests and to `Eigen/Core`, but this will
not work if the user includes `<complex>` before `<Eigen/Core>`.
ICC also seems to generate a duplicate `Map` symbol in
`PlainObjectBase`:
```
error: "Map" has already been declared in the current scope
static ConstMapType Map(const Scalar *data)
```
I tracked this down to `friend class Eigen::Map`. Putting the `friend`
statements at the bottom of the class seems to resolve this issue.
Fixes #2180
2021-03-15 18:42:04 +00:00
Antonio Sanchez
14487ed14e
Add increment/decrement operators to Eigen::half.
...
This is for consistency with bfloat16, and to support initialization
with `std::iota`.
2021-03-15 10:52:23 -07:00
Antonio Sanchez
b271110788
Bump up rand histogram threshold.
...
The previous one sometimes fails for MSVC which has a poor random
number generator.
Fixes #2182
2021-03-10 22:17:03 -08:00
Antonio Sanchez
d098c4d64c
Disable EIGEN_OPTIMIZATION_BARRIER for PPC clang.
...
Doesn't seem to correctly select the register type, and most types
lead to compiler crashes.
2021-03-10 16:05:01 -08:00
Antonio Sanchez
543e34ab9d
Re-implement move assignments.
...
The original swap approach leads to potential undefined behavior (reading
uninitialized memory) and results in unnecessary copying of data for static
storage.
Here we pass down the move assignment to the underlying storage. Static
storage does a one-way copy, dynamic storage does a swap.
Modified the tests to no longer read from the moved-from matrix/tensor,
since that can lead to UB. Added a test to ensure we do not access
uninitialized memory in a move.
Fixes : #2119
2021-03-10 16:55:20 +00:00
Ben Niu
b8d1857f0d
[MSVC-specific] Define EIGEN_ARCH_x86_64 for native x64 (_M_X64 is defined and _M_ARM64EC is not), and define EIGEN_ARCH_ARM64 for both the native ARM64 (_M_ARM64 is defined) or ARM64EC (_M_ARM64EC is defined). _M_ARM64EC is defined when the code is compiled by MSVC for ARM64EC, a new ARM64 ABI designed to be compatible with x64 application emulation on ARM64. If _M_ARM64EC is defined, _M_X64 and _M_AMD64 are also defined, so x64-specific code (especially intrinsics) is also compiled to ARM64 instructions (compliant with the ARM64EC ABI) for maximum x64 compatibility. Although a majority of x64-specific intrinsics can emulated by ARM64 instructions, it is still a good to simply recompile the native ARM64 code paths to ARM64EC for pure computation tasks, for performance reasons.
2021-03-10 10:21:31 +00:00
Antonio Sanchez
853a5c4b84
Fix ambiguous call to CUDA __half constructor.
2021-03-08 21:06:28 -08:00
Antonio Sanchez
94327dbfba
Fix typo: DEVICE -> GPU
2021-03-08 11:21:00 -08:00
Antonio Sanchez
1296abdf82
Fix non-trivial Half constructor for CUDA.
...
Both CUDA and HIP require trivial default constructors for types used
in shared memory. Otherwise failing with
```
error: initialization is not supported for __shared__ variables.
```
2021-03-08 07:32:54 -08:00
Antonio Sanchez
6045243141
Revert stack allocation limit change that crept in.
...
This was accidentally introduced when copying changes between repos.
2021-03-05 14:29:37 -08:00
Deven Desai
1a96d49afe
Changing the Eigen::half implementation for HIP
...
Currently, when compiling with HIP, Eigen::half is derived from the `__half_raw` struct that is defined within the hip_fp16.h header file. This is true for both the "host" compile phase and the "device" compile phase. This was causing a very hard to detect bug in the ROCm TensorFlow build.
In the ROCm Tensorflow build,
* files that do not contain ant GPU code get compiled via gcc, and
* files that contnain GPU code get compiled via hipcc.
In certain case, we have a function that is defined in a file that is compiled by hipcc, and is called in a file that is compiled by gcc. If such a function had Eigen::half has a "pass-by-value" argument, its value was getting corrupted, when received by the function.
The reason for this seems to be that for the gcc compile, Eigen::half is derived from a `__half_raw` struct that has `uint16_t` as the data-store, and for hipcc the `__half_raw` implementation uses `_Float16` as the data store. There is some ABI incompatibility between gcc / hipcc (which is essentially latest clang), which results in the Eigen::half value (which is correct at the call-site) getting randomly corrupted when passed to the function.
Changing the Eigen::half argument to be "pass by reference" seems to workaround the error.
In order to fix it such that we do not run into it again in TF, this commit changes the Eigne::half implementation to use the same `__half_raw` implementation as the non-GPU compile, during host compile phase of the hipcc compile.
2021-03-05 19:27:13 +00:00
Antonio Sanchez
2468253c9a
Define EIGEN_CPLUSPLUS and replace most __cplusplus checks.
...
The macro `__cplusplus` is not defined correctly in MSVC unless building
with the the `/Zc:__cplusplus` flag. Instead, it defines `_MSVC_LANG` to the
specified c++ standard version number.
Here we introduce `EIGEN_CPLUSPLUS` which will contain the c++ version
number both for MSVC and otherwise. This simplifies checks for supported
features.
Also replaced most instances of standard version checking via `__cplusplus`
with the existing `EIGEN_COMP_CXXVER` macro for better clarity.
Fixes : #2170
2021-03-05 18:33:18 +00:00
Antonio Sanchez
82d61af3a4
Fix rint SSE/NEON again, using optimization barrier.
...
This is a new version of !423 , which failed for MSVC.
Defined `EIGEN_OPTIMIZATION_BARRIER(X)` that uses inline assembly to
prevent operations involving `X` from crossing that barrier. Should
work on most `GNUC` compatible compilers (MSVC doesn't seem to need
this). This is a modified version adapted from what was used in
`psincos_float` and tested on more platforms
(see #1674 , https://godbolt.org/z/73ezTG ).
Modified `rint` to use the barrier to prevent the add/subtract rounding
trick from being optimized away.
Also fixed an edge case for large inputs that get bumped up a power of two
and ends up rounding away more than just the fractional part. If we are
over `2^digits` then just return the input. This edge case was missed in
the test since the test was comparing approximate equality, which was still
satisfied. Adding a strict equality option catches it.
2021-03-05 08:54:12 -08:00
David Tellenbach
5f0b4a4010
Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()"
...
This reverts commit 6cbb3038ac because it
breaks clang-10 builds on x86 and aarch64 when C++11 is enabled.
2021-03-05 13:16:43 +01:00
Steve Bronder
6cbb3038ac
Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()
2021-03-04 18:58:08 +00:00
David Tellenbach
5bfc67f9e7
Deactive CI for Power due to problems with GitLab runner
2021-03-04 17:33:40 +01:00
Eugene Zhulenev
a6601070f2
Add log2 operation to TensorBase
2021-03-04 00:13:36 +00:00
Antonio Sánchez
9a663973b4
Revert "Fix rint for SSE/NEON."
...
This reverts commit e72dfeb8b9
2021-03-03 18:51:51 +00:00
Antonio Sanchez
e72dfeb8b9
Fix rint for SSE/NEON.
...
It seems *sometimes* with aggressive optimizations the combination
`psub(padd(a, b), b)` trick to force rounding is compiled away. Here
we replace with inline assembly to prevent this (I tried `volatile`,
but that leads to additional loads from memory).
Also fixed an edge case for large inputs `a` where adding `b` bumps
the value up a power of two and ends up rounding away more than
just the fractional part. If we are over `2^digits` then just return
the input. This edge case was missed in the test since the test was
comparing approximate equality, which was still satisfied. Adding
a strict equality option catches it.
2021-03-03 09:41:46 -08:00
Christoph Hertzberg
199c5f2b47
geo_alignedbox_5 was failing with AVX enabled, due to storing Vector4d in a std::vector without using an aligned allocator.
...
Got rid of using `std::vector` and simplified the code.
Avoid leading `_`
2021-03-01 03:59:21 +01:00
Antonio Sanchez
1e0c7d4f49
Add print for SSE/NEON, use NEON rounding intrinsics if available.
...
In SSE, by adding/subtracting 2^MantissaBits, we force rounding according to the
current rounding mode.
For NEON, we use the provided intrinsics for rint/floor/ceil if
available (armv8).
Related to #1969 .
2021-02-27 22:42:07 +00:00
David Tellenbach
976ae0ca6f
Document that using raw function pointers doesn't work with unaryExpr.
2021-02-27 22:58:42 +01:00
Antonio Sanchez
c65c2b31d4
Make half/bfloat16 constructor take inputs by value, fix powerpc test.
...
Since `numeric_limits<half>::max_exponent` is a static inline constant,
it cannot be directly passed by reference. This triggers a linker error
in recent versions of `g++-powerpc64le`.
Changing `half` to take inputs by value fixes this. Wrapping
`max_exponent` with `int(...)` to make an addressable integer also fixes this
and may help with other custom `Scalar` types down-the-road.
Also eliminated some compile warnings for powerpc.
2021-02-27 21:32:06 +00:00
Christoph Hertzberg
39a590dfb6
Remove unused include
2021-02-27 19:02:33 +01:00
Christoph Hertzberg
8f686ac4ec
clang 10 aggressively warns about precision loss when converting int to float (or long to double)
...
(cherry picked from commit cd541ad52c8152340469cae210312c0e27829c8d)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
2660d01fa7
Inherit from no_assignment_operator to avoid implicit copy constructor warnings
...
(cherry picked from commit 9bbb7ea4b54b1f307863be4ed8d105c38cdefe50)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
a3521d743c
Fix some enum-enum conversion warnings
...
(cherry picked from commit 838f3d8ce22a5549ef10c7386fb03040721749a0)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
ca528593f4
Fixed/masked more implicit copy constructor warnings
...
(cherry picked from commit 2883e91ce5a99c391fbf28e20160176b70854992)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
81b5fe2f0a
ReturnByValue is already non-copyable
...
(cherry picked from commit abbf95045009619f37bd92b45433eedbfcbe41cf)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
4fb3459a23
Fix double-promotion warnings
...
(cherry picked from commit c22c103e932e511e96645186831363585a44b7a3)
2021-02-27 18:44:26 +01:00
Jens Wehner
4bfcee47b9
Idrs iterative linear solver
2021-02-27 12:09:33 +00:00
Antonio Sanchez
29ebd84cb7
Fix NEON sqrt for 32-bit, add prsqrt.
...
With !406 , we accidentally broke arm 32-bit NEON builds, since
`vsqrt_f32` is only available for 64-bit.
Here we add back the `rsqrt` implementation for 32-bit, relying
on a `prsqrt` implementation with better handling of edge cases.
Note that several of the 32-bit NEON packet tests are currently
failing - either due to denormal handling (NEON versions flush
to zero, but scalar paths don't) or due to accuracy (e.g. sin/cos).
2021-02-26 14:08:40 -08:00
Rasmus Munk Larsen
fe19714f80
Merge branch 'rmlarsen1/eigen-nan_prop'
2021-02-26 09:21:24 -08:00
Rasmus Munk Larsen
e67672024d
Merge branch 'nan_prop' of https://gitlab.com/rmlarsen1/eigen into nan_prop
2021-02-26 09:12:44 -08:00
Rasmus Munk Larsen
5e7d4c33d6
Add TODO.
2021-02-26 09:08:45 -08:00
Rasmus Munk Larsen
fb5b59641a
Defer default for minCoeff/maxCoeff to templated variant.
2021-02-26 09:07:00 -08:00
Antonio Sanchez
e19829c3b0
Fix floor/ceil for NEON fp16.
...
Forgot to test this. Fixes bug introduced in !416 .
2021-02-25 20:39:56 -08:00
Antonio Sanchez
5529db7524
Fix SSE/NEON pfloor/pceil for saturated values.
...
The original will saturate if the input does not fit into an integer
type. Here we fix this, returning the input if it doesn't have
enough precision to have a fractional part.
Also added `pceil` for NEON.
Fixes #1969 .
2021-02-25 14:39:26 -08:00
Rasmus Munk Larsen
51eba8c3e2
Fix indentation.
2021-02-25 18:21:21 +00:00
Rasmus Munk Larsen
5297b7162a
Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions.
2021-02-25 18:21:21 +00:00
Antonio Sanchez
ecb7b19dfa
Disable new/delete test for HIP
2021-02-25 08:04:05 -08:00
Chip-Kerchner
6eebe97bab
Fix clang compile when no MMA flags are set. Simplify MMA compiler detection.
2021-02-24 20:43:23 -06:00
Rasmus Munk Larsen
f284c8592b
Don't crash when attempting to slice an empty tensor.
2021-02-24 18:12:51 -08:00
Rasmus Munk Larsen
4cb0592af7
Fix indentation.
2021-02-24 17:59:36 -08:00
Rasmus Munk Larsen
6b34568c74
Merge branch 'nan_prop' of https://gitlab.com/rmlarsen1/eigen into nan_prop
2021-02-24 17:54:58 -08:00
Rasmus Munk Larsen
0065f9d322
Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions.
2021-02-25 01:54:36 +00:00
Rasmus Munk Larsen
841c8986f8
Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions.
2021-02-24 17:49:20 -08:00
Rasmus Munk Larsen
113e61f364
Remove unused function scalar_cmp_with_cast.
2021-02-24 23:59:35 +00:00
Rasmus Munk Larsen
98ca58b02c
Cast anonymous enums to int when used in expressions.
2021-02-24 23:50:06 +00:00
Chip-Kerchner
c31ead8a15
Having forward template function declarations in a P10 file causes bad code in certain situations.
2021-02-24 23:43:30 +00:00
Guoqiang QI
f44197fabd
Some improvements for kissfft from Martin Reinecke(pocketfft author):
...
1.Only computing about half of the factors and use complex conjugate symmetry for the rest instead of all to save time.
2.All twiddles are calculated in double because that gives the maximum achievable precision when doing float transforms.
3.Reducing all angles to the range 0<angle<pi/4 which gives even more precision.
2021-02-24 21:36:47 +00:00
Antonio Sanchez
a31effc3bc
Add invoke_result and eliminate result_of warnings for C++17+.
...
The `std::result_of` meta struct is deprecated in C++17 and removed
in C++20. It was still slipping through due to a faulty definition of
`EIGEN_HAS_STD_RESULT_OF`.
Added a new macro `EIGEN_HAS_STD_INVOKE_RESULT` and
`Eigen::internal::invoke_result` implementation with fallback for
pre C++17.
Replaces the `result_of` definition with one based on `std::invoke_result`
for C++17 and higher.
For completeness, added nullary op support for c++03.
Fixes #1850 .
2021-02-24 21:36:14 +00:00
Chip-Kerchner
8523d447a1
Fixes to support old and new versions of the compilers for built-ins. Cast to non-const when using vector_pair with certain built-ins.
2021-02-24 20:49:15 +00:00
Antonio Sanchez
5908aeeaba
Fix CUDA device new and delete, and add test.
...
HIP does not support new/delete on device, so test is skipped.
2021-02-24 11:31:41 -08:00
Antonio Sanchez
119763cf38
Eliminate CMake FindPackageHandleStandardArgs warnings.
...
CMake complains that the package name does not match when the case
differs, e.g.:
```
CMake Warning (dev) at /usr/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:273 (message):
The package name passed to `find_package_handle_standard_args` (UMFPACK)
does not match the name of the calling package (Umfpack). This can lead to
problems in calling code that expects `find_package` result variables
(e.g., `_FOUND`) to follow a certain pattern.
Call Stack (most recent call first):
cmake/FindUmfpack.cmake:50 (find_package_handle_standard_args)
bench/spbench/CMakeLists.txt:24 (find_package)
This warning is for project developers. Use -Wno-dev to suppress it.
```
Here we rename the libraries to match their true cases.
2021-02-24 09:52:05 +00:00
Antonio Sanchez
6cf0ab5e99
Disable fast psqrt for NEON.
...
Accuracy is too poor - requires at least two Newton iterations, but then
it is no longer significantly faster than `vsqrt`.
Fixes #2094 .
2021-02-23 19:52:55 -08:00
Antonio Sanchez
aba3998278
Fix check if GPU compile phase for std::hash
2021-02-23 19:52:08 -08:00
Antonio Sanchez
db5691ff2b
Fix some CUDA warnings.
...
Added `EIGEN_HAS_STD_HASH` macro, checking for C++11 support and not
running on GPU.
`std::hash<float>` is not a device function, so cannot be used by
`std::hash<bfloat16>`. Removed `EIGEN_DEVICE_FUNC` and only
define if `EIGEN_HAS_STD_HASH`. Same for `half`.
Added `EIGEN_CUDA_HAS_FP16_ARITHMETIC` to improve readability,
eliminate warnings about `EIGEN_CUDA_ARCH` not being defined.
Replaced a couple C-style casts with `reinterpret_cast` for aligned
loading of `half*` to `half2*`. This eliminates `-Wcast-align`
warnings in clang. Although not ideal due to potential type aliasing,
this is how CUDA handles these conversions internally.
2021-02-24 00:16:31 +00:00
Rasmus Munk Larsen
88d4c6d4c8
Accurate pow, part 2. This change adds specializations of log2 and exp2 for double that
...
make pow<double> accurate the 1 ULP. Speed for AVX-512 is within 0.5% of the currect
implementation.
2021-02-23 23:11:03 +00:00
Adam Shapiro
2ac0b78739
Fixed sparse conservativeResize() when both num cols and rows decreased.
...
The previous implementation caused a buffer overflow trying to calculate non-
zero counts for columns that no longer exist.
2021-02-23 21:32:39 +00:00
Chip-Kerchner
10c77b0ff4
Fix compilation errors with later versions of GCC and use of MMA.
2021-02-22 15:01:47 -06:00
Christoph Hertzberg
73922b0174
Fixes Bug #1925 . Packets should be passed by const reference, even to inline functions.
2021-02-20 18:56:42 +01:00
Antonio Sanchez
5f9cfb2529
Add missing adolc isinf/isnan.
...
Also modified cmake/FindAdolc.cmake to eliminate warnings, and added
search paths to match install layout.
Fixed : #2157
2021-02-19 22:26:56 +00:00
Christoph Hertzberg
ce4af0b38f
Missing change regarding #1910
2021-02-19 20:51:35 +01:00
Christoph Hertzberg
a7749c09bc
Bug #1910 : Make SparseCholesky work for RowMajor matrices
2021-02-19 19:36:18 +01:00
Antonio Sánchez
128eebf05e
Revert "add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros (only if not HIPCC)."
...
This reverts commit 12fd3dd655
2021-02-19 17:09:16 +00:00
frgossen
33e0af0130
Return nan at poles of polygamma, digamma, and zeta if limit is not defined
2021-02-19 16:35:11 +00:00
Rasmus Munk Larsen
7f09d3487d
Use the Cephes double subtraction trick in pexp<float> even when FMA is available. Otherwise the accuracy drops from 1 ulp to 3 ulp.
2021-02-18 20:49:18 +00:00
Masaki Murooka
12fd3dd655
add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros (only if not HIPCC).
2021-02-17 22:55:47 +00:00
David Tellenbach
aa8b22e776
Bump to 3.4.99
2021-02-17 23:23:17 +01:00
David Tellenbach
5336ad8591
Define internal::make_unsigned for [unsigned]long long on macOS.
...
macOS defines int64_t as long long even for C++03 and therefore expects
a template specialization
internal::make_unsigned<long long>,
for C++03. Since other platforms define int64_t as long for C++03 we
cannot add the specialization for all cases.
2021-02-17 23:03:10 +01:00
Antonio Sanchez
0845df7f77
Fix uninitialized warning on AVX.
2021-02-17 13:13:39 -08:00
Chip Kerchner
9b51dc7972
Fixed performance issues for VSX and P10 MMA in general_matrix_matrix_product
2021-02-17 17:49:23 +00:00
Rasmus Munk Larsen
be0574e215
New accurate algorithm for pow(x,y). This version is accurate to 1.4 ulps for float, while still being 10x faster than std::pow for AVX512. A future change will introduce a specialization for double.
2021-02-17 02:50:32 +00:00
Antonio Sanchez
7ff0b7a980
Updated pfrexp implementation.
...
The original implementation fails for 0, denormals, inf, and NaN.
See #2150
2021-02-17 02:23:24 +00:00
David Tellenbach
9ad4096ccb
Document possible inconsistencies when using Matrix<bool, ...>
2021-02-17 00:50:26 +01:00
Ashutosh Sharma
f702792a7c
missing method in packetmath.h void ptranspose(PacketBlock<Packet16uc, 4>& kernel)
2021-02-16 16:33:59 +00:00
Jan van Dijk
db61b8d478
Avoid -Wunused warnings in NDEBUG builds.
...
In two places in SuperLUSupport.h, a local variable 'size' is
created that is used only inside an eigen_assert. Remove these,
just fetch the required values inside the assert statements.
This avoids annoying -Wunused warnings (and -Werror=unused errors)
in NDEBUG builds.
2021-02-12 18:35:35 +00:00
David Tellenbach
622c598944
Don't allow all test jobs to fail but only the currently failing ones.
2021-02-12 14:01:17 +01:00
Antonio Sanchez
90ee821c56
Use vrsqrts for rsqrt Newton iterations.
...
It's slightly faster and slightly more accurate, allowing our current
packetmath tests to pass for sqrt with a single iteration.
2021-02-11 11:33:51 -08:00
Antonio Sanchez
9fde9cce5d
Adjust bounds for pexp_float/double
...
The original clamping bounds on `_x` actually produce finite values:
```
exp(88.3762626647950) = 2.40614e+38 < 3.40282e+38
exp(709.437) = 1.27226e+308 < 1.79769e+308
```
so with an accurate `ldexp` implementation, `pexp` fails for large
inputs, producing finite values instead of `inf`.
This adjusts the bounds slightly outside the finite range so that
the output will overflow to +/- `inf` as expected.
2021-02-10 22:48:05 +00:00
Antonio Sanchez
4cb563a01e
Fix ldexp implementations.
...
The previous implementations produced garbage values if the exponent did
not fit within the exponent bits. See #2131 for a complete discussion,
and !375 for other possible implementations.
Here we implement the 4-factor version. See `pldexp_impl` in
`GenericPacketMathFunctions.h` for a full description.
The SSE `pcmp*` methods were moved down since `pcmp_le<Packet4i>`
requires `por`.
Left as a "TODO" is to delegate to a faster version if we know the
exponent does fit within the exponent bits.
Fixes #2131 .
2021-02-10 22:45:41 +00:00
Ashutosh Sharma
7eb07da538
loop less ptranspose
2021-02-10 10:21:37 -08:00
David Tellenbach
36200b7855
Remove vim specific comments to recognoize correct file-type.
...
As discussed in #2143 we remove editor specific comments.
2021-02-09 09:13:09 +01:00
David Tellenbach
54589635ad
Replace nullptr by NULL in SparseLU.h to be C++03 compliant.
2021-02-09 09:08:06 +01:00
Ralf Hannemann-Tamas
984d010b7b
add specialization of check_sparse_solving() for SuperLU solver, in order to test adjoint and transpose solves
2021-02-08 22:00:31 +00:00
Nikolaus Demmel
b578930657
Fix documentation typos in LDLT.h
2021-02-08 21:07:29 +00:00
Antonio Sanchez
66841ea070
Enable bdcsvd on host.
...
Currently if compiled by NVCC, the `MatrixBase::bdcSvd()` implementation
is skipped, leading to a linker error. This prevents it from running on
the host as well.
Seems it was disabled 6 years ago (5384e891 ) to match `jacobiSvd`, but
`jacobiSvd` is now enabled on host. Tested and runs fine on host, but
will not compile/run for device (though it's not labelled as a device
function, so this should be fine).
Fixes #2139
2021-02-08 12:56:23 -08:00
Rasmus Munk Larsen
6e3b795f81
Add more tests for pow and fix a corner case for huge exponent where the result is always zero or infinite unless x is one.
2021-02-05 16:58:49 -08:00
Antonio Sanchez
abcde69a79
Disable vectorized pow for half/bfloat16.
...
We are potentially seeing some accuracy issues with these. Ideally we
would hand off to `float`, but that's not trivial with the current
setup.
We may want to consider adding `ppow<Packet>` and `HasPow`, so
implementations can more easily specialize this.
2021-02-05 12:17:34 -08:00
Antonio Sanchez
f85038b7f3
Fix excessive GEBP register spilling for 32-bit NEON.
...
Clang does a poor job of optimizing the GEBP microkernel on 32-bit ARM,
leading to excessive 16-byte register spills, slowing down basic f32
matrix multiplication by approx 50%.
By specializing `gebp_traits`, we can eliminate the register spills.
Volatile inline ASM both acts as a barrier to prevent reordering and
enforces strict register use. In a simple f32 matrix multiply example,
this modification reduces 16-byte spills from 109 instances to zero,
leading to a 1.5x speed increase (search for `16-byte Spill` in the
assembly in https://godbolt.org/z/chsPbE ).
This is a replacement of !379 . See there for further discussion.
Also moved `gebp_traits` specializations for NEON to
`Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h` to be alongside
other NEON-specific code.
Fixes #2138 .
2021-02-03 09:01:48 -08:00
Antonio Sanchez
56c8b14d87
Eliminate implicit conversions from float to double.
2021-02-01 15:31:01 -08:00
Antonio Sanchez
fb4548e27b
Implement bit_* for device.
...
Unfortunately `std::bit_and` and the like are host-only functions prior
to c++14 (since they are not `constexpr`). They also never exist in the
global namespace, so the current implementation always fails to compile via
NVCC - since `EIGEN_USING_STD` tries to import the symbol from the global
namespace on device.
To overcome these limitations, we implement these functionals here.
2021-02-01 13:27:45 -08:00
Antonio Sanchez
1615a27993
Fix altivec packetmath.
...
Allows the altivec packetmath tests to pass. There were a few issues:
- `pstoreu` was missing MSQ on `_BIG_ENDIAN` systems
- `cmp_*` didn't properly handle conversion of bool flags (0x7FC instead
of 0xFFFF)
- `pfrexp` needed to set the `exponent` argument.
Related to !370 , #2128
cc: @ChipKerchner @pdrocaldeira
Tested on `_BIG_ENDIAN` running on QEMU with VSX. Couldn't figure out build
flags to get it to work for little endian.
2021-01-28 18:37:09 +00:00
Chip Kerchner
1414e2212c
Fix clang compilation for AltiVec from previous check-in
2021-01-28 18:36:40 +00:00
David Tellenbach
170a504c2f
Add the following functions
...
DenseBase::setConstant(NoChange_t, Index, const Scalar&)
DenseBase::setConstant(Index, NoChange_t, const Scalar&)
to close #663 .
2021-01-28 15:13:07 +01:00
David Tellenbach
598e1b6e54
Add the following functions:
...
DenseBase::setZero(NoChange_t, Index)
DenseBase::setZero(Index, NoChange_t)
DenseBase::setOnes(NoChange_t, Index)
DenseBase::setOnes(Index, NoChange_t)
DenseBase::setRandom(NoChange_t, Index)
DenseBase::setRandom(Index, NoChange_t)
This closes #663 .
2021-01-28 01:10:36 +01:00
Gael Guennebaud
0668c68b03
Allow for negative strides.
...
Note that using a stride of -1 is still not possible because it would
clash with the definition of Eigen::Dynamic.
This fixes #747 .
2021-01-27 23:32:12 +01:00
Samir Benmendil
288d456c29
Replace language_support module with builtin CheckLanguage
...
The workaround_9220 function was introduced a long time ago to
workaround a CMake issue with enable_language(OPTIONAL). Since then
CMake has clarified that the OPTIONAL keywords has not been
implemented[0].
A CheckLanguage module is now provided with CMake to check if a language
can be enabled. Use that instead.
[0] https://cmake.org/cmake/help/v3.18/command/enable_language.html
2021-01-27 13:26:40 +00:00
Antonio Sanchez
3f4684f87d
Include <cstdint> in one place, remove custom typedefs
...
Originating from
[this SO issue](https://stackoverflow.com/questions/65901014/how-to-solve-this-all-error-2-in-this-case ),
some win32 compilers define `__int32` as a `long`, but MinGW defines
`std::int32_t` as an `int`, leading to a type conflict.
To avoid this, we remove the custom `typedef` definitions for win32. The
Tensor module requires C++11 anyways, so we are guaranteed to have
included `<cstdint>` already in `Eigen/Core`.
Also re-arranged the headers to only include `<cstdint>` in one place to
avoid this type of error again.
2021-01-26 14:23:05 -08:00
Chip Kerchner
0784d9f87b
Fix sqrt, ldexp and frexp compilation errors.
2021-01-25 15:22:19 -06:00
Gmc2
a4edb1079c
fix test of ExtractVolumePatchesOp
2021-01-25 03:23:46 +00:00
Antonio Sanchez
4c42d5ee41
Eliminate implicit conversion warning in test/array_cwise.cpp
2021-01-23 11:54:00 -08:00
Antonio Sanchez
e0d13ead90
Replace std::isnan with numext::isnan for c++03
2021-01-23 11:02:35 -08:00
Florian Maurin
c35965b381
Remove unused variable in SparseLU.h
2021-01-22 22:24:11 +00:00
Antonio Sanchez
f0e46ed5d4
Fix pow and other cwise ops for half/bfloat16.
...
The new `generic_pow` implementation was failing for half/bfloat16 since
their construction from int/float is not `constexpr`. Modified
in `GenericPacketMathFunctions` to remove `constexpr`.
While adding tests for half/bfloat16, found other issues related to
implicit conversions.
Also needed to implement `numext::arg` for non-integer, non-complex,
non-float/double/long double types. These seem to be implicitly
converted to `std::complex<T>`, which then fails for half/bfloat16.
2021-01-22 11:10:54 -08:00
Antonio Sanchez
f19bcffee6
Specialize std::complex operators for use on GPU device.
...
NVCC and older versions of clang do not fully support `std::complex` on device,
leading to either compile errors (Cannot call `__host__` function) or worse,
runtime errors (Illegal instruction). For most functions, we can
implement specialized `numext` versions. Here we specialize the standard
operators (with the exception of stream operators and member function operators
with a scalar that are already specialized in `<complex>`) so they can be used
in device code as well.
To import these operators into the current scope, use
`EIGEN_USING_STD_COMPLEX_OPERATORS`. By default, these are imported into
the `Eigen`, `Eigen:internal`, and `Eigen::numext` namespaces.
This allow us to remove specializations of the
sum/difference/product/quotient ops, and allow us to treat complex
numbers like most other scalars (e.g. in tests).
2021-01-22 18:19:19 +00:00
David Tellenbach
65e2169c45
Add support for Arm SVE
...
This patch adds support for Arm's new vector extension SVE (Scalable Vector Extension). In contrast to other vector extensions that are supported by Eigen, SVE types are inherently *sizeless*. For the use in Eigen we fix their size at compile-time (note that this is not necessary in general, SVE is *length agnostic*).
During compilation the flag `-msve-vector-bits=N` has to be set where `N` is a power of two in the range of `128`to `2048`, indicating the length of an SVE vector.
Since SVE is rather young, we decided to disable it by default even if it would be available. A user has to enable it explicitly by defining `EIGEN_ARM64_USE_SVE`.
This patch introduces the packet types `PacketXf` and `PacketXi` for packets of `float` and `int32_t` respectively. The size of these packets depends on the SVE vector length. E.g. if `-msve-vector-bits=512` is set, `PacketXf` will contain `512/32 = 16` elements.
This MR is joint work with Miguel Tairum <miguel.tairum@arm.com >.
2021-01-21 21:11:57 +00:00
Antonio Sanchez
b2126fd6b5
Fix pfrexp/pldexp for half.
...
The recent addition of vectorized pow (!330 ) relies on `pfrexp` and
`pldexp`. This was missing for `Eigen::half` and `Eigen::bfloat16`.
Adding tests for these packet ops also exposed an issue with handling
negative values in `pfrexp`, returning an incorrect exponent.
Added the missing implementations, corrected the exponent in `pfrexp1`,
and added `packetmath` tests.
2021-01-21 19:32:28 +00:00
Antonio Sanchez
25d8498f8b
Fix stable_norm_1 test.
...
Test enters an infinite loop if size is 1x1 when choosing to select
unique indices for adding `inf` and `NaN` to the input. Here we
revert to non-unique indices, and split the `hypotNorm` check into
two cases: one where both `inf` and `NaN` are added, and one where
only `NaN` is added.
2021-01-21 09:44:42 -08:00
David Tellenbach
660c6b857c
Remove std::cerr in iterative solver since we don't have iostream.
...
This fixes #2123
2021-01-21 11:40:05 +01:00
Antonio Sanchez
d5b7981119
Fix signed-unsigned comparison.
...
Hex literals are interpreted as unsigned, leading to a comparison between
signed max supported function `abcd[0]` (which was negative) to the unsigned
literal `0x80000006`. Should not change result since signed is
implicitly converted to unsigned for the comparison, but eliminates the
warning.
2021-01-20 08:34:00 -08:00
Ivan Popivanov
e409795d6b
Proper CPUID
2021-01-18 17:10:11 +00:00
Rasmus Munk Larsen
cdd8fdc32e
Vectorize pow(x, y). This closes https://gitlab.com/libeigen/eigen/-/issues/2085 , which also contains a description of the algorithm.
...
I ran some testing (comparing to `std::pow(double(x), double(y)))` for `x` in the set of all (positive) floats in the interval `[std::sqrt(std::numeric_limits<float>::min()), std::sqrt(std::numeric_limits<float>::max())]`, and `y` in `{2, sqrt(2), -sqrt(2)}` I get the following error statistics:
```
max_rel_error = 8.34405e-07
rms_rel_error = 2.76654e-07
```
If I widen the range to all normal float I see lower accuracy for arguments where the result is subnormal, e.g. for `y = sqrt(2)`:
```
max_rel_error = 0.666667
rms = 6.8727e-05
count = 1335165689
argmax = 2.56049e-32, 2.10195e-45 != 1.4013e-45
```
which seems reasonable, since these results are subnormals with only couple of significant bits left.
2021-01-18 13:25:16 +00:00
Antonio Sanchez
bde6741641
Improved std::complex sqrt and rsqrt.
...
Replaces `std::sqrt` with `complex_sqrt` for all platforms (previously
`complex_sqrt` was only used for CUDA and MSVC), and implements
custom `complex_rsqrt`.
Also introduces `numext::rsqrt` to simplify implementation, and modified
`numext::hypot` to adhere to IEEE IEC 6059 for special cases.
The `complex_sqrt` and `complex_rsqrt` implementations were found to be
significantly faster than `std::sqrt<std::complex<T>>` and
`1/numext::sqrt<std::complex<T>>`.
Benchmark file attached.
```
GCC 10, Intel Xeon, x86_64:
---------------------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------------------
BM_Sqrt<std::complex<float>> 9.21 ns 9.21 ns 73225448
BM_StdSqrt<std::complex<float>> 17.1 ns 17.1 ns 40966545
BM_Sqrt<std::complex<double>> 8.53 ns 8.53 ns 81111062
BM_StdSqrt<std::complex<double>> 21.5 ns 21.5 ns 32757248
BM_Rsqrt<std::complex<float>> 10.3 ns 10.3 ns 68047474
BM_DivSqrt<std::complex<float>> 16.3 ns 16.3 ns 42770127
BM_Rsqrt<std::complex<double>> 11.3 ns 11.3 ns 61322028
BM_DivSqrt<std::complex<double>> 16.5 ns 16.5 ns 42200711
Clang 11, Intel Xeon, x86_64:
---------------------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------------------
BM_Sqrt<std::complex<float>> 7.46 ns 7.45 ns 90742042
BM_StdSqrt<std::complex<float>> 16.6 ns 16.6 ns 42369878
BM_Sqrt<std::complex<double>> 8.49 ns 8.49 ns 81629030
BM_StdSqrt<std::complex<double>> 21.8 ns 21.7 ns 31809588
BM_Rsqrt<std::complex<float>> 8.39 ns 8.39 ns 82933666
BM_DivSqrt<std::complex<float>> 14.4 ns 14.4 ns 48638676
BM_Rsqrt<std::complex<double>> 9.83 ns 9.82 ns 70068956
BM_DivSqrt<std::complex<double>> 15.7 ns 15.7 ns 44487798
Clang 9, Pixel 2, aarch64:
---------------------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------------------
BM_Sqrt<std::complex<float>> 24.2 ns 24.1 ns 28616031
BM_StdSqrt<std::complex<float>> 104 ns 103 ns 6826926
BM_Sqrt<std::complex<double>> 31.8 ns 31.8 ns 22157591
BM_StdSqrt<std::complex<double>> 128 ns 128 ns 5437375
BM_Rsqrt<std::complex<float>> 31.9 ns 31.8 ns 22384383
BM_DivSqrt<std::complex<float>> 99.2 ns 98.9 ns 7250438
BM_Rsqrt<std::complex<double>> 46.0 ns 45.8 ns 15338689
BM_DivSqrt<std::complex<double>> 119 ns 119 ns 5898944
```
2021-01-17 08:50:57 -08:00
Maozhou, Ge
21a8a2487c
fix paddings of TensorVolumePatchOp
2021-01-15 11:51:49 +08:00
Guoqiang QI
38ae5353ab
1)provide a better generic paddsub op implementation
...
2)make paddsub op support the Packet2cf/Packet4f/Packet2f in NEON
3)make paddsub op support the Packet2cf/Packet4f in SSE
2021-01-13 22:54:03 +00:00
Antonio Sanchez
352f1422d3
Remove inf local variable.
...
Apparently `inf` is a macro on iOS for `std::numeric_limits<T>::infinity()`,
causing a compile error here. We don't need the local anyways since it's
only used in one spot.
2021-01-12 10:33:15 -08:00
Antonio Sanchez
2044084979
Remove TODO from Transform::computeScaleRotation()
...
Upon investigation, `JacobiSVD` is significantly faster than `BDCSVD`
for small matrices (twice as fast for 2x2, 20% faster for 3x3,
1% faster for 10x10). Since the majority of cases will be small,
let's stick with `JacobiSVD`. See !361 .
2021-01-11 11:30:01 -08:00
Antonio Sanchez
3daf92c7a5
Transform::computeScalingRotation flush determinant to +/- 1.
...
In the previous code, in attempting to correct for a negative
determinant, we end up multiplying and dividing by a number that
is often very near, but not exactly +/-1. By flushing to +/-1,
we can replace a division with a multiplication, and results
are more numerically consistent.
2021-01-11 10:13:38 -08:00
Antonio Sanchez
587fd6ab70
Only specialize complex sqrt_impl for CUDA if not MSVC.
...
We already specialize `sqrt_impl` on windows due to MSVC's mishandling
of `inf` (!355 ).
2021-01-11 09:15:45 -08:00
Deven Desai
2a6addb4f9
Fix for breakage in ROCm support - 210108
...
The following commit breaks ROCm support for Eigen
f149e0ebc3
All unit tests fail with the following error
```
Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o
In file included from /home/rocm-user/eigen/test/gpu_basic.cu:19:
In file included from /home/rocm-user/eigen/test/main.h:356:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:166:
/home/rocm-user/eigen/Eigen/src/Core/MathFunctionsImpl.h:105:35: error: __host__ __device__ function 'complex_sqrt' cannot overload __host__ function 'complex_sqrt'
EIGEN_DEVICE_FUNC std::complex<T> complex_sqrt(const std::complex<T>& z) {
^
/home/rocm-user/eigen/Eigen/src/Core/MathFunctions.h:342:38: note: previous declaration is here
template<typename T> std::complex<T> complex_sqrt(const std::complex<T>& a_x);
^
1 error generated when compiling for gfx900.
CMake Error at gpu_basic_generated_gpu_basic.cu.o.cmake:192 (message):
Error generating file
/home/rocm-user/eigen/build/test/CMakeFiles/gpu_basic.dir//./gpu_basic_generated_gpu_basic.cu.o
test/CMakeFiles/gpu_basic.dir/build.make:63: recipe for target 'test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o' failed
make[3]: *** [test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o] Error 1
CMakeFiles/Makefile2:16618: recipe for target 'test/CMakeFiles/gpu_basic.dir/all' failed
make[2]: *** [test/CMakeFiles/gpu_basic.dir/all] Error 2
CMakeFiles/Makefile2:16625: recipe for target 'test/CMakeFiles/gpu_basic.dir/rule' failed
make[1]: *** [test/CMakeFiles/gpu_basic.dir/rule] Error 2
Makefile:5401: recipe for target 'gpu_basic' failed
make: *** [gpu_basic] Error 2
```
The error message is accurate, and the fix (provided in thsi commit) is trivial.
2021-01-08 18:04:40 +00:00
Antonio Sanchez
f149e0ebc3
Fix MSVC complex sqrt and packetmath test.
...
MSVC incorrectly handles `inf` cases for `std::sqrt<std::complex<T>>`.
Here we replace it with a custom version (currently used on GPU).
Also fixed the `packetmath` test, which previously skipped several
corner cases since `CHECK_CWISE1` only tests the first `PacketSize`
elements.
2021-01-08 01:17:19 +00:00
Antonio Sanchez
8d9cfba799
Fix rand test for MSVC.
...
MSVC's uniform random number generator is not quite as uniform as
others, requiring a slightly wider threshold on the histogram test.
After inspecting histograms for several runs, there's no obvious
bias -- just some bins end up having slightly more less elements
(often > 2% but less than 2.5%).
2021-01-07 12:48:40 -08:00
Essex Edwards
e741b43668
Make Transform::computeRotationScaling(0,&S) continuous
2021-01-07 17:45:14 +00:00
David Tellenbach
0bdc0dba20
Add missing #endif directive in Macros.h
2021-01-07 12:32:41 +01:00
shrek1402
cb654b1c45
#define was defined incorrectly because the result_of function was deprecated in c++17 and removed in c++20. Also, EIGEN_COMP_MSVC (which is _MSC_VER) only affects result_of indirectly, which can cause errors.
2021-01-07 10:12:25 +00:00
Antonio Sanchez
52d1dd979a
Fix Ref initialization.
...
Since `eigen_assert` is a macro, the statements can become noops (e.g.
when compiling for GPU), so they may not execute the contained logic -- which
in this case is the entire `Ref` construction. We need to separate the assert
from statements which have consequences.
Fixes #2113
2021-01-06 13:14:20 -08:00
Antonio Sanchez
166fcdecdb
Allow CwiseUnaryView to be used on device.
...
Added `EIGEN_DEVICE_FUNC` to methods.
2021-01-06 09:16:52 -08:00
Antonio Sanchez
bb1de9dbde
Fix Ref Stride checks.
...
The existing `Ref` class failed to consider cases where the Ref's
`Stride` setting *could* match the underlying referred object's stride,
but **didn't** at runtime. This led to trying to set invalid stride values,
causing runtime failures in some cases, and garbage due to mismatched
strides in others.
Here we add the missing runtime checks. This involves computing the
strides necessary to align with the referred object's storage, and
verifying we can actually set those strides at runtime.
In the `const` case, if it *may* be possible to refer to the original
storage at compile-time but fails at runtime, then we defer to the
`construct(...)` method that makes a copy.
Added more tests to check these cases.
Fixes #2093 .
2021-01-05 10:41:25 -08:00
Christoph Hertzberg
12dda34b15
Eliminate boolean product warnings by factoring out a
...
`combine_scalar_factors` helper function.
2021-01-05 18:15:30 +00:00
Antonio Sanchez
070d303d56
Add CUDA complex sqrt.
...
This is to support scalar `sqrt` of complex numbers `std::complex<T>` on
device, requested by Tensorflow folks.
Technically `std::complex` is not supported by NVCC on device
(though it is by clang), so the default `sqrt(std::complex<T>)` function only
works on the host. Here we create an overload to add back the
functionality.
Also modified the CMake file to add `--relaxed-constexpr` (or
equivalent) flag for NVCC to allow calling constexpr functions from
device functions, and added support for specifying compute architecture for
NVCC (was already available for clang).
2020-12-22 23:25:23 -08:00
rgreenblatt
fdf2ee62c5
Fix missing EIGEN_DEVICE_FUNC
2020-12-20 23:22:53 -05:00
Rasmus Munk Larsen
05754100fe
* Add iterative psqrt<double> for AVX and SSE when FMA is available. This provides a ~10% speedup.
...
* Write iterative sqrt explicitly in terms of pmadd. This gives up to 7% speedup for psqrt<float> with AVX & SSE with FMA.
* Remove iterative psqrt<double> for NEON, because the initial rsqrt apprimation is not accurate enough for convergence in 2 Newton-Raphson steps and with 3 steps, just calling the builtin sqrt insn is faster.
The following benchmarks were compiled with clang "-O2 -fast-math -mfma" and with and without -mavx.
AVX+FMA (float)
name old cpu/op new cpu/op delta
BM_eigen_sqrt_float/1 1.08ns ± 0% 1.09ns ± 1% ~
BM_eigen_sqrt_float/8 2.07ns ± 0% 2.08ns ± 1% ~
BM_eigen_sqrt_float/64 12.4ns ± 0% 12.4ns ± 1% ~
BM_eigen_sqrt_float/512 95.7ns ± 0% 95.5ns ± 0% ~
BM_eigen_sqrt_float/4k 776ns ± 0% 763ns ± 0% -1.67%
BM_eigen_sqrt_float/32k 6.57µs ± 1% 6.13µs ± 0% -6.69%
BM_eigen_sqrt_float/256k 83.7µs ± 3% 83.3µs ± 2% ~
BM_eigen_sqrt_float/1M 335µs ± 2% 332µs ± 2% ~
SSE+FMA (float)
name old cpu/op new cpu/op delta
BM_eigen_sqrt_float/1 1.08ns ± 0% 1.09ns ± 0% ~
BM_eigen_sqrt_float/8 2.07ns ± 0% 2.06ns ± 0% ~
BM_eigen_sqrt_float/64 12.4ns ± 0% 12.4ns ± 1% ~
BM_eigen_sqrt_float/512 95.7ns ± 0% 96.3ns ± 4% ~
BM_eigen_sqrt_float/4k 774ns ± 0% 763ns ± 0% -1.50%
BM_eigen_sqrt_float/32k 6.58µs ± 2% 6.11µs ± 0% -7.06%
BM_eigen_sqrt_float/256k 82.7µs ± 1% 82.6µs ± 1% ~
BM_eigen_sqrt_float/1M 330µs ± 1% 329µs ± 2% ~
SSE+FMA (double)
BM_eigen_sqrt_double/1 1.63ns ± 0% 1.63ns ± 0% ~
BM_eigen_sqrt_double/8 6.51ns ± 0% 6.08ns ± 0% -6.68%
BM_eigen_sqrt_double/64 52.1ns ± 0% 46.5ns ± 1% -10.65%
BM_eigen_sqrt_double/512 417ns ± 0% 374ns ± 1% -10.29%
BM_eigen_sqrt_double/4k 3.33µs ± 0% 2.97µs ± 1% -11.00%
BM_eigen_sqrt_double/32k 26.7µs ± 0% 23.7µs ± 0% -11.07%
BM_eigen_sqrt_double/256k 213µs ± 0% 206µs ± 1% -3.31%
BM_eigen_sqrt_double/1M 862µs ± 0% 870µs ± 2% +0.96%
AVX+FMA (double)
name old cpu/op new cpu/op delta
BM_eigen_sqrt_double/1 1.63ns ± 0% 1.63ns ± 0% ~
BM_eigen_sqrt_double/8 6.51ns ± 0% 6.06ns ± 0% -6.95%
BM_eigen_sqrt_double/64 52.1ns ± 0% 46.5ns ± 1% -10.80%
BM_eigen_sqrt_double/512 417ns ± 0% 373ns ± 1% -10.59%
BM_eigen_sqrt_double/4k 3.33µs ± 0% 2.97µs ± 1% -10.79%
BM_eigen_sqrt_double/32k 26.7µs ± 0% 23.8µs ± 0% -10.94%
BM_eigen_sqrt_double/256k 214µs ± 0% 208µs ± 2% -2.76%
BM_eigen_sqrt_double/1M 866µs ± 3% 923µs ± 7% ~
2020-12-16 18:16:11 +00:00
Turing Eret
3bee9422d6
Merge branch 'lambdaknight/eigen-master'
2020-12-16 09:18:24 -07:00
Turing Eret
19e6496ce0
Replace call to FixedDimensions() with a singleton instance of
...
FixedDimensions.
2020-12-16 07:34:44 -07:00
Rasmus Munk Larsen
6cee8d347e
Add an additional step of Newton-Raphson for psqrt<double> on Arm, which otherwise has an error of ~1000 ulps.
2020-12-15 04:06:41 +00:00
Turing Eret
bc7d1599fb
TensorStorage with FixedDimensions now has zero instance memory overhead.
...
Removed m_dimension as instance member of TensorStorage with
FixedDimensions and instead use the template parameter. This
means that the sizeof a pure fixed-size storage is exactly
equal to the data it is storing.
2020-12-14 07:19:34 -07:00
Alexander Grund
cf0b5b0344
Remove code checking for CMake < 3.5
...
As the CMake version is at least 3.5 the code checking for earlier versions can be removed.
2020-12-14 09:57:44 +00:00
David Tellenbach
751f18f2c0
Remove comma at the end of enumeration list to silence C++03 warnings
2020-12-13 18:11:02 +01:00
Antonio Sanchez
5dc2fbabee
Fix implicit cast to double.
...
Triggers `-Wimplicit-float-conversion`, causing a bunch of build errors
in Google due to `-Wall`.
2020-12-12 09:26:20 -08:00
Antonio Sanchez
55967f87d1
Fix NEON pmax<PropagateNumbers,Packet4bf>.
...
Simple typo, the max impl called pmin instead of pmax for floats.
2020-12-11 21:50:52 -08:00
Antonio Sanchez
839aa505c3
Fix typo in AVX512 packet math.
2020-12-11 21:35:44 -08:00
David Tellenbach
536c8a79f2
Remove unused macro in Half.h
2020-12-12 00:53:26 +01:00
Antonio Sanchez
8c9976d7f0
Fix more SSE/AVX packet conversions for peven.
...
MSVC doesn't like function-style casts and forces us to use intrinsics.
2020-12-11 15:46:42 -08:00
Antonio Sanchez
c6efc4e0ba
Replace M_LOG2E and M_LN2 with custom macros.
...
For these to exist we would need to define `_USE_MATH_DEFINES` before
`cmath` or `math.h` is first included. However, we don't
control the include order for projects outside Eigen, so even defining
the macro in `Eigen/Core` does not fix the issue for projects that
end up including `<cmath>` before Eigen does (explicitly or transitively).
To fix this, we define `EIGEN_LOG2E` and `EIGEN_LN2` ourselves.
2020-12-11 14:34:31 -08:00
Antonio Sanchez
e82722a4a7
Fix MSVC SSE casts.
...
MSVC doesn't like __m128(__m128i) c-style casts, so packets need to be
converted using intrinsic methods.
2020-12-11 08:52:59 -08:00
Deven Desai
f3d2ea48f5
Fix for broken ROCm/HIP Support
...
The following commit introduced a breakage in ROCm/HIP support for Eigen.
5ec4907434 (1958e65719641efe5483abc4ce0b61806270f6f3_525_517)
```
Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o
In file included from /home/rocm-user/eigen/test/gpu_basic.cu:20:
In file included from /home/rocm-user/eigen/test/main.h:356:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:222:
/home/rocm-user/eigen/Eigen/src/Core/arch/GPU/PacketMath.h:556:10: error: use of undeclared identifier 'half2half2'; did you mean '__half2half2'?
return half2half2(from);
^~~~~~~~~~
__half2half2
/opt/rocm/hip/include/hip/hcc_detail/hip_fp16.h:547:21: note: '__half2half2' declared here
__half2 __half2half2(__half x)
^
1 error generated when compiling for gfx900.
```
The cause seems to be a copy-paster error, and the fix is trivial
2020-12-11 16:14:57 +00:00
David Tellenbach
c7eb3a74cb
Don't guard psqrt for std::complex<float> with EIGEN_ARCH_ARM64
2020-12-11 12:41:52 +01:00
Everton Constantino
bccf055a7c
Add Armv8 guard on PropagateNumbers implementation.
2020-12-10 22:01:55 -03:00
Antonio Sanchez
82c0c18a83
Remove private access of std::deque::_M_impl.
...
This no longer works on gcc or clang, so we should just remove the hack.
The default should compile to similar code anyways.
2020-12-10 14:59:34 -08:00
David Tellenbach
00be0a7ff3
Fix vectorization of complex sqrt on NEON
2020-12-10 15:23:23 +00:00
David Tellenbach
8eb461a431
Remove comma at end of enumerator list in NEON PacketMath
2020-12-10 15:22:55 +01:00
David Tellenbach
2e8f850c78
Fix a typo in SparseMatrix documentation.
...
This fixes issue #2091 .
2020-12-09 14:48:24 +01:00
Rasmus Munk Larsen
125cc9a5df
Implement vectorized complex square root.
...
Closes #1905
Measured speedup for sqrt of `complex<float>` on Skylake:
SSE:
```
name old time/op new time/op delta
BM_eigen_sqrt_ctype/1 49.4ns ± 0% 54.3ns ± 0% +10.01%
BM_eigen_sqrt_ctype/8 332ns ± 0% 50ns ± 1% -84.97%
BM_eigen_sqrt_ctype/64 2.81µs ± 1% 0.38µs ± 0% -86.49%
BM_eigen_sqrt_ctype/512 23.8µs ± 0% 3.0µs ± 0% -87.32%
BM_eigen_sqrt_ctype/4k 202µs ± 0% 24µs ± 2% -88.03%
BM_eigen_sqrt_ctype/32k 1.63ms ± 0% 0.19ms ± 0% -88.18%
BM_eigen_sqrt_ctype/256k 13.0ms ± 0% 1.5ms ± 1% -88.20%
BM_eigen_sqrt_ctype/1M 52.1ms ± 0% 6.2ms ± 0% -88.18%
```
AVX2:
```
name old cpu/op new cpu/op delta
BM_eigen_sqrt_ctype/1 53.6ns ± 0% 55.6ns ± 0% +3.71%
BM_eigen_sqrt_ctype/8 334ns ± 0% 27ns ± 0% -91.86%
BM_eigen_sqrt_ctype/64 2.79µs ± 0% 0.22µs ± 2% -92.28%
BM_eigen_sqrt_ctype/512 23.8µs ± 1% 1.7µs ± 1% -92.81%
BM_eigen_sqrt_ctype/4k 201µs ± 0% 14µs ± 1% -93.24%
BM_eigen_sqrt_ctype/32k 1.62ms ± 0% 0.11ms ± 1% -93.29%
BM_eigen_sqrt_ctype/256k 13.0ms ± 0% 0.9ms ± 1% -93.31%
BM_eigen_sqrt_ctype/1M 52.0ms ± 0% 3.5ms ± 1% -93.31%
```
AVX512:
```
name old cpu/op new cpu/op delta
BM_eigen_sqrt_ctype/1 53.7ns ± 0% 56.2ns ± 1% +4.75%
BM_eigen_sqrt_ctype/8 334ns ± 0% 18ns ± 2% -94.63%
BM_eigen_sqrt_ctype/64 2.79µs ± 0% 0.12µs ± 1% -95.54%
BM_eigen_sqrt_ctype/512 23.9µs ± 1% 1.0µs ± 1% -95.89%
BM_eigen_sqrt_ctype/4k 202µs ± 0% 8µs ± 1% -96.13%
BM_eigen_sqrt_ctype/32k 1.63ms ± 0% 0.06ms ± 1% -96.15%
BM_eigen_sqrt_ctype/256k 13.0ms ± 0% 0.5ms ± 4% -96.11%
BM_eigen_sqrt_ctype/1M 52.1ms ± 0% 2.0ms ± 1% -96.13%
```
2020-12-08 18:13:35 -08:00
Antonio Sanchez
8cfe0db108
Fix host/device calls for __half.
...
The previous code had `__host__ __device__` functions calling `__device__`
functions (e.g. `__low2half`) which caused build failures in tensorflow.
Also tried to simplify the `#ifdef` guards to make them more clear.
2020-12-08 20:31:02 +00:00
Everton Constantino
baf9d762b7
- Enabling PropagateNaN and PropagateNumbers for NEON.
...
- Adding propagate tests to bfloat16.
2020-12-08 17:05:05 +00:00
Antonio Sanchez
634bd79b0e
Fix unused warning on new dense_assignment_loop impl.
2020-12-07 19:14:21 -08:00
Antonio Sanchez
655c3a4042
Add specialization for compile-time zero-sized dense assignment.
...
In the current `dense_assignment_loop` implementations, if the
destination's inner or outer size is zero at compile time and if the kernel
involves a product, we currently get a compile error (#2080 ). This is
triggered by attempting to multiply a non-existent row by a column (or
vice-versa).
To address this, we add a specialization for zero-sized assignments
(`AllAtOnceTraversal`) which evaluates to a no-op. We also add a static
check to ensure the size is in-fact zero. This now seems to be the only
existing use of `AllAtOnceTraversal`.
Fixes #2080 .
2020-12-07 08:38:43 -08:00
Antonio Sanchez
5ec4907434
Clean up #ifs in GPU PacketPath.
...
Removed redundant checks and redundant code for CUDA/HIP.
Note: there are several issues here of calling `__device__` functions
from `__host__ __device__` functions, in particular `__low2half`.
We do not address that here -- only modifying this file enough
to get our current tests to compile.
Fixed : #1847
2020-12-04 16:14:03 -08:00
Rasmus Munk Larsen
f9fac1d5b0
Add log2() to Eigen.
2020-12-04 21:45:09 +00:00
Antonio Sanchez
2dbac2f99f
Fix bad NEON fp16 check
2020-12-04 13:42:18 -08:00
Antonio Sanchez
e2f21465fe
Special function implementations for half/bfloat16 packets.
...
Current implementations fail to consider half-float packets, only
half-float scalars. Added specializations for packets on AVX, AVX512 and
NEON. Added tests to `special_packetmath`.
The current `special_functions` tests would fail for half and bfloat16 due to
lack of precision. The NEON tests also fail with precision issues and
due to different handling of `sqrt(inf)`, so special functions bessel, ndtri
have been disabled.
Tested with AVX, AVX512.
2020-12-04 10:16:29 -08:00
David Tellenbach
305b8bd277
Remove duplicate #if clause
2020-12-04 18:55:46 +01:00
Antonio Sanchez
9ee9ac81de
Fix shfl* macros for CUDA/HIP
...
The `shfl*` functions are `__device__` only, and adjusted `#ifdef`s so
they are defined whenever the corresponding CUDA/HIP ones are.
Also changed the HIP/CUDA<9.0 versions to cast to int instead of
doing the conversion `half`<->`float`.
Fixes #2083
2020-12-04 17:18:32 +00:00
shrek1402
a9a2f2bebf
The function 'prefetch' did not work correctly on the win64 platform
2020-12-04 17:18:08 +00:00
Rasmus Munk Larsen
f23dc5b971
Revert "Add log2() operator to Eigen"
...
This reverts commit 4d91519a9b .
2020-12-03 14:32:45 -08:00
Rasmus Munk Larsen
4d91519a9b
Add log2() operator to Eigen
2020-12-03 22:31:44 +00:00
Rasmus Munk Larsen
25d8ae7465
Small cleanup of generic plog implementations:
...
Adding the term e*ln(2) is split into two step for no obvious reason.
This dates back to the original Cephes code from which the algorithm is adapted.
It appears that this was done in Cephes to prevent the compiler from reordering
the addition of the 3 terms in the approximation
log(1+x) ~= x - 0.5*x^2 + x^3*P(x)/Q(x)
which must be added in reverse order since |x| < (sqrt(2)-1).
This allows rewriting the code to just 2 pmadd and 1 padd instructions,
which on a Skylake processor speeds up the code by 5-7%.
2020-12-03 19:40:40 +00:00
Antonio Sanchez
eb4d4ae070
Include chrono in main for c++11.
...
Hack to fix tensor tests, since min/max are overridden by `main.h`.
2020-12-03 11:27:32 -08:00
Rasmus Munk Larsen
71c85df4c1
Clean up the Tensor header and get rid of the EIGEN_SLEEP macro.
2020-12-02 11:04:04 -08:00
Antonio Sanchez
70fbcf82ed
Fix typo in F32MaskToBf16Mask.
2020-12-02 07:58:34 -08:00
Antonio Sanchez
2627e2f2e6
Fix neon cmp* functions for bf16.
...
The current impl corrupts the comparison masks when converting
from float back to bfloat16. The resulting masks are then
no longer all zeros or all ones, which breaks when used with
`pselect` (e.g. in `pmin<PropagateNumbers>`). This was
causing `packetmath_15` to fail on arm.
Introducing a simple `F32MaskToBf16Mask` corrects this (takes
the lower 16-bits for each float mask).
2020-12-02 01:29:34 +00:00
Antonio Sanchez
ddd48b242c
Implement CUDA __shfl* for Eigen::half
...
Prior to this fix, `TensorContractionGpu` and the `cxx11_tensor_of_float16_gpu`
test are broken, as well as several ops in Tensorflow. The gpu functions
`__shfl*` became ambiguous now that `Eigen::half` implicitly converts to float.
Here we add the required specializations.
2020-12-01 14:36:52 -08:00
Rasmus Munk Larsen
e57281a741
Fix a few issues for AVX512. This change enables vectorized versions of log, exp, log1p, expm1 when AVX512DQ is not available.
2020-12-01 11:31:47 -08:00
Antonio Sanchez
1992af3de2
Fix #2077 , EIGEN_CONSTEXPR in Half.
...
`bit_cast` cannot be `constexpr`, so we need to remove `EIGEN_CONSTEXPR` from
`raw_half_as_uint16(...)`. This shouldn't affect anything else, since
it is only used in `a bit_cast<uint16_t,half>()` which is not itself
`constexpr`.
Fixes #2077 .
2020-12-01 03:10:21 +00:00
acxz
7b80609d49
add EIGEN_DEVICE_FUNC to methods
2020-12-01 03:08:47 +00:00
Antonio Sanchez
89f90b585d
AVX512 missing ops.
...
This allows the `packetmath` tests to pass for AVX512 on skylake.
Made `half` and `bfloat16` consistent in terms of ops they support.
Note the `log` tests are currently disabled for `bfloat16` since
they fail due to poor precision (they were previously disabled for
`Packet8bf` via test function specialization -- I just removed that
specialization and disabled it in the generic test).
2020-11-30 16:28:57 +00:00
Florian Maurin
c5985c46f5
Fix typo in doc
2020-11-30 10:53:29 +00:00
Jim Lersch
68f69414f7
Workaround for doxygen class template titles in which the template
...
part of the class signature is lost due to a problem with forward
declarations. The problem is probably caused by doxygen bug #7689 .
It is confirmed to be fixed in doxygen >= 1.8.19.
2020-11-27 19:52:16 -07:00
Jim Lersch
a7170f2aca
Fix doxygen class blocks that were not associated with the correct classes.
2020-11-27 08:48:11 -07:00
David Tellenbach
550e8f8f57
Include CMakeDependentOption to be able to use cmake_dependent_option
2020-11-27 13:21:49 +01:00
Bowie Owens
9842366bba
Make inclusion of doc sub-directory optional by adjusting options.
...
Allows exclusion of doc and related targets to help when using eigen via add_subdirectory().
Requested by:
https://gitlab.com/libeigen/eigen/-/issues/1842
Also required making EIGEN_TEST_BUILD_DOCUMENTATION a dependent option on EIGEN_BUILD_DOC. This ensures documentation targets are properly defined when EIGEN_TEST_BUILD_DOCUMENTATION is ON.
2020-11-27 08:11:49 +11:00
filippobrizzi
aa56e1d980
check for include dirs set
2020-11-26 10:22:46 +00:00
Andreas Krebbel
1e74f93d55
Fix some packet-functions in the IBM ZVector packet-math.
2020-11-25 14:11:23 +00:00
Rasmus Munk Larsen
79818216ed
Revert "Fix Half NaN definition and test."
...
This reverts commit c770746d70 .
2020-11-24 12:57:28 -08:00
Rasmus Munk Larsen
c770746d70
Fix Half NaN definition and test.
...
The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due
to a bad NaN bit-pattern comparison (in the case of casting a float to `__fp16`,
the signaling `NaN` is quieted). There was also an inconsistency between
`numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`. Here we
correct the inconsistency and compare NaNs according to the IEEE 754
definition.
Also modified the `bfloat16_float` test to match.
Tested with `cortex-a53` and `cortex-a55`.
2020-11-24 20:53:07 +00:00
Antonio Sanchez
22f67b5958
Fix boolean float conversion and product warnings.
...
This fixes some gcc warnings such as:
```
Eigen/src/Core/GenericPacketMath.h:655:63: warning: implicit conversion turns floating-point number into bool: 'typename __gnu_cxx::__enable_if<__is_integer<bool>::__value, double>::__type' (aka 'double') to 'bool' [-Wimplicit-conversion-floating-point-to-bool]
Packet psqrt(const Packet& a) { EIGEN_USING_STD(sqrt); return sqrt(a); }
```
Details:
- Added `scalar_sqrt_op<bool>` (`-Wimplicit-conversion-floating-point-to-bool`).
- Added `scalar_square_op<bool>` and `scalar_cube_op<bool>`
specializations (`-Wint-in-bool-context`)
- Deprecated above specialized ops for bool.
- Modified `cxx11_tensor_block_eval` to specialize generator for
booleans (`-Wint-in-bool-context`) and to use `abs` instead of `square` to
avoid deprecated bool ops.
2020-11-24 20:20:36 +00:00
Antonio Sanchez
a3b300f1af
Implement missing AVX half ops.
...
Minimal implementation of AVX `Eigen::half` ops to bring in line
with `bfloat16`. Allows `packetmath_13` to pass.
Also adjusted `bfloat16` packet traits to match the supported set
of ops (e.g. Bessel is not actually implemented).
2020-11-24 16:46:41 +00:00
Antonio Sanchez
38abf2be42
Fix Half NaN definition and test.
...
The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due
to a bad NaN bit-pattern comparison (in the case of casting a float to `__fp16`,
the signaling `NaN` is quieted). There was also an inconsistency between
`numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`. Here we
correct the inconsistency and compare NaNs according to the IEEE 754
definition.
Also modified the `bfloat16_float` test to match.
Tested with `cortex-a53` and `cortex-a55`.
2020-11-23 14:13:59 -08:00
Antonio Sanchez
4cf01d2cf5
Update AVX half packets, disable test.
...
The AVX half implementation is incomplete, causing the `packetmath_13` test
to fail. This disables the test.
Also refactored the existing AVX implementation to use `bit_cast`
instead of direct access to `.x`.
2020-11-21 09:05:10 -08:00
Antonio Sanchez
fd1dcb6b45
Fixes duplicate symbol when building blas
...
Missing inline breaks blas, since symbol generated in
`complex_single.cpp`, `complex_double.cpp`, `single.cpp`, `double.cpp`
Changed rest of inlines to `EIGEN_STRONG_INLINE`.
2020-11-20 09:37:40 -08:00
David Tellenbach
6c9c3f9a1a
Remove explicit casts from Eigen::half and Eigen::bfloat16 to bool
...
Both, Eigen::half and Eigen::Bfloat16 are implicitly convertible to
float and can hence be converted to bool via the conversion chain
Eigen::{half,bfloat16} -> float -> bool
We thus remove the explicit cast operator to bool.
2020-11-19 18:49:09 +01:00
Antonio Sanchez
a8fdcae55d
Fix sparse_extra_3, disable counting temporaries for testing DynamicSparseMatrix.
...
Multiplication of column-major `DynamicSparseMatrix`es involves three
temporaries:
- two for transposing twice to sort the coefficients
(`ConservativeSparseSparseProduct.h`, L160-161)
- one for a final copy assignment (`SparseAssign.h`, L108)
The latter is avoided in an optimization for `SparseMatrix`.
Since `DynamicSparseMatrix` is deprecated in favor of `SparseMatrix`, it's not
worth the effort to optimize further, so I simply disabled counting
temporaries via a macro.
Note that due to the inclusion of `sparse_product.cpp`, the `sparse_extra`
tests actually re-run all the original `sparse_product` tests as well.
We may want to simply drop the `DynamicSparseMatrix` tests altogether, which
would eliminate the test duplication.
Related to #2048
2020-11-18 23:15:33 +00:00
David Tellenbach
11e4056f6b
Re-enable Arm Neon Eigen::half packets of size 8
...
- Add predux_half_dowto4
- Remove explicit casts in Half.h to match the behaviour of BFloat16.h
- Enable more packetmath tests for Eigen::half
2020-11-18 23:02:21 +00:00
Antonio Sanchez
17268b155d
Add bit_cast for half/bfloat to/from uint16_t, fix TensorRandom
...
The existing `TensorRandom.h` implementation makes the assumption that
`half` (`bfloat16`) has a `uint16_t` member `x` (`value`), which is not
always true. This currently fails on arm64, where `x` has type `__fp16`.
Added `bit_cast` specializations to allow casting to/from `uint16_t`
for both `half` and `bfloat16`. Also added tests in
`half_float`, `bfloat16_float`, and `cxx11_tensor_random` to catch
these errors in the future.
2020-11-18 20:32:35 +00:00
Antonio Sanchez
41d5d5334b
Initialize primitives to fix -Wuninitialized-const-reference.
...
The `meta` test generates warnings with the latest version of clang due
to passing uninitialized variables as const reference arguments.
```
test/meta.cpp:102:45: error: variable 'f' is uninitialized when passed as a const reference argument here [-Werror,-Wuninitialized-const-reference]
VERIFY(( check_is_convertible(a.dot(b), f) ));
```
We don't actually use the variables, but initializing them eliminates the
new warning.
Fixes #2067 .
2020-11-18 20:23:20 +00:00
Antonio Sanchez
3669498f5a
Fix rule-of-3 for the Tensor module.
...
Adds copy constructors to Tensor ops, inherits assignment operators from
`TensorBase`.
Addresses #1863
2020-11-18 18:14:53 +00:00
Antonio Sanchez
60218829b7
EOF newline added to InverseSize4.
...
Causing build breakages due to `-Wnewline-eof -Werror` that seems to be
common across Google.
2020-11-18 07:58:33 -08:00
Rasmus Munk Larsen
2d63706545
Add missing parens around macro argument.
2020-11-18 00:24:19 +00:00
Rasmus Munk Larsen
6bba58f109
Replace SSE_SHUFFLE_MASK macro with shuffle_mask.
2020-11-17 15:28:37 -08:00
David Tellenbach
e9b55c4db8
Avoid promotion of Arm __fp16 to float in Neon PacketMath
...
Using overloaded arithmetic operators for Arm __fp16 always
causes a promotion to float. We replace operator* by vmulh_f16
to avoid this.
2020-11-17 20:19:44 +01:00
Antonio Sanchez
117a4c0617
Fix missing EIGEN_CONSTEXPR pop_macro in Half.
...
`EIGEN_CONSTEXPR` is getting pushed but not popped in `Half.h` if
`EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC` is defined.
2020-11-17 08:29:33 -08:00
Guoqiang QI
394f564055
Unify Inverse_SSE.h and Inverse_NEON.h into a single generic implementation using PacketMath.
2020-11-17 12:27:01 +00:00
Antonio Sanchez
8e9cc5b10a
Eliminate double-promotion warnings.
...
Clang currently complains about implicit conversions, e.g.
```
test/packetmath.cpp:680:59: warning: implicit conversion increases floating-point precision: 'typename Eigen::internal::random_retval<typename Eigen::internal::global_math_functions_filtering_base<double>::type>::type' (aka 'double') to 'long double' [-Wdouble-promotion]
data1[0] = Scalar((2 * k + k1) * EIGEN_PI / 2 * internal::random<double>(0.8, 1.2));
~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test/packetmath.cpp:681:40: warning: implicit conversion increases floating-point precision: 'float' to 'long double' [-Wdouble-promotion]
data1[1] = Scalar((2 * k + 2 + k1) * EIGEN_PI / 2 * internal::random<double>(0.8, 1.2));
```
Modified to explicitly cast to double.
2020-11-16 10:39:09 -08:00
acxz
9175f50d6f
Add EIGEN_DEVICE_FUNC to TranspositionsBase
...
Fixes #2057 .
2020-11-16 15:37:40 +00:00
Martin Vonheim Larsen
280f4f2407
Enable MathJax in Doxygen.in
...
Note that HTTPS must be used against the MathJax CDN when hosted on `eigen.tuxfamily.org` (which uses HTTPS) in order to avoid `Mixed Content`-errors from browsers. Using HTTPS for MathJax also works if the Eigen docs are hosted on plain HTTP.
2020-11-16 12:59:13 +00:00
Antonio Sanchez
bb69a8db5d
Explicit casts of S -> std::complex<T>
...
When calling `internal::cast<S, std::complex<T>>(x)`, clang often
generates an implicit conversion warning due to an implicit cast
from type `S` to `T`. This currently affects the following tests:
- `basicstuff`
- `bfloat16_float`
- `cxx11_tensor_casts`
The implicit cast leads to widening/narrowing float conversions.
Widening warnings only seem to be generated by clang (`-Wdouble-promotion`).
To eliminate the warning, we explicitly cast the real-component first
from `S` to `T`. We also adjust tests to use `internal::cast` instead
of `static_cast` when a complex type may be involved.
2020-11-14 05:50:42 +00:00
Christoph Hertzberg
90f6d9d23e
Suppress ignored-attributes warning (same as in vectorization_logic). Remove redundant include and using namespace.
2020-11-13 16:21:53 +01:00
guoqiangqi
8324e5e049
Fix typo in NEON/PacketMath.h
2020-11-13 00:46:41 +00:00
Antonio Sanchez
852513e7a6
Disable testing of OpenGL by default.
...
The `OpenGLSupport` module contains mostly deprecated features, and the
test is highly GL context-dependent, relies on deprecated GLUT, and
requires a display. Until the module is updated to support modern
OpenGL and the test to use newer windowing frameworks (e.g. GLFW)
it's probably best to disable the test by default.
The test can be enabled with `cmake -DEIGEN_TEST_OPENGL=ON`.
See #2053 for more details.
2020-11-12 16:15:40 -08:00
Rasmus Munk Larsen
bec72345d6
Simplify expression for inner product fallback in Gemv product evaluator.
2020-11-12 23:43:15 +00:00
Rasmus Munk Larsen
276db21f26
Remove redundant branch for handling dynamic vector*vector. This will be handled by the equivalent branch in the specialization for GemvProduct.
2020-11-12 21:54:56 +00:00
Rasmus Munk Larsen
cf12474a8b
Optimize matrix*matrix and matrix*vector products when they correspond to inner products at runtime.
...
This speeds up inner products where the one or or both arguments is dynamic for small and medium-sized vectors (up to 32k).
name old time/op new time/op delta
BM_VecVecStatStat<float>/1 1.64ns ± 0% 1.64ns ± 0% ~
BM_VecVecStatStat<float>/8 2.99ns ± 0% 2.99ns ± 0% ~
BM_VecVecStatStat<float>/64 7.00ns ± 1% 7.04ns ± 0% +0.66%
BM_VecVecStatStat<float>/512 61.6ns ± 0% 61.6ns ± 0% ~
BM_VecVecStatStat<float>/4k 551ns ± 0% 553ns ± 1% +0.26%
BM_VecVecStatStat<float>/32k 4.45µs ± 0% 4.45µs ± 0% ~
BM_VecVecStatStat<float>/256k 77.9µs ± 0% 78.1µs ± 1% ~
BM_VecVecStatStat<float>/1M 312µs ± 0% 312µs ± 1% ~
BM_VecVecDynStat<float>/1 13.3ns ± 1% 4.6ns ± 0% -65.35%
BM_VecVecDynStat<float>/8 14.4ns ± 0% 6.2ns ± 0% -57.00%
BM_VecVecDynStat<float>/64 24.0ns ± 0% 10.2ns ± 3% -57.57%
BM_VecVecDynStat<float>/512 138ns ± 0% 68ns ± 0% -50.52%
BM_VecVecDynStat<float>/4k 1.11µs ± 0% 0.56µs ± 0% -49.72%
BM_VecVecDynStat<float>/32k 8.89µs ± 0% 4.46µs ± 0% -49.89%
BM_VecVecDynStat<float>/256k 78.2µs ± 0% 78.1µs ± 1% ~
BM_VecVecDynStat<float>/1M 313µs ± 0% 312µs ± 1% ~
BM_VecVecDynDyn<float>/1 10.4ns ± 0% 10.5ns ± 0% +0.91%
BM_VecVecDynDyn<float>/8 12.0ns ± 3% 11.9ns ± 0% ~
BM_VecVecDynDyn<float>/64 37.4ns ± 0% 19.6ns ± 1% -47.57%
BM_VecVecDynDyn<float>/512 159ns ± 0% 81ns ± 0% -49.07%
BM_VecVecDynDyn<float>/4k 1.13µs ± 0% 0.58µs ± 1% -49.11%
BM_VecVecDynDyn<float>/32k 8.91µs ± 0% 5.06µs ±12% -43.23%
BM_VecVecDynDyn<float>/256k 78.2µs ± 0% 78.2µs ± 1% ~
BM_VecVecDynDyn<float>/1M 313µs ± 0% 312µs ± 1% ~
2020-11-12 18:02:37 +00:00
Pedro Caldeira
c29935b323
Add support for dynamic dispatch of MMA instructions for POWER 10
2020-11-12 11:31:15 -03:00
acxz
b714dd9701
remove annotation for first declaration of default con/destruction
2020-11-12 04:34:12 +00:00
mehdi-goli
e24a1f57e3
[SYCL Function pointer Issue]: SYCL does not support function pointer inside the kernel, due to the portability issue of a function pointer and memory address space among host and accelerators. To fix the issue, function pointers have been replaced by function objects.
2020-11-12 01:50:28 +00:00
Antonio Sanchez
6961468915
Address issues with openglsupport test.
...
The existing test fails on several systems due to GL runtime version mismatches,
the use of deprecated features, and memory errors due to improper use of GLUT.
The test was modified to:
- Run within a display function, allowing proper GLUT cleanup.
- Generate dynamic shaders with a supported GLSL version string and output variables.
- Report shader compilation errors.
- Check GL context version before launching version-specific tests.
Note that most of the existing `OpenGLSupport` module and tests rely on deprecated
features (e.g. fixed-function pipeline). The test was modified to allow it to
pass on various systems. We might want to consider removing the module or re-writing
it entirely to support modern OpenGL. This is beyond the scope of this patch.
Testing of legacy GL (for platforms that support it) can be enabled by defining
`EIGEN_LEGACY_OPENGL`. Otherwise, the test will try to create a modern context.
Tested on
- MacBook Air (2019), macOS Catalina 10.15.7 (OpenGL 2.1, 4.1)
- Debian 10.6, NVidia Quadro K1200 (OpenGL 3.1, 3.3)
2020-11-11 15:54:43 -08:00
Everton Constantino
348a48682e
Fix erroneous forward declaration of boost nvp.
2020-11-10 13:07:34 -03:00
guoqiangqi
82fe059f35
Fix issue2045 which get a error case _mm256_set_m128d op not supported by gcc 7.x
2020-11-04 09:21:39 +08:00
Deven Desai
9d11e2c03e
CMakefile update for ROCm 4.0
...
Starting with ROCm 4.0, the `hipconfig --platform` command will return `amd` (prior return value was `hcc`). Updating the CMakeLists.txt files in the test dirs to account for this change.
2020-10-29 18:06:31 +00:00
Deven Desai
39a038f2e4
Fix for ROCm (and CUDA?) breakage - 201029
...
The following commit breaks Eigen for ROCm (and probably CUDA too) with the following error
e265f7ed8e
```
Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o
In file included from /home/rocm-user/eigen/test/gpu_basic.cu:20:
In file included from /home/rocm-user/eigen/test/main.h:355:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:169:
/home/rocm-user/eigen/Eigen/src/Core/arch/Default/Half.h:825:76: error: use of undeclared identifier 'numext'; did you mean 'Eigen::numext'?
return Eigen::half_impl::raw_uint16_to_half(__ldg(reinterpret_cast<const numext::uint16_t*>(ptr)));
^~~~~~
Eigen::numext
/home/rocm-user/eigen/Eigen/src/Core/MathFunctions.h:968:11: note: 'Eigen::numext' declared here
namespace numext {
^
1 error generated when compiling for gfx900.
CMake Error at gpu_basic_generated_gpu_basic.cu.o.cmake:192 (message):
Error generating file
/home/rocm-user/eigen/build/test/CMakeFiles/gpu_basic.dir//./gpu_basic_generated_gpu_basic.cu.o
test/CMakeFiles/gpu_basic.dir/build.make:63: recipe for target 'test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o' failed
make[3]: *** [test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o] Error 1
CMakeFiles/Makefile2:16611: recipe for target 'test/CMakeFiles/gpu_basic.dir/all' failed
make[2]: *** [test/CMakeFiles/gpu_basic.dir/all] Error 2
CMakeFiles/Makefile2:16618: recipe for target 'test/CMakeFiles/gpu_basic.dir/rule' failed
make[1]: *** [test/CMakeFiles/gpu_basic.dir/rule] Error 2
Makefile:5401: recipe for target 'gpu_basic' failed
make: *** [gpu_basic] Error 2
```
The fix is in this commit is trivial. Please review and merge
2020-10-29 15:34:05 +00:00
David Tellenbach
f895755c0e
Remove unused functions in Half.h.
...
The following functions have been removed:
Eigen::half fabsh(const Eigen::half&)
Eigen::half exph(const Eigen::half&)
Eigen::half sqrth(const Eigen::half&)
Eigen::half powh(const Eigen::half&, const Eigen::half&)
Eigen::half floorh(const Eigen::half&)
Eigen::half ceilh(const Eigen::half&)
2020-10-29 07:37:52 +01:00
David Tellenbach
09f015852b
Replace numext::as_uint with numext::bit_cast<numext::uint32_t>
2020-10-29 07:28:28 +01:00
David Tellenbach
e265f7ed8e
Add support for Armv8.2-a __fp16
...
Armv8.2-a provides a native half-precision floating point (__fp16 aka.
float16_t). This patch introduces
* __fp16 as underlying type of Eigen::half if this type is available
* the packet types Packet4hf and Packet8hf representing float16x4_t and
float16x8_t respectively
* packet-math for the above packets with corresponding scalar type Eigen::half
The packet-math functionality has been implemented by Ashutosh Sharma
<ashutosh.sharma@amperecomputing.com >.
This closes #1940 .
2020-10-28 20:15:09 +00:00
mehdi-goli
a725a3233c
[SYCL clean up the code] : removing exrta #pragma unroll in SYCL which was causing issues in embeded systems
2020-10-28 08:34:49 +00:00
mehdi-goli
b9ff791fed
[Missing SYCL math op]: Addin the missing LDEXP Function for SYCL.
2020-10-28 08:32:57 +00:00
mehdi-goli
61461d682a
[Fixing expf issue]: Eigen uses the packet type operation for scaler type float on Sigmoid function( https://gitlab.com/libeigen/eigen/-/blob/master/Eigen/src/Core/functors/UnaryFunctors.h#L990 ). As a result SYCL backend breaks since SYCL backend only supports packet operation for vectorized type float4 and double2. The issue has been fixed by adding scalar type float to packet operation pexp for SYCL backend.
2020-10-28 08:30:34 +00:00
Christoph Hertzberg
ecb7bc9514
Bug #2036 make sure find_standard_math_library_test_program actually compiles (and is guaranteed to call math functions)
2020-10-24 15:22:21 +02:00
Susi Lehtola
09f595a269
Make sure compiler does not optimize away calls to math functions
2020-10-24 06:16:50 +00:00
guoqiangqi
28aef8e816
Improve polynomial evaluation with instruction-level parallelism for pexp_float and pexp<Packet16f>
2020-10-20 11:37:09 +08:00
guoqiangqi
4a77eda1fd
remove unnecessary specialize template of pexp for scale float/double
2020-10-19 00:51:42 +00:00
Antonio Sanchez
d9f0d9eb76
Fix missing pfirst<Packet16b> for MSVC.
...
It was only defined under one `#ifdef` case. This fixes the `packetmath_14`
test for MSVC.
2020-10-16 16:22:00 -07:00
Rasmus Munk Larsen
21edea5edd
Fix the specialization of pfrexp for AVX to be faster when AVX2/AVX512DQ is not available, and avoid undefined behavior in C++. Also mask off the sign bit when extracting the exponent.
2020-10-15 18:39:58 -07:00
Deven Desai
011e0db31d
Fix for ROCm/HIP breakage - 201013
...
The following commit seems to have introduced regressions in ROCm/HIP support.
183a208212
It causes some unit-tests to fail with the following error
```
...
Eigen/src/Core/GenericPacketMath.h:322:3: error: no member named 'bit_and' in the global namespace; did you mean 'std::bit_and'?
...
Eigen/src/Core/GenericPacketMath.h:329:3: error: no member named 'bit_or' in the global namespace; did you mean 'std::bit_or'?
...
Eigen/src/Core/GenericPacketMath.h:336:3: error: no member named 'bit_xor' in the global namespace; did you mean 'std::bit_xor'?
...
```
The error occurs because, when compiling the device code in HIP/CUDA, the compiler will pick up the some of the std functions (whose calls are prefixed by EIGEN_USING_STD) from the global namespace (i.e. use ::bit_xor instead of std::bit_xor). For this to work, those functions must be declared in the global namespace in the HIP/CUDA header files. The `bit_and`, `bit_or` and `bit_xor` routines are not declared in the HIP header file that contain the decls for the std math functions ( `math_functions.h` ), and this is the cause of the error above.
It seems that the newer HIP compilers do support the calling of `std::` math routines within device code, and the ideal fix here would have been to change all calls to std math functions in EIGEN to use the `std::` namespace (instead of the global namespace ), when compiling with HIP compiler. However it seems there was a recent commit to remove the EIGEN_USING_STD_MATH macro and collapse it uses into the EIGEN_USING_STD macro ( 4091f6b25c ).
Replacing all std math calls will essentially require re-surrecting the EIGEN_USING_STD_MATH macro, so not choosing that option.
Also HIP compilers only have support std math calls within device code, and not all std functions (specifically not for malloc/free which are prefixed via EIGEN_USING_STD). So modyfing EIGEN_USE_STD implementation to use std:: namspace for HIP will not work either.
Hence going for the ugly solution of special casing the three calls that breaking the HIP compile, to explicitly use the std:: namespace
2020-10-15 12:17:35 +00:00
Rasmus Munk Larsen
6ea8091705
Revert change from 4e4d3f32d1 that broke BFloat16.h build with older compilers.
2020-10-15 01:20:08 +00:00
Guoqiang QI
4700713faf
Add AVX plog<Packet4d> and AVX512 plog<Packet8d> ops,also unified AVX512 plog<Packet16f> op with generic api
2020-10-15 00:54:45 +00:00
Rasmus Munk Larsen
af6f43d7ff
Add specializations for pmin/pmax with prescribed NaN propagation semantics for SSE/AVX/AVX512.
2020-10-14 23:11:24 +00:00
Rasmus Munk Larsen
274ef12b61
Remove leftover debug print statement in cxx11_tensor_expr.cpp
2020-10-14 22:59:51 +00:00
Rasmus Munk Larsen
208b3626d1
Revert generic implementation of predux, since it break compilation of predux_any with MSVC.
2020-10-14 21:41:28 +00:00
David Tellenbach
e3e2cf9d24
Add MatrixBase::cwiseArg()
2020-10-14 01:56:42 +00:00
Rasmus Munk Larsen
61fc78bbda
Get rid of nested template specialization in TensorReductionGpu.h, which was broken by c6953f799b.
2020-10-13 23:53:11 +00:00
Rasmus Munk Larsen
c6953f799b
Add packet generic ops predux_fmin, predux_fmin_nan, predux_fmax, and predux_fmax_nan that implement reductions with PropagateNaN, and PropagateNumbers semantics. Add (slow) generic implementations for most reductions.
2020-10-13 21:48:31 +00:00
acxz
807e51528d
undefine EIGEN_CONSTEXPR before redefinition
2020-10-12 20:28:56 -04:00
Rasmus Munk Larsen
9a4d04c05f
Make bitwise_helper a device function to unbreak GPU builds.
2020-10-10 01:45:20 +00:00
Rasmus Munk Larsen
4e4d3f32d1
Clean up packetmath tests and fix various bugs to make bfloat16 pass (almost) all packetmath tests with SSE, AVX, and AVX512.
2020-10-09 20:05:49 +00:00
David Tellenbach
7a8d3d5b81
Disable test exceptions when using OpenMP.
2020-10-09 17:49:07 +02:00
David Tellenbach
9022f5aa8a
Mention problems when using potentially throwing scalars and OpenMP
2020-10-09 17:04:25 +02:00
Karl Ljungkvist
d199c17b14
Fix typo in Tutorial_BlockOperations_block_assignment.cpp
2020-10-09 07:51:36 +00:00
David Tellenbach
4091f6b25c
Drop EIGEN_USING_STD_MATH in favour of EIGEN_USING_STD
2020-10-09 02:05:05 +02:00
Rasmus Munk Larsen
183a208212
Implement generic bitwise logical packet ops that work for all types.
2020-10-08 22:45:20 +00:00
David Tellenbach
8f8d77b516
Add EIGEN prefix for HAS_LGAMMA_R
2020-10-08 18:32:19 +02:00
Eugene Zhulenev
2279f2c62f
Use lgamma_r if it is available (update check for glibc 2.19+)
2020-10-08 00:26:45 +00:00
Rasmus Munk Larsen
b431024404
Don't make assumptions about NaN-propagation for pmin/pmax - it various across platforms.
...
Change test to only test for NaN-propagation for pfmin/pfmax.
2020-10-07 19:05:18 +00:00
David Tellenbach
f66f3393e3
Use reinterpret_cast instead of C-style cast in Inverse_NEON.h
2020-10-04 00:35:09 +02:00
Rasmus Munk Larsen
22c971a225
Don't cast away const in Inverse_NEON.h.
2020-10-02 15:06:34 -07:00
Rasmus Munk Larsen
f93841b53e
Use EIGEN_USING_STD to fix CUDA compilation error on BFloat16.h.
2020-10-02 14:47:15 -07:00
Rasmus Munk Larsen
ee714f79f7
Fix CUDA build breakage and incorrect result for absdiff on HIP with long double arguments.
2020-10-02 21:05:35 +00:00
janos
f7b185a8b1
dont use =* might not return a Scalar
2020-10-02 14:36:51 +02:00
Rasmus Munk Larsen
9078f47cd6
Fix build breakage with MSVC 2019, which does not support MMX intrinsics for 64 bit builds, see:
...
https://stackoverflow.com/questions/60933486/mmx-intrinsics-like-mm-cvtpd-pi32-not-found-with-msvc-2019-for-64bit-targets-c
Instead use the equivalent SSE2 intrinsics.
2020-10-01 12:37:55 -07:00
Rasmus Munk Larsen
3b445d9bf2
Add a generic packet ops corresponding to {std}::fmin and {std}::fmax. The non-sensical NaN-propagation rules for std::min std::max implemented by pmin and pmax in Eigen is a longstanding source og confusion and bug report. This change is a first step towards addressing it, as discussing in issue #564 .
2020-10-01 16:54:31 +00:00
Rasmus Munk Larsen
44b9d4e412
Specialize pldexp_double and pfdexp_double and get rid of Packet2l definition for SSE. SSE does not support conversion between 64 bit integers and double and the existing implementation of casting between Packet2d and Packer2l results in undefined behavior when casting NaN to int. Since pldexp and pfdexp only manipulate exponent fields that fit in 32 bit, this change provides specializations that use existing instructions _mm_cvtpd_pi32 and _mm_cvtsi32_pd instead.
2020-09-30 13:33:44 -07:00
Antonio Sanchez
d5a0d89491
Fix alignedbox 32-bit precision test failure.
...
The current `test/geo_alignedbox` tests fail on 32-bit arm due to small floating-point errors.
In particular, the following is not guaranteed to hold:
```
IsometryTransform identity = IsometryTransform::Identity();
BoxType transformedC;
transformedC.extend(c.transformed(identity));
VERIFY(transformedC.contains(c));
```
since `c.transformed(identity)` is ever-so-slightly different from `c`. Instead, we replace this test with one that checks an identity transform is within floating-point precision of `c`.
Also updated the condition on `AlignedBox::transform(...)` to only accept `Affine`, `AffineCompact`, and `Isometry` modes explicitly. Otherwise, invalid combinations of modes would also incorrectly pass the assertion.
2020-09-30 08:42:03 -07:00
David Tellenbach
30960d485e
Fix failure in GEBP kernel when compiling with OpenMP and FMA
...
Fixes #1995
2020-09-30 01:26:07 +02:00
Rasmus Munk Larsen
f9d1500f74
Revert !182 .
2020-09-29 13:56:17 -07:00
Rasmus Munk Larsen
068121ec02
Add missing newline at the end of Inverse_NEON.h
2020-09-29 15:32:52 +00:00
Rasmus Munk Larsen
74ff5719b3
Fix compilation of 64 bit constant arguments to pset1frombits in TypeCasting.h on platforms where uint64_t != unsigned long.
2020-09-28 22:47:11 +00:00
Rasmus Munk Larsen
3a0b23e473
Fix compilation of pset1frombits calls on iOS.
2020-09-28 22:30:36 +00:00
Christoph Hertzberg
6b0c0b587e
Provide a more efficient Packet2l->Packet2d cast method
2020-09-28 22:14:02 +00:00
Martin Pecka
6425e875a1
Added AlignedBox::transform(AffineTransform).
2020-09-28 18:06:23 +00:00
Alexander Grund
a967fadb21
Make relative path variables of type STRING
...
When the type is PATH an absolute path is expected and user-defined
values are converted into absolute paths relative to the current directory.
Fixes #1990
2020-09-28 16:39:48 +00:00
Zhuyie
e4b24e7fb2
Fix Eigen::ThreadPool::CurrentThreadId returning wrong thread id when EIGEN_AVOID_THREAD_LOCAL and NDEBUG are defined
2020-09-25 09:36:43 +00:00
Deven Desai
ce5c59729d
Fix for ROCm/HIP breakage - 200921
...
The following commit causes regressions in the ROCm/HIP support for Eigen
e55182ac09
I suspect the same breakages occur on the CUDA side too.
The above commit puts the EIGEN_CONSTEXPR attribute on `half_base` constructor. `half_base` is derived from `__half_raw`.
When compiling with GPU support, the definition of `__half_raw` gets picked up from the GPU Compiler specific header files (`hip_fp16.h`, `cuda_fp16.h`). Properly supporting the above commit would require adding the `constexpr` attribute to the `__half_raw` constructor (and other `*half*` routines) in those header files. While that is something we can explore in the future, for now we need to undo the above commit when compiling with GPU support, which is what this commit does.
This commit also reverts a small change in the `raw_uint16_to_half` routine made by the above commit. Similar to the case above, that change was leading to compile errors due to the fact that `__half_raw` has a different definition when compiling with DPU support.
2020-09-22 22:26:45 +00:00
David Tellenbach
b8a13f13ca
Add CI configuration for ppc64le
2020-09-22 00:26:23 +00:00
Guoqiang QI
821702e771
Fix the #issue1997 and #issue1991 bug triggered by unsupport a[index](type a: __i28d) ops with MSVC compiler
2020-09-21 15:49:00 +00:00
David Tellenbach
493a7c773c
Remove EIGEN_CONSTEXPR from NumTraits<boost::multiprecision::number<...>>
2020-09-21 12:43:41 +02:00
Павел Мацула
38e4a67394
Fix using FindStandardMathLibrary.cmake with -Wall (-Wunused-value) added to CMAKE_CXX_FLAG
2020-09-19 16:13:16 +00:00
Rasmus Munk Larsen
c4b99f78c7
Fix breakage in pcast<Packet2l, Packet2d> due to _mm_cvtsi128_si64 not being available on 32 bit x86.
...
If SSE 4.1 is available use the faster _mm_extract_epi64 intrinsic.
2020-09-18 18:13:20 -07:00
guoqiangqi
9aad16b443
Fix undefined reference to pset1frombits bug on different platforms
2020-09-19 00:53:21 +00:00
David Tellenbach
c4aa8e0db2
Rename variable to avoid shadowing of a previously declared one
2020-09-18 22:53:15 +02:00
Rasmus Munk Larsen
e55182ac09
Get rid of initialization logic for blueNorm by making the computed constants static const or constexpr.
...
Move macro definition EIGEN_CONSTEXPR to Core and make all methods in NumTraits constexpr when EIGEN_HASH_CONSTEXPR is 1.
2020-09-18 17:38:58 +00:00
Rasmus Munk Larsen
14022f5eb5
Fix more mildly embarrassing typos in ARM intrinsics in PacketMath.h.
...
'vmvnq_u64' does not exist for some reason.
2020-09-18 04:14:13 +00:00
Rasmus Munk Larsen
a5b226920f
Fix typo in PacketMath.h
2020-09-18 01:22:23 +00:00
Rasmus Munk Larsen
3af744b023
Add missing packet op pcmp_lt_or_nan for Packet2d on ARM.
2020-09-18 01:07:01 +00:00
Rasmus Munk Larsen
31a6b88ff3
Disable double version of compute_inverse_size4 on Inverse_NEON.h if Packet2d is not supported.
2020-09-17 23:51:06 +00:00
Brad King
880fa43b2b
Add support for CastXML on ARM aarch64
...
CastXML simulates the preprocessors of other compilers, but actually
parses the translation unit with an internal Clang compiler.
Use the same `vld1q_u64` workaround that we do for Clang.
Fixes : #1979
2020-09-16 13:40:23 -04:00
daravi
6f0f6f792e
Fix compiler error due to c++20 operator== generation rules
2020-09-16 02:06:53 +00:00
Benoit Jacob
cc0c38ace8
Remove old Clang compiler bug work-arounds. The two LLVM bugs referenced in the comments here have long been fixed. The workarounds were now detrimental because (1) they prevented using fused mul-add on Clang/ARM32 and (2) the unnecessary 'volatile' in 'asm volatile' prevented legitimate reordering by the compiler.
2020-09-15 20:54:14 -04:00
Tim Shen
bb56a62582
Make bfloat16(float(-nan)) produce -nan, not nan.
2020-09-15 13:24:23 -07:00
Guoqiang QI
3012e755e9
Add plog ops support packet2d for NEON
2020-09-15 17:10:35 +00:00
Rasmus Munk Larsen
e4fb0ddf78
Add EIGEN_UNUSED_VARIABLE to unused variable in Memory.h
2020-09-15 01:18:55 +00:00
Pedro Caldeira
65e400896b
Fix bfloat16 round on gcc 4.8
2020-09-14 10:43:59 -03:00
Rasmus Munk Larsen
5636f80d11
Fix issue #1968 . Don't discard return value from "new" in C++17.
2020-09-13 17:38:45 +00:00
Guoqiang QI
7c5d48f313
Unified sse pldexp_double api
2020-09-12 10:56:55 +00:00
Rasmus Munk Larsen
71e08c702b
Make blueNorm threadsafe if C++11 atomics are available.
2020-09-12 01:23:29 +00:00
David Tellenbach
adc861cabd
New CI infrastructure, including AArch64 runners
2020-09-11 18:11:49 +00:00
Niels Dekker
5328c9be43
Fix half_impl::float_to_half_rtne(float) warning: '<<' causes overflow
...
Fixed Visual Studio 2019 Code Analysis (C++ Core Guidelines) warning
C26450 from inside `half_impl::float_to_half_rtne(float)`:
> Arithmetic overflow: '<<' operation causes overflow at compile time.
2020-09-10 16:22:28 +02:00
Pedro Caldeira
35d149e34c
Add missing functions for Packet8bf in Altivec architecture.
...
Including new tests for bfloat16 Packets.
Fix prsqrt on GenericPacketMath.
2020-09-08 09:22:11 -05:00
Guoqiang QI
85428a3440
Add Neon psqrt<Packet2d> and pexp<Packet2d>
2020-09-08 09:04:03 +00:00
Alexander Neumann
5272106826
remove semi triggering -Wextra-semi-stmt
2020-09-07 11:42:30 +02:00
Stephen Zheng
5f25bcf7d6
Add Inverse_NEON.h
...
Implemented fast size-4 matrix inverse (mimicking Inverse_SSE.h) using NEON intrinsics.
```
Benchmark Time CPU Time Old Time New CPU Old CPU New
--------------------------------------------------------------------------------------------------------
BM_float -0.1285 -0.1275 568 495 572 499
BM_double -0.2265 -0.2254 638 494 641 496
```
2020-09-04 10:55:47 +00:00
Everton Constantino
6fe88a3c9d
MatrixProuct enhancements:
...
- Changes to Altivec/MatrixProduct
Adapting code to gcc 10.
Generic code style and performance enhancements.
Adding PanelMode support.
Adding stride/offset support.
Enabling float64, std::complex and std::complex.
Fixing lack of symm_pack.
Enabling mixedtypes.
- Adding std::complex tests to blasutil.
- Adding an implementation of storePacketBlock when Incr!= 1.
2020-09-02 18:21:36 -03:00
Everton Constantino
6568856275
Changing u/int8_t to un/signed char because clang does not understand
...
it.
Implementing pcmp_eq to Packet8 and Packet16.
2020-09-02 17:02:15 -03:00
Gael Guennebaud
27e6648074
fix #1901 : warning in Mode==(Upper|Lower)
2020-09-02 15:43:58 +02:00
Hans Johnson
5b9bfc892a
BUG: cmake_minimum_required must be the first command
...
https://cmake.org/cmake/help/v3.5/command/project.html
Note: Call the cmake_minimum_required() command at the beginning of the
top-level CMakeLists.txt file even before calling the project() command.
It is important to establish version and policy settings before invoking
other commands whose behavior they may affect. See also policy CMP0000.
2020-08-28 22:57:16 +00:00
Chip Kerchner
e5886457c8
Change Packet8s and Packet8us to use vector commands on Power for pmadd, pmul and psub.
2020-08-28 19:27:32 +00:00
Gael Guennebaud
25424d91f6
Fix #1974 : assertion when reserving an empty sparse matrix
2020-08-26 12:32:20 +02:00
Guoqiang QI
8bb0febaf9
add psqrt ops support packet2f/packet4f for NEON
2020-08-21 03:17:15 +00:00
Georg Jäger
1b1082334b
adding attributes to constructors to support hip-clang on ROCm 3.5
2020-08-20 16:48:11 +02:00
Deven Desai
603e213d13
Fixing a CUDA / P100 regression introduced by PR 181
...
PR 181 ( https://gitlab.com/libeigen/eigen/-/merge_requests/181 ) adds `__launch_bounds__(1024)` attribute to GPU kernels, that did not have that attribute explicitly specified.
That PR seems to cause regressions on the CUDA platform. This PR/commit makes the changes in PR 181, to be applicable for HIP only
2020-08-20 00:29:57 +00:00
David Tellenbach
c060114a25
Fix nightly CI configuration
2020-08-19 20:52:34 +02:00
David Tellenbach
fe8c3ef3cb
Add possibility to split test suit build targets and improved CI configuration
...
- Introduce CMake option `EIGEN_SPLIT_TESTSUITE` that allows to divide the single test build target into several subtargets
- Add CI pipeline for merge request that can be run by GitLab's shared runners
- Add nightly CI pipeline
2020-08-19 18:27:45 +00:00
Rasmus Munk Larsen
d10b27fe37
Add missing inline keyword in Quaternion.h.
2020-08-14 17:51:04 +00:00
David Tellenbach
d4a727d092
Disable min/max NaN propagation in test cxx11_tensor_expr
...
The current pmin/pmax implementation for Arm Neon propagate NaNs
differently than std::min/std::max.
See issue https://gitlab.com/libeigen/eigen/-/issues/1937
2020-08-14 16:16:27 +00:00
David Tellenbach
d2bb6cf396
Fix compilation error in blasutil test
2020-08-14 18:15:18 +02:00
David Tellenbach
c6820a6316
Replace the call to int64_t in the blasutil test by explicit types
...
Some platforms define int64_t to be long long even for C++03. If this is
the case we miss the definition of internal::make_unsigned for this
type. If we just define the template we get duplicated definitions
errors for platforms defining int64_t as signed long for C++03.
We need to find a way to distinguish both cases at compile-time.
2020-08-14 17:24:37 +02:00
David Tellenbach
8ba1b0f41a
bfloat16 packetmath for Arm Neon backend
2020-08-13 15:48:40 +00:00
Pedro Caldeira
704798d1df
Add support for Bfloat16 to use vector instructions on Altivec
...
architecture
2020-08-10 13:22:01 -05:00
Deven Desai
46f8a18567
Adding an explicit launch_bounds(1024) attribute for GPU kernels.
...
Starting with ROCm 3.5, the HIP compiler will change from HCC to hip-clang.
This compiler change introduce a change in the default value of the `__launch_bounds__` attribute associated with a GPU kernel. (default value means the value assumed by the compiler as the `__launch_bounds attribute__` value, when it is not explicitly specified by the user)
Currently (i.e. for HIP with ROCm 3.3 and older), the default value is 1024. That changes to 256 with ROCm 3.5 (i.e. hip-clang compiler). As a consequence of this change, if a GPU kernel with a `__luanch_bounds__` attribute of 256 is launched at runtime with a threads_per_block value > 256, it leads to a runtime error. This is leading to a couple of Eigen unit test failures with ROCm 3.5.
This commit adds an explicit `__launch_bounds(1024)__` attribute to every GPU kernel that currently does not have it explicitly specified (and hence will end up getting the default value of 256 with the change to hip-clang)
2020-08-05 01:46:34 +00:00
Zachary Garrett
21122498ec
Temporarily turn off the NEON implementation of pfloor as it does not work for large values.
...
The NEON implementation mimics the SSE implementation, but didn't mention the caveat that due to the unsigned of signed integer conversions, not all values in the original floating point represented are supported.
2020-08-04 16:28:23 +00:00
David Tellenbach
23b7f0572b
Disable CI buildstage again
2020-08-03 15:41:43 +02:00
Gael Guennebaud
d0f5d4bc50
add a banner to advertise the survey
2020-07-29 19:01:38 +02:00
David Tellenbach
5e484fa11d
Fix StlDeque for GCC 10
...
StlDeque extends std::deque by accessing some of its internal members.
Since GCC 10 these are not accessible anymore.
2020-07-29 12:31:13 +00:00
Teng Lu
3ec4f0b641
Fix undefine BF16 union behavior in AVX512.
2020-07-29 02:20:21 +00:00
Rasmus Munk Larsen
b92206676c
Inherit alignment trait from argument in TensorBroadcasting to avoid segfault when the argument is unaligned.
2020-07-28 19:19:37 +00:00
David Tellenbach
99da2e1a8d
Fix clang-tidy warnings in generic bfloat16 implementation
...
See !172 for related discussions.
2020-07-27 16:00:24 +02:00
qxxxb
649fd1c2ae
Fix CMake install command
2020-07-25 16:35:13 -04:00
David Tellenbach
e48d8e4725
Don't allow failure for CI build stage anymore
2020-07-24 21:12:15 +02:00
David Tellenbach
b8ca93842c
Improve CI configuration
...
- Fix docker Fedora image to Fedora:31
- Fix gcc version to gcc-9.2.1
- Use GitLab CI dag
- Fix usage of build cache
- Introduce build artificats
2020-07-24 15:58:44 +00:00
Gael Guennebaud
fb0c6868ad
Add missing footer declaration
2020-07-24 10:28:44 +02:00
David Tellenbach
c1ffe452fc
Fix bfloat16 casts
...
If we have explicit conversion operators available (C++11) we define
explicit casts from bfloat16 to other types. If not (C++03), we don't
define conversion operators but rely on implicit conversion chains from
bfloat16 over float to other types.
2020-07-23 20:55:06 +00:00
Gael Guennebaud
2ce2f51989
remove piwik tracker
2020-07-23 13:51:39 +02:00
Rasmus Munk Larsen
1b84f21e32
Revert change that made conversion from bfloat16 to {float, double} implicit.
...
Add roundtrip tests for casting between bfloat16 and complex types.
2020-07-22 18:09:00 -07:00
David Tellenbach
38b91f256b
Fix cast of blfoat16 to std::complex<T>
...
This fixes https://gitlab.com/libeigen/eigen/-/issues/1951
2020-07-22 19:00:17 +00:00
Rasmus Munk Larsen
bed7fbe854
Make sure we take the little-endian path if __BYTE_ORDER__ is not defined.
2020-07-22 18:54:38 +00:00
Niels Dekker
0e1a33a461
Faster conversion from integer types to bfloat16
...
Specialized `bfloat16_impl::float_to_bfloat16_rtne(float)` for normal floating point numbers, infinity and zero, in order to improve the performance of `bfloat16::bfloat16(const T&)` for integer argument types.
A reduction of more than 20% of the runtime duration of conversion from int to bfloat16 was observed, using Visual C++ 2019 on Windows 10.
2020-07-22 19:25:49 +02:00
Rasmus Munk Larsen
acab22c205
Avoid division by zero in nonZerosEstimate() for empty blocks.
2020-07-22 01:38:30 +00:00
Rasmus Munk Larsen
ac2eca6b11
Update tensor reduction test to avoid undefined division of bfloat16 by int.
2020-07-22 00:35:51 +00:00
Rasmus Munk Larsen
0aeaf5f451
Make numext::as_uint a device function.
2020-07-22 00:33:41 +00:00
Alexander Turkin
60faa9f897
user-defined copy operations removed in favor of compiler-generated ones
2020-07-20 14:59:35 +03:00
Niels Dekker
b11f817bcf
Avoid undefined behavior by union type punning in float_to_bfloat16_rtne
...
Use `numext::as_uint`, instead of union based type punning, to avoid undefined behavior.
See also C++ Core Guidelines: "Don't use a union for type punning"
https://github.com/isocpp/CppCoreGuidelines/blob/v0.8/CppCoreGuidelines.md#c183-dont-use-a-union-for-type-punning
`numext::as_uint` was suggested by David Tellenbach
2020-07-14 19:55:20 +02:00
Sheng Yang
56b3e3f3f8
AVX path for BF16
2020-07-14 01:34:03 +00:00
Niels Dekker
4ab32e2de2
Allow implicit conversion from bfloat16 to float and double
...
Conversion from `bfloat16` to `float` and `double` is lossless. It seems natural to allow the conversion to be implicit, as the C++ language also support implicit conversion from a smaller to a larger floating point type.
Intel's OneDLL bfloat16 implementation also has an implicit `operator float()`: https://github.com/oneapi-src/oneDNN/blob/v1.5/src/common/bfloat16.hpp
2020-07-11 13:32:28 +02:00
Rasmus Munk Larsen
dcf7655b3d
Guard operator<< test by EIGEN_NO_IO.
2020-07-09 19:54:48 +00:00
Rasmus Munk Larsen
ed00df445d
Guard operator<< by EIGEN_NO_IO.
2020-07-09 19:52:44 +00:00
Rasmus Munk Larsen
fb77b7288c
Add operator<< to print a quaternion.
2020-07-09 12:49:58 -07:00
David Tellenbach
ee4715ff48
Fix test basic stuff
...
- Guard fundamental types that are not available pre C++11
- Separate subsequent angle brackets >> by spaces
- Allow casting of Eigen::half and Eigen::bfloat16 to complex types
2020-07-09 17:24:00 +00:00
Forrest Voight
8889a2c1c6
Add operator==/operator!= to Quaternion. Fixes #1876 .
2020-07-07 20:16:54 +00:00
Rasmus Munk Larsen
6964ae8d52
Change the sign operator in Eigen to return NaN for NaN arguments, not zero.
2020-07-07 01:54:04 +00:00
David Tellenbach
cb63153183
Make test packetmath C++98 compliant
2020-07-01 20:41:59 +02:00
Sheng Yang
116c5235ac
BF16 for scalar_cmp_with_cast_op
2020-07-01 18:33:42 +00:00
Kan Chen
8731452b97
Delete duplicate test cases in vectorization_logic.cpp
2020-07-01 00:51:15 +00:00
Antonio Sanchez
9cb8771e9c
Fix tensor casts for large packets and casts to/from std::complex
...
The original tensor casts were only defined for
`SrcCoeffRatio`:`TgtCoeffRatio` 1:1, 1:2, 2:1, 4:1. Here we add the
missing 1:N and 8:1.
We also add casting `Eigen::half` to/from `std::complex<T>`, which
was missing to make it consistent with `Eigen:bfloat16`, and
generalize the overload to work for any complex type.
Tests were added to `basicstuff`, `packetmath`, and
`cxx11_tensor_casts` to test all cast configurations.
2020-06-30 18:53:55 +00:00
Antonio Sanchez
145e51516f
Fix denormal check pre c++11.
...
`float_denorm_style` is an old-style `enum`, so the `denorm_present`
symbol only exists in the `std` namespace prior to c++11.
2020-06-30 17:28:30 +00:00
David Tellenbach
689b57070d
Report custom C++ flags in CMake testing summary
2020-06-30 17:18:54 +00:00
David Tellenbach
f3b8d441f6
Remote CI tags to enable shared runners
2020-06-29 22:15:41 +02:00
Christoph Grüninger
dc0b81fb1d
Pass CMAKE_MAKE_PROGRAM to Fortran language support test
...
Otherwise the Make (or Ninja) program is used, which is
installed system wide.
2020-06-27 23:52:38 +02:00
David Tellenbach
13d25f5ed8
Add initial CI configuration file.
...
The initial CI configuration consists of jobs to build and run tests and
to build docs.
2020-06-27 00:03:35 +00:00
Antonio Sanchez
7222f0b6b5
Fix packetmath_1 float tests for arm/aarch64.
...
Added missing `pmadd<Packet2f>` for NEON. This leads to significant
improvement in precision than previous `pmul+padd`, which was causing
the `pcos` tests to fail. Also added an approx test with
`std::sin`/`std::cos` since otherwise returning any `a^2+b^2=1` would
pass.
Modified `log(denorm)` tests. Denorms are not always supported by all
systems (returns `::min`), are always flushed to zero on 32-bit arm,
and configurably flush to zero on sse/avx/aarch64. This leads to
inconsistent results across different systems (i.e. `-inf` vs `nan`).
Added a check for existence and exclude ARM.
Removed logistic exactness test, since scalar and vectorized versions
follow different code-paths due to differences in `pexp` and `pmadd`,
which result in slightly different values. For example, exactness always
fails on arm, aarch64, and altivec.
2020-06-24 14:03:35 -07:00
Simon Pfreundschuh
14f84978e8
Replaced call to deprecated 'load' function with appropriate call to 'on'.
2020-06-23 11:23:13 +02:00
Antonio Sanchez
ff4e7a0820
Add missing Packet2l/Packet2ul ops for NEON.
...
The current multiply (`pmul`) and comparison operators (`pcmp_lt`,
`pcmp_le`, `pcmp_eq`) are missing for packets `Packet2l` and
`Packet2ul`. This leads to compile errors for the `packetmath.cpp` tests
in clang. Here we add and test the missing ops.
Tested:
```
$ aarch64-linux-gnu-g++ -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"
$ arm-linux-gnueabihf-g++ -mfpu=neon -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"
$ clang++ -target aarch64-linux-android21 -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"
$ clang++ -target armv7-linux-android21 -static -mfpu=neon -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"
```
2020-06-22 11:24:43 -07:00
Antonio Sanchez
03ebdf6acb
Added missing NEON pcasts, update packetmath tests.
...
The NEON `pcast` operators are all implemented and tested for existing
packets. This requires adding a `pcast(a,b,c,d,e,f,g,h)` for casting
between `int64_t` and `int8_t` in `GenericPacketMath.h`.
Removed incorrect `HasHalfPacket` definition for NEON's
`Packet2l`/`Packet2ul`.
Adjustments were also made to the `packetmath` tests. These include
- minor bug fixes for cast tests (i.e. 4:1 casts, only casting for
packets that are vectorizable)
- added 8:1 cast tests
- random number generation
- original had uninteresting 0 to 0 casts for many casts between
floating-point and integers, and exhibited signed overflow
undefined behavior
Tested:
```
$ aarch64-linux-gnu-g++ -static -I./ '-DEIGEN_TEST_PART_ALL=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"
```
2020-06-21 09:32:31 -07:00
Teng Lu
386d809bde
Support BFloat16 in Eigen
2020-06-20 19:16:24 +00:00
Rasmus Munk Larsen
6b9c92fe7e
Add Apache 2.0 license text in COPYING.APACHE.
2020-06-18 12:45:27 -07:00
Nicolas Mellado
cf7adf3a5d
Update things you can do message using cmake commands
...
Print cmake commands instead of make commands, which should work for any generator.
2020-06-16 21:04:33 +00:00
Ilya Tokar
231ce21535
Run two independent chains, when reducing tensors.
...
Running two chains exposes more instruction level parallelism,
by allowing to execute both chains at the same time.
Results are a bit noisy, but for medium length we almost hit
theoretical upper bound of 2x.
BM_fullReduction_16T/3 [using 16 threads] 17.3ns ±11% 17.4ns ± 9% ~ (p=0.178 n=18+19)
BM_fullReduction_16T/4 [using 16 threads] 17.6ns ±17% 17.0ns ±18% ~ (p=0.835 n=20+19)
BM_fullReduction_16T/7 [using 16 threads] 18.9ns ±12% 18.2ns ±10% ~ (p=0.756 n=20+18)
BM_fullReduction_16T/8 [using 16 threads] 19.8ns ±13% 19.4ns ±21% ~ (p=0.512 n=20+20)
BM_fullReduction_16T/10 [using 16 threads] 23.5ns ±15% 20.8ns ±24% -11.37% (p=0.000 n=20+19)
BM_fullReduction_16T/15 [using 16 threads] 35.8ns ±21% 26.9ns ±17% -24.76% (p=0.000 n=20+19)
BM_fullReduction_16T/16 [using 16 threads] 38.7ns ±22% 27.7ns ±18% -28.40% (p=0.000 n=20+19)
BM_fullReduction_16T/31 [using 16 threads] 146ns ±17% 74ns ±11% -49.05% (p=0.000 n=20+18)
BM_fullReduction_16T/32 [using 16 threads] 154ns ±19% 84ns ±30% -45.79% (p=0.000 n=20+19)
BM_fullReduction_16T/64 [using 16 threads] 603ns ± 8% 308ns ±12% -48.94% (p=0.000 n=17+17)
BM_fullReduction_16T/128 [using 16 threads] 2.44µs ±13% 1.22µs ± 1% -50.29% (p=0.000 n=17+17)
BM_fullReduction_16T/256 [using 16 threads] 9.84µs ±14% 5.13µs ±30% -47.82% (p=0.000 n=19+19)
BM_fullReduction_16T/512 [using 16 threads] 78.0µs ± 9% 56.1µs ±17% -28.02% (p=0.000 n=18+20)
BM_fullReduction_16T/1k [using 16 threads] 325µs ± 5% 263µs ± 4% -19.00% (p=0.000 n=20+16)
BM_fullReduction_16T/2k [using 16 threads] 1.09ms ± 3% 0.99ms ± 1% -9.04% (p=0.000 n=20+20)
BM_fullReduction_16T/4k [using 16 threads] 7.66ms ± 3% 7.57ms ± 3% -1.24% (p=0.017 n=20+20)
BM_fullReduction_16T/10k [using 16 threads] 65.3ms ± 4% 65.0ms ± 3% ~ (p=0.718 n=20+20)
2020-06-16 15:55:11 -04:00
Pedro Caldeira
a475bf14d4
Fix pscatter and pgather for Altivec Complex double
2020-06-16 16:41:02 -03:00
David Tellenbach
c6c84ed961
Fix unused variable warning on Arm
2020-06-15 00:14:58 +02:00
Sebastien Boisvert
6228f27234
Fix #1818 : SparseLU: add methods nnzL() and nnzU()
...
Now this compiles without errors:
$ clang++ -I ../../ test_sparseLU.cpp -std=c++03
2020-06-11 23:49:49 +00:00
Sebastien Boisvert
39cbd6578f
Fix #1911 : add benchmark for move semantics with fixed-size matrix
...
$ clang++ -O3 bench/bench_move_semantics.cpp -I. -std=c++11 \
-o bench_move_semantics
$ ./bench_move_semantics
float copy semantics: 1755.97 ms
float move semantics: 55.063 ms
double copy semantics: 2457.65 ms
double move semantics: 55.034 ms
2020-06-11 23:43:25 +00:00
Antonio Sanchez
a7d2552af8
Remove HasCast and fix packetmath cast tests.
...
The use of the `packet_traits<>::HasCast` field is currently inconsistent with
`type_casting_traits<>`, and is unused apart from within
`test/packetmath.cpp`. In addition, those packetmath cast tests do not
currently reflect how casts are performed in practice: they ignore the
`SrcCoeffRatio` and `TgtCoeffRatio` fields, assuming a 1:1 ratio.
Here we remove the unsed `HasCast`, and modify the packet cast tests to
better reflect their usage.
2020-06-11 17:26:56 +00:00
Sebastien Boisvert
463ec86648
Fix #1757 : remove the word 'suicide'
2020-06-11 00:56:54 +00:00
ShengYang1
b5d66b5e73
Implement scalar_cmp_with_cast_op
2020-06-09 08:12:07 +08:00
Rasmus Munk Larsen
c4059ffcb6
Fix static analyzer warning in SelfadjointProduct.h.
...
Fix compiler warnings in GeneralBlockPanelKernel.h.
2020-06-08 11:48:44 -07:00
Thales Sabino
1fcaaf460f
Update FindComputeCpp.cmake to fix build problems on Windows
...
- Use standard types in SYCL/PacketMath.h to avoid compilation problems on Windows
- Add EIGEN_HAS_CONSTEXPR to cxx11_tensor_argmax_sycl.cpp to fix build problems on Windows
2020-06-05 20:51:20 +00:00
David Tellenbach
3ce18d3c8f
Revert ".gitlab-ci.yml: initial commit"
...
This reverts commit 95177362ed to
disable GitLab CI temporarily.
2020-06-05 22:43:49 +02:00
Rasmus Munk Larsen
c2ab36f47a
Fix broken packetmath test for logistic on Arm.
2020-06-04 16:24:47 -07:00
Rasmus Munk Larsen
537e2b322f
Fix typo in previous update to generic predux_any.
2020-06-04 22:25:05 +00:00
Rasmus Munk Larsen
fdc1cbdce3
Avoid implicit float equality comparison in generic predux_any, but use numext::not_equal_strict to avoid breaking builds that compile with -Werror=float-equal.
2020-06-04 22:15:56 +00:00
Rasmus Munk Larsen
daf9bbeca2
Fix compilation error in logistic packet op.
2020-06-03 00:57:41 +00:00
n0mend
6d2a9a524b
Update run instructions for benchCholesky
2020-06-01 18:31:46 +00:00
Gael Guennebaud
029a76e115
Bug #1777 : make the scalar and packet path consistent for the logistic function + respective unit test
2020-05-31 00:53:37 +02:00
Gael Guennebaud
99b7f7cb9c
Fix #556 : warnings with mingw
2020-05-31 00:39:44 +02:00
Gael Guennebaud
72782d13e0
Bug #1767 : increase required cmake version to 3.5.0
2020-05-31 00:31:09 +02:00
Gael Guennebaud
867a756509
Fix #1833 : compilation issue of "array!=scalar" with c++20
2020-05-30 23:53:58 +02:00
Gael Guennebaud
ab615e4114
Save one extra temporary when assigning a sparse product to a row-major sparse matrix
2020-05-30 23:15:12 +02:00
Christoph Junghans
95177362ed
.gitlab-ci.yml: initial commit
2020-05-29 09:23:25 -06:00
Kan Chen
8d1302f566
Add support for PacketBlock<Packet8s,4> and PacketBlock<Packet16uc,4> ptranspose on NEON
2020-05-29 00:33:45 +00:00
Antonio Sánchez
8719b9c5bc
Disable test for 32-bit systems (e.g. ARM, i386)
...
Both i386 and 32-bit ARM do not define __uint128_t. On most systems, if
__uint128_t is defined, then so is the macro __SIZEOF_INT128__.
https://stackoverflow.com/questions/18531782/how-to-know-if-uint128-t-is-defined1
2020-05-28 17:40:15 +00:00
Yong Tang
8e1df5b082
Fix incorrect usage of if defined(EIGEN_ARCH_PPC) => if EIGEN_ARCH_PPC
...
This PR tries to fix an incorrect usage of `if defined(EIGEN_ARCH_PPC)`
in `Eigen/Core` header.
In `Eigen/src/Core/util/Macros.h`, EIGEN_ARCH_PPC was explicitly defined
as either 0 or 1. As a result `if defined(EIGEN_ARCH_PPC)` will always be true.
This causes issues when building on non PPC platform and `MatrixProduct.h` is not
available.
This fix changes `if defined(EIGEN_ARCH_PPC)` => `if EIGEN_ARCH_PPC`.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com >
2020-05-28 05:53:44 -07:00
Kan Chen
4e7046063b
Fix #1874 : it works on both MSVC 2017 and other platforms.
2020-05-21 18:42:56 +08:00
Pedro Caldeira
2d67af2d2b
Add pscatter for Packet16{u}c (int8)
2020-05-20 17:29:34 -03:00
David Tellenbach
5328cd62b3
Guard usage of decltype since it's a C++11 feature
...
This fixes https://gitlab.com/libeigen/eigen/-/issues/1897
2020-05-20 16:04:16 +02:00
Rasmus Munk Larsen
cc86a31e20
Add guard around specialization for bool, which is only currently implemented for SSE.
2020-05-19 16:21:56 -07:00
Everton Constantino
8a7f360ec3
- Vectorizing MMA packing.
...
- Optimizing MMA kernel.
- Adding PacketBlock store to blas_data_mapper.
2020-05-19 19:24:11 +00:00
Rasmus Munk Larsen
a145e4adf5
Add newline at the end of StlIterators.h.
2020-05-15 20:36:00 +00:00
Gael Guennebaud
8ce9630ddb
Fix #1874 : workaround MSVC 2017 compilation issue.
2020-05-15 20:47:32 +02:00
Rasmus Munk Larsen
9b411757ab
Add missing packet ops for bool, and make it pass the same packet op unit tests as other arithmetic types.
...
This change also contains a few minor cleanups:
1. Remove packet op pnot, which is not needed for anything other than pcmp_le_or_nan,
which can be done in other ways.
2. Remove the "HasInsert" enum, which is no longer needed since we removed the
corresponding packet ops.
3. Add faster pselect op for Packet4i when SSE4.1 is supported.
Among other things, this makes the fast transposeInPlace() method available for Matrix<bool>.
Run on ************** (72 X 2994 MHz CPUs); 2020-05-09T10:51:02.372347913-07:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------------------------
BM_TransposeInPlace<float>/4 9.77 9.77 71670320
BM_TransposeInPlace<float>/8 21.9 21.9 31929525
BM_TransposeInPlace<float>/16 66.6 66.6 10000000
BM_TransposeInPlace<float>/32 243 243 2879561
BM_TransposeInPlace<float>/59 844 844 829767
BM_TransposeInPlace<float>/64 933 933 750567
BM_TransposeInPlace<float>/128 3944 3945 177405
BM_TransposeInPlace<float>/256 16853 16853 41457
BM_TransposeInPlace<float>/512 204952 204968 3448
BM_TransposeInPlace<float>/1k 1053889 1053861 664
BM_TransposeInPlace<bool>/4 14.4 14.4 48637301
BM_TransposeInPlace<bool>/8 36.0 36.0 19370222
BM_TransposeInPlace<bool>/16 31.5 31.5 22178902
BM_TransposeInPlace<bool>/32 111 111 6272048
BM_TransposeInPlace<bool>/59 626 626 1000000
BM_TransposeInPlace<bool>/64 428 428 1632689
BM_TransposeInPlace<bool>/128 1677 1677 417377
BM_TransposeInPlace<bool>/256 7126 7126 96264
BM_TransposeInPlace<bool>/512 29021 29024 24165
BM_TransposeInPlace<bool>/1k 116321 116330 6068
2020-05-14 22:39:13 +00:00
Felipe Attanasio
d640276d31
Added support for reverse iterators for Vectorwise operations.
2020-05-14 22:38:20 +00:00
Christopher Moore
fa8fd4b4d5
Indexed view should have RowMajorBit when there is staticly a single row
2020-05-14 22:11:19 +00:00
Christopher Moore
a187ffea28
Resolve "IndexedView of a vector should allow linear access"
2020-05-13 19:24:42 +00:00
Mark Eberlein
ba9d18b938
Add KLU support to spbenchsolver
2020-05-11 21:50:27 +00:00
Pedro Caldeira
5fdc179241
Altivec template functions to better code reusability
2020-05-11 21:04:51 +00:00
mehdi-goli
d3e81db6c5
Eigen moved the scanLauncehr function inside the internal namespace.
...
This commit applies the following changes:
- Moving the `scamLauncher` specialization inside internal namespace to fix compiler crash on TensorScan for SYCL backend.
- Replacing `SYCL/sycl.hpp` to `CL/sycl.hpp` in order to follow SYCL 1.2.1 standard.
- minor fixes: commenting out an unused variable to avoid compiler warnings.
2020-05-11 16:10:33 +01:00
Rasmus Munk Larsen
c1d944dd91
Remove packet ops pinsertfirst and pinsertlast that are only used in a single place, and can be replaced by other ops when constructing the first/final packet in linspaced_op_impl::packetOp.
...
I cannot measure any performance changes for SSE, AVX, or AVX512.
name old time/op new time/op delta
BM_LinSpace<float>/1 1.63ns ± 0% 1.63ns ± 0% ~ (p=0.762 n=5+5)
BM_LinSpace<float>/8 4.92ns ± 3% 4.89ns ± 3% ~ (p=0.421 n=5+5)
BM_LinSpace<float>/64 34.6ns ± 0% 34.6ns ± 0% ~ (p=0.841 n=5+5)
BM_LinSpace<float>/512 217ns ± 0% 217ns ± 0% ~ (p=0.421 n=5+5)
BM_LinSpace<float>/4k 1.68µs ± 0% 1.68µs ± 0% ~ (p=1.000 n=5+5)
BM_LinSpace<float>/32k 13.3µs ± 0% 13.3µs ± 0% ~ (p=0.905 n=5+4)
BM_LinSpace<float>/256k 107µs ± 0% 107µs ± 0% ~ (p=0.841 n=5+5)
BM_LinSpace<float>/1M 427µs ± 0% 427µs ± 0% ~ (p=0.690 n=5+5)
2020-05-08 15:41:50 -07:00
David Tellenbach
5c4e19fbe7
Possibility to specify user-defined default cache sizes for GEBP kernel
...
Some architectures have no convinient way to determine cache sizes at
runtime. Eigen's GEBP kernel falls back to default cache values in this
case which might not be correct in all situations.
This patch introduces three preprocessor directives
`EIGEN_DEFAULT_L1_CACHE_SIZE`
`EIGEN_DEFAULT_L2_CACHE_SIZE`
`EIGEN_DEFAULT_L3_CACHE_SIZE`
to give users the possibility to set these default values explicitly.
2020-05-08 12:54:36 +02:00
Rasmus Munk Larsen
225ab040e0
Remove unused packet op "palign".
...
Clean up a compiler warning in c++03 mode in AVX512/Complex.h.
2020-05-07 17:14:26 -07:00
Rasmus Munk Larsen
74ec8e6618
Make size odd for transposeInPlace test to make sure we hit the scalar path.
2020-05-07 17:29:56 +00:00
Rasmus Munk Larsen
49f1aeb60d
Remove traits declaring NEON vectorized casts that do not actually have packet op implementations.
2020-05-07 09:49:22 -07:00
Rasmus Munk Larsen
2fd8a5a08f
Add parallelization of TensorScanOp for types without packet ops.
...
Clean up the code a bit and do a few micro-optimizations to improve performance for small tensors.
Benchmark numbers for Tensor<uint32_t>:
name old time/op new time/op delta
BM_cumSumRowReduction_1T/8 [using 1 threads] 76.5ns ± 0% 61.3ns ± 4% -19.80% (p=0.008 n=5+5)
BM_cumSumRowReduction_1T/64 [using 1 threads] 2.47µs ± 1% 2.40µs ± 1% -2.77% (p=0.008 n=5+5)
BM_cumSumRowReduction_1T/256 [using 1 threads] 39.8µs ± 0% 39.6µs ± 0% -0.60% (p=0.008 n=5+5)
BM_cumSumRowReduction_1T/4k [using 1 threads] 13.9ms ± 0% 13.4ms ± 1% -4.19% (p=0.008 n=5+5)
BM_cumSumRowReduction_2T/8 [using 2 threads] 76.8ns ± 0% 59.1ns ± 0% -23.09% (p=0.016 n=5+4)
BM_cumSumRowReduction_2T/64 [using 2 threads] 2.47µs ± 1% 2.41µs ± 1% -2.53% (p=0.008 n=5+5)
BM_cumSumRowReduction_2T/256 [using 2 threads] 39.8µs ± 0% 34.7µs ± 6% -12.74% (p=0.008 n=5+5)
BM_cumSumRowReduction_2T/4k [using 2 threads] 13.8ms ± 1% 7.2ms ± 6% -47.74% (p=0.008 n=5+5)
BM_cumSumRowReduction_8T/8 [using 8 threads] 76.4ns ± 0% 61.8ns ± 3% -19.02% (p=0.008 n=5+5)
BM_cumSumRowReduction_8T/64 [using 8 threads] 2.47µs ± 1% 2.40µs ± 1% -2.84% (p=0.008 n=5+5)
BM_cumSumRowReduction_8T/256 [using 8 threads] 39.8µs ± 0% 28.3µs ±11% -28.75% (p=0.008 n=5+5)
BM_cumSumRowReduction_8T/4k [using 8 threads] 13.8ms ± 0% 2.7ms ± 5% -80.39% (p=0.008 n=5+5)
BM_cumSumColReduction_1T/8 [using 1 threads] 59.1ns ± 0% 80.3ns ± 0% +35.94% (p=0.029 n=4+4)
BM_cumSumColReduction_1T/64 [using 1 threads] 3.06µs ± 0% 3.08µs ± 1% ~ (p=0.114 n=4+4)
BM_cumSumColReduction_1T/256 [using 1 threads] 175µs ± 0% 176µs ± 0% ~ (p=0.190 n=4+5)
BM_cumSumColReduction_1T/4k [using 1 threads] 824ms ± 1% 844ms ± 1% +2.37% (p=0.008 n=5+5)
BM_cumSumColReduction_2T/8 [using 2 threads] 59.0ns ± 0% 90.7ns ± 0% +53.74% (p=0.029 n=4+4)
BM_cumSumColReduction_2T/64 [using 2 threads] 3.06µs ± 0% 3.10µs ± 0% +1.08% (p=0.016 n=4+5)
BM_cumSumColReduction_2T/256 [using 2 threads] 176µs ± 0% 189µs ±18% ~ (p=0.151 n=5+5)
BM_cumSumColReduction_2T/4k [using 2 threads] 836ms ± 2% 611ms ±14% -26.92% (p=0.008 n=5+5)
BM_cumSumColReduction_8T/8 [using 8 threads] 59.3ns ± 2% 90.6ns ± 0% +52.79% (p=0.008 n=5+5)
BM_cumSumColReduction_8T/64 [using 8 threads] 3.07µs ± 0% 3.10µs ± 0% +0.99% (p=0.016 n=5+4)
BM_cumSumColReduction_8T/256 [using 8 threads] 176µs ± 0% 80µs ±19% -54.51% (p=0.008 n=5+5)
BM_cumSumColReduction_8T/4k [using 8 threads] 827ms ± 2% 180ms ±14% -78.24% (p=0.008 n=5+5)
2020-05-06 14:48:37 -07:00
Rasmus Munk Larsen
0e59f786e1
Fix accidental copy of loop variable.
2020-05-05 21:35:38 +00:00
Rasmus Munk Larsen
7b76c85daf
Vectorize and parallelize TensorScanOp.
...
TensorScanOp is used in TensorFlow for a number of operations, such as cumulative logexp reduction and cumulative sum and product reductions.
The benchmarks numbers below are for cumulative row- and column reductions of NxN matrices.
name old time/op new time/op delta
BM_cumSumRowReduction_1T/4 [using 1 threads ] 25.1ns ± 1% 35.2ns ± 1% +40.45%
BM_cumSumRowReduction_1T/8 [using 1 threads ] 73.4ns ± 0% 82.7ns ± 3% +12.74%
BM_cumSumRowReduction_1T/32 [using 1 threads ] 988ns ± 0% 832ns ± 0% -15.77%
BM_cumSumRowReduction_1T/64 [using 1 threads ] 4.07µs ± 2% 3.47µs ± 0% -14.70%
BM_cumSumRowReduction_1T/128 [using 1 threads ] 18.0µs ± 0% 16.8µs ± 0% -6.58%
BM_cumSumRowReduction_1T/512 [using 1 threads ] 287µs ± 0% 281µs ± 0% -2.22%
BM_cumSumRowReduction_1T/2k [using 1 threads ] 4.78ms ± 1% 4.78ms ± 2% ~
BM_cumSumRowReduction_1T/10k [using 1 threads ] 117ms ± 1% 117ms ± 1% ~
BM_cumSumRowReduction_8T/4 [using 8 threads ] 25.0ns ± 0% 35.2ns ± 0% +40.82%
BM_cumSumRowReduction_8T/8 [using 8 threads ] 77.2ns ±16% 81.3ns ± 0% ~
BM_cumSumRowReduction_8T/32 [using 8 threads ] 988ns ± 0% 833ns ± 0% -15.67%
BM_cumSumRowReduction_8T/64 [using 8 threads ] 4.08µs ± 2% 3.47µs ± 0% -14.95%
BM_cumSumRowReduction_8T/128 [using 8 threads ] 18.0µs ± 0% 17.3µs ±10% ~
BM_cumSumRowReduction_8T/512 [using 8 threads ] 287µs ± 0% 58µs ± 6% -79.92%
BM_cumSumRowReduction_8T/2k [using 8 threads ] 4.79ms ± 1% 0.64ms ± 1% -86.58%
BM_cumSumRowReduction_8T/10k [using 8 threads ] 117ms ± 1% 18ms ± 6% -84.50%
BM_cumSumColReduction_1T/4 [using 1 threads ] 23.9ns ± 0% 33.4ns ± 1% +39.68%
BM_cumSumColReduction_1T/8 [using 1 threads ] 71.6ns ± 1% 49.1ns ± 3% -31.40%
BM_cumSumColReduction_1T/32 [using 1 threads ] 973ns ± 0% 165ns ± 2% -83.10%
BM_cumSumColReduction_1T/64 [using 1 threads ] 4.06µs ± 1% 0.57µs ± 1% -85.94%
BM_cumSumColReduction_1T/128 [using 1 threads ] 33.4µs ± 1% 4.1µs ± 1% -87.67%
BM_cumSumColReduction_1T/512 [using 1 threads ] 1.72ms ± 4% 0.21ms ± 5% -87.91%
BM_cumSumColReduction_1T/2k [using 1 threads ] 119ms ±53% 11ms ±35% -90.42%
BM_cumSumColReduction_1T/10k [using 1 threads ] 1.59s ±67% 0.35s ±49% -77.96%
BM_cumSumColReduction_8T/4 [using 8 threads ] 23.8ns ± 0% 33.3ns ± 0% +40.06%
BM_cumSumColReduction_8T/8 [using 8 threads ] 71.6ns ± 1% 49.2ns ± 5% -31.33%
BM_cumSumColReduction_8T/32 [using 8 threads ] 1.01µs ±12% 0.17µs ± 3% -82.93%
BM_cumSumColReduction_8T/64 [using 8 threads ] 4.15µs ± 4% 0.58µs ± 1% -86.09%
BM_cumSumColReduction_8T/128 [using 8 threads ] 33.5µs ± 0% 4.1µs ± 4% -87.65%
BM_cumSumColReduction_8T/512 [using 8 threads ] 1.71ms ± 3% 0.06ms ±16% -96.21%
BM_cumSumColReduction_8T/2k [using 8 threads ] 97.1ms ±14% 3.0ms ±23% -96.88%
BM_cumSumColReduction_8T/10k [using 8 threads ] 1.97s ± 8% 0.06s ± 2% -96.74%
2020-05-05 00:19:43 +00:00
Xiaoxiang Cao
a74a278abd
Fix confusing template param name for Stride fwd decl.
2020-04-30 01:43:05 +00:00
Rasmus Munk Larsen
923ee9aba3
Fix the embarrassingly incomplete fix to the embarrassing bug in blocked transpose.
2020-04-29 17:27:36 +00:00
Rasmus Munk Larsen
a32923a439
Fix (embarrassing) bug in blocked transpose.
2020-04-29 17:02:27 +00:00
Rasmus Munk Larsen
1e41406c36
Add missing transpose in cleanup loop. Without it, we trip an assertion in debug mode.
2020-04-29 01:30:51 +00:00
Rasmus Munk Larsen
fbe7916c55
Fix compilation error with Clang on Android: _mm_extract_epi64 fails to compile.
2020-04-29 00:58:41 +00:00
Clément Grégoire
82f54ad144
Fix perf monitoring merge function
2020-04-28 17:02:59 +00:00
Rasmus Munk Larsen
ab773c7e91
Extend support for Packet16b:
...
* Add ptranspose<*,4> to support matmul and add unit test for Matrix<bool> * Matrix<bool>
* work around a bug in slicing of Tensor<bool>.
* Add tensor tests
This speeds up matmul for boolean matrices by about 10x
name old time/op new time/op delta
BM_MatMul<bool>/8 267ns ± 0% 479ns ± 0% +79.25% (p=0.008 n=5+5)
BM_MatMul<bool>/32 6.42µs ± 0% 0.87µs ± 0% -86.50% (p=0.008 n=5+5)
BM_MatMul<bool>/64 43.3µs ± 0% 5.9µs ± 0% -86.42% (p=0.008 n=5+5)
BM_MatMul<bool>/128 315µs ± 0% 44µs ± 0% -85.98% (p=0.008 n=5+5)
BM_MatMul<bool>/256 2.41ms ± 0% 0.34ms ± 0% -85.68% (p=0.008 n=5+5)
BM_MatMul<bool>/512 18.8ms ± 0% 2.7ms ± 0% -85.53% (p=0.008 n=5+5)
BM_MatMul<bool>/1k 149ms ± 0% 22ms ± 0% -85.40% (p=0.008 n=5+5)
2020-04-28 16:12:47 +00:00
Rasmus Munk Larsen
b47c777993
Block transposeInPlace() when the matrix is real and square. This yields a large speedup because we transpose in registers (or L1 if we spill), instead of one packet at a time, which in the worst case makes the code write to the same cache line PacketSize times instead of once.
...
rmlarsen@rmlarsen4:.../eigen_bench/google3$ benchy --benchmarks=.*TransposeInPlace.*float.* --reference=srcfs experimental/users/rmlarsen/bench:matmul_bench
10 / 10 [====================================================================================================================================================================================================================] 100.00% 2m50s
(Generated by http://go/benchy . Settings: --runs 5 --benchtime 1s --reference "srcfs" --benchmarks ".*TransposeInPlace.*float.*" experimental/users/rmlarsen/bench:matmul_bench)
name old time/op new time/op delta
BM_TransposeInPlace<float>/4 9.84ns ± 0% 6.51ns ± 0% -33.80% (p=0.008 n=5+5)
BM_TransposeInPlace<float>/8 23.6ns ± 1% 17.6ns ± 0% -25.26% (p=0.016 n=5+4)
BM_TransposeInPlace<float>/16 78.8ns ± 0% 60.3ns ± 0% -23.50% (p=0.029 n=4+4)
BM_TransposeInPlace<float>/32 302ns ± 0% 229ns ± 0% -24.40% (p=0.008 n=5+5)
BM_TransposeInPlace<float>/59 1.03µs ± 0% 0.84µs ± 1% -17.87% (p=0.016 n=5+4)
BM_TransposeInPlace<float>/64 1.20µs ± 0% 0.89µs ± 1% -25.81% (p=0.008 n=5+5)
BM_TransposeInPlace<float>/128 8.96µs ± 0% 3.82µs ± 2% -57.33% (p=0.008 n=5+5)
BM_TransposeInPlace<float>/256 152µs ± 3% 17µs ± 2% -89.06% (p=0.008 n=5+5)
BM_TransposeInPlace<float>/512 837µs ± 1% 208µs ± 0% -75.15% (p=0.008 n=5+5)
BM_TransposeInPlace<float>/1k 4.28ms ± 2% 1.08ms ± 2% -74.72% (p=0.008 n=5+5)
2020-04-28 16:08:16 +00:00
Pedro Caldeira
29f0917a43
Add support to vector instructions to Packet16uc and Packet16c
2020-04-27 12:48:08 -03:00
Rasmus Munk Larsen
e80ec24357
Remove unused packet op "preduxp".
2020-04-23 18:17:14 +00:00
René Wagner
0aebe19aca
BooleanRedux.h: Add more EIGEN_DEVICE_FUNC qualifiers.
...
This enables operator== on Eigen matrices in device code.
2020-04-23 17:25:08 +02:00
Eugene Zhulenev
3c02fefec5
Add async evaluation support to TensorSlicingOp.
...
Device::memcpy is not async-safe and might lead to deadlocks. Always evaluate slice expression in async mode.
2020-04-22 19:55:01 +00:00
Pedro Caldeira
0c67b855d2
Add Packet8s and Packet8us to support signed/unsigned int16/short Altivec vector operations
2020-04-21 14:52:46 -03:00
Rasmus Munk Larsen
e8f40e4670
Fix bug in ptrue for Packet16b.
2020-04-20 21:45:10 +00:00
Rasmus Munk Larsen
2f6ddaa25c
Add partial vectorization for matrices and tensors of bool. This speeds up boolean operations on Tensors by up to 25x.
...
Benchmark numbers for the logical and of two NxN tensors:
name old time/op new time/op delta
BM_booleanAnd_1T/3 [using 1 threads] 14.6ns ± 0% 14.4ns ± 0% -0.96%
BM_booleanAnd_1T/4 [using 1 threads] 20.5ns ±12% 9.0ns ± 0% -56.07%
BM_booleanAnd_1T/7 [using 1 threads] 41.7ns ± 0% 10.5ns ± 0% -74.87%
BM_booleanAnd_1T/8 [using 1 threads] 52.1ns ± 0% 10.1ns ± 0% -80.59%
BM_booleanAnd_1T/10 [using 1 threads] 76.3ns ± 0% 13.8ns ± 0% -81.87%
BM_booleanAnd_1T/15 [using 1 threads] 167ns ± 0% 16ns ± 0% -90.45%
BM_booleanAnd_1T/16 [using 1 threads] 188ns ± 0% 16ns ± 0% -91.57%
BM_booleanAnd_1T/31 [using 1 threads] 667ns ± 0% 34ns ± 0% -94.83%
BM_booleanAnd_1T/32 [using 1 threads] 710ns ± 0% 35ns ± 0% -95.01%
BM_booleanAnd_1T/64 [using 1 threads] 2.80µs ± 0% 0.11µs ± 0% -95.93%
BM_booleanAnd_1T/128 [using 1 threads] 11.2µs ± 0% 0.4µs ± 0% -96.11%
BM_booleanAnd_1T/256 [using 1 threads] 44.6µs ± 0% 2.5µs ± 0% -94.31%
BM_booleanAnd_1T/512 [using 1 threads] 178µs ± 0% 10µs ± 0% -94.35%
BM_booleanAnd_1T/1k [using 1 threads] 717µs ± 0% 78µs ± 1% -89.07%
BM_booleanAnd_1T/2k [using 1 threads] 2.87ms ± 0% 0.31ms ± 1% -89.08%
BM_booleanAnd_1T/4k [using 1 threads] 11.7ms ± 0% 1.9ms ± 4% -83.55%
BM_booleanAnd_1T/10k [using 1 threads] 70.3ms ± 0% 17.2ms ± 4% -75.48%
2020-04-20 20:16:28 +00:00
dlazenby
00f6340153
Update PreprocessorDirectives.dox - Added line for the new VectorwiseOp plugin directive (and re-alphabatized the plugin section)
2020-04-17 21:43:37 +00:00
Rasmus Munk Larsen
5ab87d8aba
Move eigen_packet_wrapper to GenericPacketMath.h and use it for SSE/AVX/AVX512 as it is already used for NEON.
...
This will allow us to define multiple packet types backed by the same vector type, e.g., __m128i.
Use this machanism to define packets for half and clean up the packet op implementations.
2020-04-15 18:17:19 +00:00
Rasmus Munk Larsen
4aae8ac693
Fix typo in TypeCasting.h
2020-04-14 02:55:51 +00:00
Rasmus Munk Larsen
1d674003b2
Fix big in vectorized casting of
...
{uint8, int8} -> {int16, uint16, int32, uint32, float}
{uint16, int16} -> {int32, uint32, int64, uint64, float}
for NEON. These conversions were advertised as vectorized, but not actually implemented.
2020-04-14 02:11:06 +00:00
Changming Sun
b1aa07a8d3
Fix a bug in TensorIndexList.h
2020-04-13 18:22:03 +00:00
Christoph Hertzberg
d46d726e9d
CommaInitializer wrongfully asserted for 0-sized blocks
...
commainitialier unit-test never actually called `test_block_recursion`, which also was not correctly implemented and would have caused too deep template recursion.
2020-04-13 16:41:20 +02:00
Antonio Sanchez
c854e189e6
Fixed commainitializer test.
...
The removed `finished()` call was responsible for enforcing that the
initializer was provided the correct number of values. Putting it back in
to restore previous behavior.
2020-04-10 13:53:26 -07:00
jangsoopark
39142904cc
Resolve C4346 when building eigen on windows
2020-04-08 14:55:39 +09:00
Rasmus Munk Larsen
f0577a2bfd
Speed up matrix multiplication for small to medium size matrices by using half- or quarter-packet vectorized loads in gemm_pack_rhs if they have size 4, instead of dropping down the the scalar path.
...
Benchmark measurements below are for computing ```c.noalias() = a.transpose() * b;``` for square RowMajor matrices of varying size.
Measured improvement with AVX+FMA:
name old time/op new time/op delta
BM_MatMul_ATB/8 139ns ± 1% 129ns ± 1% -7.49% (p=0.008 n=5+5)
BM_MatMul_ATB/32 1.46µs ± 1% 1.22µs ± 0% -16.72% (p=0.008 n=5+5)
BM_MatMul_ATB/64 8.43µs ± 1% 7.41µs ± 0% -12.04% (p=0.008 n=5+5)
BM_MatMul_ATB/128 56.8µs ± 1% 52.9µs ± 1% -6.83% (p=0.008 n=5+5)
BM_MatMul_ATB/256 407µs ± 1% 395µs ± 3% -2.94% (p=0.032 n=5+5)
BM_MatMul_ATB/512 3.27ms ± 3% 3.18ms ± 1% ~ (p=0.056 n=5+5)
Measured improvement for AVX512:
name old time/op new time/op delta
BM_MatMul_ATB/8 167ns ± 1% 154ns ± 1% -7.63% (p=0.008 n=5+5)
BM_MatMul_ATB/32 1.08µs ± 1% 0.83µs ± 3% -23.58% (p=0.008 n=5+5)
BM_MatMul_ATB/64 6.21µs ± 1% 5.06µs ± 1% -18.47% (p=0.008 n=5+5)
BM_MatMul_ATB/128 36.1µs ± 2% 31.3µs ± 1% -13.32% (p=0.008 n=5+5)
BM_MatMul_ATB/256 263µs ± 2% 242µs ± 2% -7.92% (p=0.008 n=5+5)
BM_MatMul_ATB/512 1.95ms ± 2% 1.91ms ± 2% ~ (p=0.095 n=5+5)
BM_MatMul_ATB/1k 15.4ms ± 4% 14.8ms ± 2% ~ (p=0.095 n=5+5)
2020-04-07 22:09:51 +00:00
Antonio Sanchez
8e875719b3
Replace norm() with squaredNorm() to address integer overflows
...
For random matrices with integer coefficients, many of the tests here lead to
integer overflows. When taking the norm() of a row/column, the squaredNorm()
often overflows to a negative value, leading to domain errors when taking the
sqrt(). This leads to a crash on some systems. By replacing the norm() call by
a squaredNorm(), the values still overflow, but at least there is no domain
error.
Addresses https://gitlab.com/libeigen/eigen/-/issues/1856
2020-04-07 19:48:28 +00:00
Antonio Sanchez
9dda5eb7d2
Missing struct definition in NumTraits
2020-04-07 09:01:11 -07:00
Akshay Naresh Modi
bcc0e9e15c
Add numeric_limits min and max for bool
...
This will allow (among other things) computation of argmax and argmin of bool tensors
2020-04-06 23:38:57 +00:00
Bernardo Bahia Monteiro
54a0a9c9dd
Bugfix: conjugate_gradient did not compile with lazy-evaluated RealScalar
...
The error generated by the compiler was:
no matching function for call to 'maxi'
RealScalar threshold = numext::maxi(tol*tol*rhsNorm2,considerAsZero);
The important part in the following notes was:
candidate template ignored: deduced conflicting
types for parameter 'T'"
('codi::Multiply11<...>' vs. 'codi::ActiveReal<...>')
EIGEN_ALWAYS_INLINE T maxi(const T& x, const T& y)
I am using CoDiPack to provide the RealScalar type.
This bug was introduced in bc000deaa Fix conjugate-gradient for very small rhs
2020-03-29 19:44:12 -04:00
Rasmus Munk Larsen
4fd5d1477b
Fix packetmath test build for AVX.
2020-03-27 17:05:39 +00:00
Rasmus Munk Larsen
393dbd8ee9
Fix bug in 52d54278be
2020-03-27 16:42:18 +00:00
Rasmus Munk Larsen
55c8fe8d0f
Fix bug in 52d54278be
2020-03-27 16:41:15 +00:00
Joel Holdsworth
6d2dbfc453
NEON: Fixed MSVC types definitions
2020-03-26 20:19:58 +00:00
Joel Holdsworth
52d54278be
Additional NEON packet-math operations
2020-03-26 20:18:19 +00:00
Everton Constantino
deb93ed1bf
Adhere to recommended load/store intrinsics for pp64le
2020-03-23 15:18:15 -03:00
Aaron Franke
5c22c7a7de
Make file formatting comply with POSIX and Unix standards
...
UTF-8, LF, no BOM, and newlines at the end of files
2020-03-23 18:09:02 +00:00
Everton Constantino
5afdaa473a
Fixing float32's pround halfway criteria to match STL's criteria.
2020-03-21 22:30:54 -05:00
Alessio M
96cd1ff718
Fixed:
...
- access violation when initializing 0x0 matrices
- exception can be thrown during stack unwind while comma-initializing a matrix if eigen_assert if configured to throw
2020-03-21 05:11:21 +00:00
dlazenby
cc954777f2
Update VectorwiseOp.h to allow Plugins similar to MatrixBase.h or ArrayBase.h
2020-03-20 19:30:01 +00:00
Masaki Murooka
55ecd58a3c
Bug https://gitlab.com/libeigen/eigen/-/issues/1415 : add missing EIGEN_DEVICE_FUNC to diagonal_product_evaluator_base.
2020-03-20 13:37:37 +09:00
Rasmus Munk Larsen
4da2c6b197
Remove reference to non-existent unary_op_base class.
2020-03-19 18:23:06 +00:00
Rasmus Munk Larsen
eda90baf35
Add missing arguments to numext::absdiff().
2020-03-19 18:16:55 +00:00
Joel Holdsworth
d5c665742b
Add absolute_difference coefficient-wise binary Array function
2020-03-19 17:45:20 +00:00
Everton Constantino
6ff5a14091
Reenabling packetmath unsigned tests, adding dummy pabs for relevant unsigned
...
types.
2020-03-19 17:31:49 +00:00
Joel Holdsworth
232f904082
Add shift_left<N> and shift_right<N> coefficient-wise unary Array functions
2020-03-19 17:24:06 +00:00
Joel Holdsworth
54aa8fa186
Implement integer square-root for NEON
2020-03-19 17:05:13 +00:00
Allan Leal
37ccb86916
Update NullaryFunctors.h
2020-03-16 11:59:02 +00:00
Deven Desai
7158ed4e0e
Fixing HIP breakage caused by the recent commit that introduces Packet4h2 as the Eigen::Half packet type
2020-03-12 01:06:24 +00:00
Joel Holdsworth
d53ae40f7b
NEON: Added int64_t and uint64_t packet math
2020-03-10 22:46:19 +00:00
Joel Holdsworth
4b9ecf2924
NEON: Added int8_t and uint8_t packet math
2020-03-10 22:46:19 +00:00
Joel Holdsworth
ceaabd4e16
NEON: Added int16_t and uint16_t packet math
2020-03-10 22:46:19 +00:00
Joel Holdsworth
d5d3cf9339
NEON: Added uint32_t packet math
2020-03-10 22:46:19 +00:00
Joel Holdsworth
eacf97f727
NEON: Implemented half-size vectors
2020-03-10 22:46:19 +00:00
Joel Holdsworth
5f411b729e
NEON: Set packet_traits<double> flags
2020-03-10 22:46:19 +00:00
Joel Holdsworth
88337acae2
test/packetmath: Add tests for all integer types
2020-03-10 22:46:19 +00:00
Joel Holdsworth
9e68977578
test/packetmath: Made negate non-mandatory
2020-03-10 22:46:19 +00:00
Sami Kama
b733b8b680
remove duplicate pset1 for half and add some comments about why we need expose pmul/add/div/min/max on host
2020-03-10 20:28:43 +00:00
Ram-Z
a45d28256d
Don't restrict CMAKE_BUILD_TYPE
...
This prevents projects that add Eigen using `add_subdirectory` from using their own custom CMAKE_BUILD_TYPE and have Eigen respect the same custom flags.
2020-02-28 20:46:53 +00:00
Cédric Hubert
98bfc5aaa8
Update MarketIO.h
2020-02-28 12:41:51 +00:00
Rasmus Munk Larsen
52a2fbbb00
Revert "avoid selecting half-packets when unnecessary"
...
This reverts commit 5ca10480b0
2020-02-25 01:07:43 +00:00
Rasmus Munk Larsen
235bcfe08d
Revert "Pick full packet unconditionally when EIGEN_UNALIGNED_VECTORIZE"
...
This reverts commit 44df2109c8
2020-02-25 01:07:28 +00:00
Rasmus Munk Larsen
d7a42eade6
Revert "do not pick full-packet if it'd result in more operations"
...
This reverts commit e9cc0cd353
2020-02-25 01:07:15 +00:00
Rasmus Munk Larsen
6ac37768a9
Revert "add some static checks for packet-picking logic"
...
This reverts commit 7769600245
2020-02-25 01:07:04 +00:00
Rasmus Munk Larsen
87cfa4862f
Revert "Disable test in test/vectorization_logic.cpp, which is currently failing with AVX."
...
This reverts commit b625adffd8
2020-02-25 01:04:56 +00:00
Rasmus Munk Larsen
b625adffd8
Disable test in test/vectorization_logic.cpp, which is currently failing with AVX.
2020-02-24 23:28:25 +00:00
Tobias Bosch
f0ce88cff7
Include <sstream> explicitly, and don't rely on the implicit include via <complex>.
...
This implicit dependency does no longer exist in a recent llbm release (sha 78be61871704).
2020-02-24 23:09:36 +00:00
Ilya Tokar
eb6cc29583
Avoid a division in NonBlockingThreadPool::Steal.
...
Looking at profiles we spend ~10-20% of Steal on simply computing
random % size. We can reduce random 32-bit int into [0, size) range with
a single multiplication and shift. This transformation is described in
https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
2020-02-14 16:02:57 -05:00
Francesco Mazzoli
7769600245
add some static checks for packet-picking logic
2020-02-07 18:16:16 +01:00
Francesco Mazzoli
e9cc0cd353
do not pick full-packet if it'd result in more operations
...
See comment and
<https://gitlab.com/libeigen/eigen/merge_requests/46#note_270622952 >.
2020-02-07 18:16:16 +01:00
Francesco Mazzoli
44df2109c8
Pick full packet unconditionally when EIGEN_UNALIGNED_VECTORIZE
...
See comment for details.
2020-02-07 18:16:16 +01:00
Francesco Mazzoli
5ca10480b0
avoid selecting half-packets when unnecessary
...
See
<https://stackoverflow.com/questions/59709148/ensuring-that-eigen-uses-avx-vectorization-for-a-certain-operation >
for an explanation of the problem this solves.
In short, for some reason, before this commit the half-packet is
selected when the array / matrix size is not a multiple of
`unpacket_traits<PacketType>::size`, where `PacketType` starts out
being the full Packet.
For example, for some data of 100 `float`s, `Packet4f` will be
selected rather than `Packet8f`, because 100 is not a multiple of 8,
the size of `Packet8f`.
This commit switches to selecting the half-packet if the size is
less than the packet size, which seems to make more sense.
As I stated in the SO post I'm not sure that I'm understanding the
issue correctly, but this fix resolves the issue in my program. Moreover,
`make check` passes, with the exception of line 614 and 616 in
`test/packetmath.cpp`, which however also fail on master on my machine:
CHECK_CWISE1_IF(PacketTraits::HasBessel, numext::bessel_i0, internal::pbessel_i0);
...
CHECK_CWISE1_IF(PacketTraits::HasBessel, numext::bessel_i1, internal::pbessel_i1);
2020-02-07 18:16:16 +01:00
Eugene Zhulenev
f584bd9b30
Fail at compile time if default executor tries to use non-default device
2020-02-06 22:43:24 +00:00
Eugene Zhulenev
3fda850c46
Remove dead code from TensorReduction.h
2020-01-29 18:45:31 +00:00
Jeff Daily
b5df8cabd7
fix hip-clang compilation due to new HIP scalar accessor
2020-01-20 21:08:52 +00:00
Deven Desai
6d284bb1b7
Fix for HIP breakage - 200115. Adding a missing EIGEN_DEVICE_FUNC attr
2020-01-16 00:51:43 +00:00
Srinivas Vasudevan
f6c6de5d63
Ensure Igamma does not NaN or Inf for large values.
2020-01-14 21:32:48 +00:00
Rasmus Munk Larsen
6601abce86
Remove rogue include in TypeCasting.h. Meta.h is already included by the top-level header in Eigen/Core.
2020-01-14 21:03:53 +00:00
Eugene Zhulenev
b9362fb8f7
Convert StridedLinearBufferCopy::Kind to enum class
2020-01-13 11:43:24 -08:00
Everton Constantino
5a8b97b401
Switching unpacket_traits<Packet4i> to vectorizable=true.
2020-01-13 16:08:20 -03:00
Everton Constantino
42838c28b8
Adding correct cache sizes for PPC architecture.
2020-01-13 16:58:14 +00:00
Christoph Hertzberg
1d0c45122a
Removing executable bit from file mode
2020-01-11 15:02:29 +01:00
Christoph Hertzberg
35219cea68
Bug #1790 : Make areApprox check numext::isnan instead of bitwise equality (NaNs don't have to be bitwise equal).
2020-01-11 14:57:22 +01:00
Srinivas Vasudevan
2e099e8d8f
Added special_packetmath test and tweaked bounds on tests.
...
Refactor shared packetmath code to header file.
(Squashed from PR !38 )
2020-01-11 10:31:21 +00:00
Rasmus Munk Larsen
e1ecfc162d
call Explicitly ::rint and ::rintf for targets without c++11. Without this, the Windows build breaks when trying to compile numext::rint<double>.
2020-01-10 21:14:08 +00:00
Joel Holdsworth
da5a7afed0
Improvements to the tidiness and completeness of the NEON implementation
2020-01-10 18:31:15 +00:00
Anuj Rawat
452371cead
Fix for gcc build error when using Eigen headers with AVX512
2020-01-10 18:05:42 +00:00
mehdi-goli
601f89dfd0
Adding RInt vector support for SYCL.
2020-01-10 18:00:36 +00:00
Matthew Powelson
2ea5a715cf
Properly initialize b vector in SplineFitting
...
InterpolateWithDerivative does not initialize the be vector correctly. This issue is discussed In stackoverflow question 48382939.
2020-01-09 21:29:04 +00:00
Rasmus Munk Larsen
9254974115
Don't add EIGEN_DEVICE_FUNC to random() since ::rand is not available in Cuda.
2020-01-09 21:23:09 +00:00
Rasmus Munk Larsen
a3ec89b5bd
Add missing EIGEN_DEVICE_FUNC annotations in MathFunctions.h.
2020-01-09 21:06:34 +00:00
Christoph Hertzberg
8333e03590
Use data.data() instead of &data (since it is not obvious that Array is trivially copyable)
2020-01-09 11:38:19 +01:00
Rasmus Munk Larsen
e6fcee995b
Don't use the rational approximation to the logistic function on GPUs as it appears to be slightly slower.
2020-01-09 00:04:26 +00:00
Rasmus Munk Larsen
4217a9f090
The upper limits for where to use the rational approximation to the logistic function were not set carefully enough in the original commit, and some arguments would cause the function to return values greater than 1. This change set the versions found by scanning all floating point numbers (using std::nextafterf()).
2020-01-08 22:21:37 +00:00
Christoph Hertzberg
9623c0c4b9
Fix formatting
2020-01-08 13:58:18 +01:00
Ilya Tokar
19876ced76
Bug #1785 : Introduce numext::rint.
...
This provides a new op that matches std::rint and previous behavior of
pround. Also adds corresponding unsupported/../Tensor op.
Performance is the same as e. g. floor (tested SSE/AVX).
2020-01-07 21:22:44 +00:00
mehdi-goli
d0ae052da4
[SYCL Backend]
...
* Adding Missing operations for vector comparison in SYCL. This caused compiler error for vector comparison when compiling SYCL
* Fixing the compiler error for placement new in TensorForcedEval.h This caused compiler error when compiling SYCL backend
* Reducing the SYCL warning by removing the abort function inside the kernel
* Adding Strong inline to functions inside SYCL interop.
2020-01-07 15:13:37 +00:00
Everton Constantino
eedb7eeacf
Protecting integer_types's long long test with a check to see if we have CXX11 support.
2020-01-07 14:35:35 +00:00
Christoph Hertzberg
bcbaad6d87
Bug #1800 : Guard against misleading indentation
2020-01-03 13:47:43 +01:00
Janek Kozicki
00de570793
Fix -Werror -Wfloat-conversion warning.
2019-12-23 23:52:44 +01:00
Deven Desai
636e2bb3fa
Fix for HIP breakage - 191220
...
The breakage was introduced by the following commit :
ae07801dd8
After the commit, HIPCC errors out on some tests with the following error
```
Building HIPCC object unsupported/test/CMakeFiles/cxx11_tensor_device_1.dir/cxx11_tensor_device_1_generated_cxx11_tensor_device.cu.o
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_device.cu:17:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor💯
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:129:12: error: no matching constructor for initialization of 'Eigen::internal::TensorBlockResourceRequirements'
return {merge(lhs.shape_type, rhs.shape_type), // shape_type
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:75:8: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 3 were provided
struct TensorBlockResourceRequirements {
^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:75:8: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 3 were provided
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:75:8: note: candidate constructor (the implicit copy constructor) not viable: requires 5 arguments, but 3 were provided
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:75:8: note: candidate constructor (the implicit default constructor) not viable: requires 0 arguments, but 3 were provided
...
...
```
The fix is to explicitly decalre the (implicitly called) constructor as a device func
2019-12-20 21:28:00 +00:00
Christoph Hertzberg
1e9664b147
Bug #1796 : Make matrix squareroot usable for Map and Ref types
2019-12-20 18:10:22 +01:00
Christoph Hertzberg
d86544d654
Reduce code duplication and avoid confusing Doxygen
2019-12-19 19:48:39 +01:00
Christoph Hertzberg
dde279f57d
Hide recursive meta templates from Doxygen
2019-12-19 19:47:23 +01:00
Christoph Hertzberg
c21771ac04
Use double-braces initialization (as everywhere else in the test-suite).
2019-12-19 19:20:48 +01:00
Christoph Hertzberg
a3273aeff8
Fix trivial shadow warning
2019-12-19 19:13:11 +01:00
Christoph Hertzberg
870e53c0f2
Bug #1788 : Fix rule-of-three violations inside the stable modules.
...
This fixes deprecated-copy warnings when compiling with GCC>=9
Also protect some additional Base-constructors from getting called by user code code (#1587 )
2019-12-19 17:30:11 +01:00
Christoph Hertzberg
6965f6de7f
Fix unit-test which I broke in previous fix
2019-12-19 13:42:14 +01:00
Eugene Zhulenev
7a65219a2e
Fix TensorPadding bug in squeezed reads from inner dimension
2019-12-19 05:43:57 +00:00
Eugene Zhulenev
73e55525e5
Return const data pointer from TensorRef evaluator.data()
2019-12-18 23:19:36 +00:00
Eugene Zhulenev
ae07801dd8
Tensor block evaluation cost model
2019-12-18 20:07:00 +00:00
Christoph Hertzberg
72166d0e6e
Fix some maybe-unitialized warnings
2019-12-18 18:26:20 +01:00
Christoph Hertzberg
5a3eaf88ac
Workaround class-memaccess warnings on newer GCC versions
2019-12-18 16:37:26 +01:00
Jeff Daily
de07c4d1c2
fix compilation due to new HIP scalar accessor
2019-12-17 20:27:30 +00:00
Eugene Zhulenev
788bef6ab5
Reduce block evaluation overhead for small tensor expressions
2019-12-17 19:06:14 +00:00
Rasmus Munk Larsen
7252163335
Add default definition for EIGEN_PREDICT_*
2019-12-16 22:31:59 +00:00
Rasmus Munk Larsen
a566074480
Improve accuracy of fast approximate tanh and the logistic functions in Eigen, such that they preserve relative accuracy to within a few ULPs where their function values tend to zero (around x=0 for tanh, and for large negative x for the logistic function).
...
This change re-instates the fast rational approximation of the logistic function for float32 in Eigen (removed in 66f07efeae ), but uses the more accurate approximation 1/(1+exp(-1)) ~= exp(x) below -9. The exponential is only calculated on the vectorized path if at least one element in the SIMD input vector is less than -9.
This change also contains a few improvements to speed up the original float specialization of logistic:
- Introduce EIGEN_PREDICT_{FALSE,TRUE} for __builtin_predict and use it to predict that the logistic-only path is most likely (~2-3% speedup for the common case).
- Carefully set the upper clipping point to the smallest x where the approximation evaluates to exactly 1. This saves the explicit clamping of the output (~7% speedup).
The increased accuracy for tanh comes at a cost of 10-20% depending on instruction set.
The benchmarks below repeated calls
u = v.logistic() (u = v.tanh(), respectively)
where u and v are of type Eigen::ArrayXf, have length 8k, and v contains random numbers in [-1,1].
Benchmark numbers for logistic:
Before:
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------------------
SSE
BM_eigen_logistic_float 4467 4468 155835 model_time: 4827
AVX
BM_eigen_logistic_float 2347 2347 299135 model_time: 2926
AVX+FMA
BM_eigen_logistic_float 1467 1467 476143 model_time: 2926
AVX512
BM_eigen_logistic_float 805 805 858696 model_time: 1463
After:
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------------------
SSE
BM_eigen_logistic_float 2589 2590 270264 model_time: 4827
AVX
BM_eigen_logistic_float 1428 1428 489265 model_time: 2926
AVX+FMA
BM_eigen_logistic_float 1059 1059 662255 model_time: 2926
AVX512
BM_eigen_logistic_float 673 673 1000000 model_time: 1463
Benchmark numbers for tanh:
Before:
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------------------
SSE
BM_eigen_tanh_float 2391 2391 292624 model_time: 4242
AVX
BM_eigen_tanh_float 1256 1256 554662 model_time: 2633
AVX+FMA
BM_eigen_tanh_float 823 823 866267 model_time: 1609
AVX512
BM_eigen_tanh_float 443 443 1578999 model_time: 805
After:
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------------------
SSE
BM_eigen_tanh_float 2588 2588 273531 model_time: 4242
AVX
BM_eigen_tanh_float 1536 1536 452321 model_time: 2633
AVX+FMA
BM_eigen_tanh_float 1007 1007 694681 model_time: 1609
AVX512
BM_eigen_tanh_float 471 471 1472178 model_time: 805
2019-12-16 21:33:42 +00:00
Christoph Hertzberg
8e5da71466
Resolve double-promotion warnings when compiling with clang.
...
`sin` was calling `sin(double)` instead of `std::sin(float)`
2019-12-13 22:46:40 +01:00
Christoph Hertzberg
9b7a2b43c2
Renamed .hgignore to .gitignore (removing hg-specific "syntax" line)
2019-12-13 19:40:57 +01:00
Ilya Tokar
06e99aaf40
Bug 1785: fix pround on x86 to use the same rounding mode as std::round.
...
This also adds pset1frombits helper to Packet[24]d.
Makes round ~45% slower for SSE: 1.65µs ± 1% before vs 2.45µs ± 2% after,
stil an order of magnitude faster than scalar version: 33.8µs ± 2%.
2019-12-12 17:38:53 -05:00
Rasmus Munk Larsen
73a8d572f5
Clamp tanh approximation outside [-c, c] where c is the smallest value where the approximation is exactly +/-1. Without FMA, c = 7.90531110763549805, with FMA c = 7.99881172180175781.
2019-12-12 19:34:25 +00:00
Srinivas Vasudevan
88062b7fed
Fix implementation of complex expm1. Add tests that fail with previous implementation, but pass with the current one.
2019-12-12 01:56:54 +00:00
Eugene Zhulenev
381f8f3139
Initialize non-trivially constructible types when allocating a temp buffer.
2019-12-12 01:31:30 +00:00
Eugene Zhulenev
64272c7f40
Squeeze reads from two inner dimensions in TensorPadding
2019-12-11 16:54:51 -08:00
Eugene Zhulenev
963ba1015b
Add back accidentally deleted default constructor to TensorExecutorTilingContext.
2019-12-11 18:47:55 +00:00
Joel Holdsworth
1b6e0395e6
Added io test
2019-12-11 18:22:57 +00:00
Joel Holdsworth
3c0ef9f394
IO: Fixed printing of char and unsigned char matrices
2019-12-11 18:22:57 +00:00
Joel Holdsworth
e87af0ed37
Added Eigen::numext typedefs for uint8_t, int8_t, uint16_t and int16_t
2019-12-11 18:22:57 +00:00
Gael Guennebaud
15b3bcfca0
Bug 1786: fix compilation with MSVC
2019-12-11 16:16:38 +01:00
Eugene Zhulenev
c9220c035f
Remove block memory allocation required by removed block evaluation API
2019-12-10 17:15:55 -08:00
Eugene Zhulenev
1c879eb010
Remove V2 suffix from TensorBlock
2019-12-10 15:40:23 -08:00
Eugene Zhulenev
dbca11e880
Remove TensorBlock.h and old TensorBlock/BlockMapper
2019-12-10 14:31:44 -08:00
Deven Desai
c49f0d851a
Fix for HIP breakage detected on 191210
...
The following commit introduces compile errors when running eigen with hipcc
2918f85ba9
hipcc errors out because it requies the device attribute on the methods within the TensorBlockV2ResourceRequirements struct instroduced by the commit above. The fix is to add the device attribute to those methods
2019-12-10 22:14:05 +00:00
Eugene Zhulenev
2918f85ba9
Do not use std::vector in getResourceRequirements
2019-12-09 16:19:55 -08:00
Artem Belevich
8056a05b54
Undo the block size change.
...
.z *is* used by the EigenContractionKernelInternal().
2019-12-09 11:10:29 -08:00
Eugene Zhulenev
dbb703d44e
Add async evaluation support to TensorSelectOp
2019-12-09 18:36:13 +00:00
Janek Kozicki
11d6465326
fix AlignedVector3 inconsisent interface with other Vector classes, default constructor and operator- were missing.
2019-12-06 21:07:39 +01:00
Eugene Zhulenev
bb7ccac3af
Add recursive work splitting to EvalShardedByInnerDimContext
2019-12-05 14:51:49 -08:00
Artem Belevich
25230d1862
Improve performance of contraction kernels
...
* Force-inline implementations. They pass around pointers to shared memory
blocks. Without inlining compiler must operate via generic pointers.
Inlining allows compiler to detect that we're operating on shared memory
which allows generation of substantially faster code.
* Fixed a long-standing typo which resulted in launching 8x more kernels
than we needed (.z dimension of the block is unused by the kernel).
2019-12-05 12:48:34 -08:00
Gael Guennebaud
08eeb648ea
update hg to git hashes
2019-12-05 16:33:24 +01:00
Rasmus Munk Larsen
366cf005b0
Add missing initialization in cxx11_tensor_trace.cpp.
2019-12-04 23:56:37 +00:00
Gael Guennebaud
c488b8b32f
Replace calls to "hg" by calls to "git"
2019-12-04 11:24:06 +01:00
Gael Guennebaud
8fbe0e4699
Update old links to bitbucket to point to gitlab.com
2019-12-04 10:57:07 +01:00
Gael Guennebaud
114a15c66a
Added tag before-git-migration for changeset a7c7d329d8
2019-12-04 10:06:00 +01:00
Rasmus Larsen
a7c7d329d8
Merged in ezhulenev/eigen-01 (pull request PR-769)
...
Capture TensorMap by value inside tensor expression AST
2019-12-04 00:49:10 +00:00
Rasmus Larsen
cacf433975
Merged in anshuljl/eigen-2/Anshul-Jaiswal/update-configurevectorizationh-to-not-op-1573079916090 (pull request PR-754)
...
Update ConfigureVectorization.h to not optimize fp16 routines when compiling with cuda.
Approved-by: Deven Desai <deven.desai.amd@gmail.com >
2019-12-04 00:45:42 +00:00
Eugene Zhulenev
8f4536e852
Capture TensorMap by value inside tensor expression AST
2019-12-03 16:39:05 -08:00
Rasmus Munk Larsen
4e696901f8
Remove __host__ annotation for device-only function.
2019-12-03 14:33:19 -08:00
Rasmus Munk Larsen
ead81559c8
Use EIGEN_DEVICE_FUNC macro instead of __device__.
2019-12-03 12:08:22 -08:00
Gael Guennebaud
6358599ecb
Fix QuaternionBase::cast for quaternion map and wrapper.
2019-12-03 14:51:14 +01:00
Gael Guennebaud
7745f69013
bug #1776 : fix vector-wise STL iterator's operator-> using a proxy as pointer type.
...
This changeset fixes also the value_type definition.
2019-12-03 14:40:15 +01:00
Rasmus Munk Larsen
66f07efeae
Revert the specialization for scalar_logistic_op<float> introduced in:
...
77b447c24e
While providing a 50% speedup on Haswell+ processors, the large relative error outside [-18, 18] in this approximation causes problems, e.g., when computing gradients of activation functions like softplus in neural networks.
2019-12-02 17:00:58 -08:00
Rasmus Larsen
3b15373bb3
Merged in ezhulenev/eigen-02 (pull request PR-767)
...
Fix shadow warnings in AlignedBox and SparseBlock
2019-12-02 18:23:11 +00:00
Deven Desai
312c8e77ff
Fix for the HIP build+test errors.
...
Recent changes have introduced the following build error when compiling with HIPCC
---------
unsupported/test/../../Eigen/src/Core/GenericPacketMath.h:254:58: error: 'ldexp': no overloaded function has restriction specifiers that are compatible with the ambient context 'pldexp'
---------
The fix for the error is to pick the math function(s) from the global namespace (where they are declared as device functions in the HIP header files) when compiling with HIPCC.
2019-12-02 17:41:32 +00:00
Rasmus Larsen
956131d0e6
Merged in codeplaysoftware/eigen/SYCL-Backend (pull request PR-691)
...
SYCL Backend
Approved-by: Rasmus Larsen <rmlarsen@google.com >
2019-11-28 16:19:25 +00:00
Mehdi Goli
00f32752f7
[SYCL] Rebasing the SYCL support branch on top of the Einge upstream master branch.
...
* Unifying all loadLocalTile from lhs and rhs to an extract_block function.
* Adding get_tensor operation which was missing in TensorContractionMapper.
* Adding the -D method missing from cmake for Disable_Skinny Contraction operation.
* Wrapping all the indices in TensorScanSycl into Scan parameter struct.
* Fixing typo in Device SYCL
* Unifying load to private register for tall/skinny no shared
* Unifying load to vector tile for tensor-vector/vector-tensor operation
* Removing all the LHS/RHS class for extracting data from global
* Removing Outputfunction from TensorContractionSkinnyNoshared.
* Combining the local memory version of tall/skinny and normal tensor contraction into one kernel.
* Combining the no-local memory version of tall/skinny and normal tensor contraction into one kernel.
* Combining General Tensor-Vector and VectorTensor contraction into one kernel.
* Making double buffering optional for Tensor contraction when local memory is version is used.
* Modifying benchmark to accept custom Reduction Sizes
* Disabling AVX optimization for SYCL backend on the host to allow SSE optimization to the host
* Adding Test for SYCL
* Modifying SYCL CMake
2019-11-28 10:08:54 +00:00
Eugene Zhulenev
82a47338df
Fix shadow warnings in AlignedBox and SparseBlock
2019-11-27 16:22:27 -08:00
Rasmus Munk Larsen
ea51a9eace
Add missing EIGEN_DEVICE_FUNC attribute to template specializations for pexp to fix GPU build.
2019-11-27 10:17:09 -08:00
Rasmus Munk Larsen
5a3ebda36b
Fix warning due to missing cast for exponent arguments for std::frexp and std::lexp.
2019-11-26 16:18:29 -08:00
Rasmus Larsen
2df57be856
Merged in realjhol/eigen/fix-warnings (pull request PR-760)
...
Fix warnings
2019-11-26 23:24:23 +00:00
Eugene Zhulenev
5496d0da0b
Add async evaluation support to TensorReverse
2019-11-26 15:02:24 -08:00
Eugene Zhulenev
bc66c88255
Add async evaluation support to TensorPadding/TensorImagePatch/TensorShuffling
2019-11-26 11:41:57 -08:00
Gael Guennebaud
c79b6ffe1f
Add an explicit example for auto and re-evaluation
2019-11-20 17:31:23 +01:00
Hans Johnson
e78ed6e7f3
COMP: Simplify install commands for Eigen
...
Confirm that install directory is identical
before and after this simplifying patch.
```bash
hg clone <<Eigen>>
mkdir eigen-bld
cd eigen-bld
cmake ../Eigen -DCMAKE_INSTALL_PREFIX:PATH=/tmp/bef
make install
find /tmp/pre_eigen_modernize >/tmp/bef
# Apply this patch
cmake ../Eigen -DCMAKE_INSTALL_PREFIX:PATH=/tmp/aft
make install
find /tmp/post_eigen_modernize |sed 's/post_e/pre_e/g' >/tmp/aft
diff /tmp/bef /tmp/aft
```
2019-11-17 15:14:25 -06:00
Hans Johnson
9d5cdc98c3
COMP: target_compile_definitions requires cmake 2.8.11
...
Features committed in 2016 have required cmake verison 2.8.11.
`sergiu Tue Nov 22 12:25:06 2016 +0100: target_compile_definitions`
Set the minimum cmake version to the minimum version that
is capable of compiling or installing the code base.
2019-11-17 14:59:32 -06:00
Gael Guennebaud
e5778b87b9
Fix duplicate symbol linking error.
2019-11-20 17:23:19 +01:00
Joel Holdsworth
86eb41f1cb
SparseRef: Fixed alignment warning on ARM GCC
2019-11-07 14:34:06 +00:00
Anshul Jaiswal
c1a67cb5af
Update ConfigureVectorization.h to not optimize fp16 routines when compiling with cuda.
2019-11-06 22:40:38 +00:00
Rasmus Munk Larsen
cc3d0e6a40
Add EIGEN_HAS_INTRINSIC_INT128 macro
...
Add a new EIGEN_HAS_INTRINSIC_INT128 macro, and use this instead of __SIZEOF_INT128__. This fixes related issues with TensorIntDiv.h when building with Clang for Windows, where support for 128-bit integer arithmetic is advertised but broken in practice.
2019-11-06 14:24:33 -08:00
Rasmus Munk Larsen
ee404667e2
Rollback or PR-746 and partial rollback of 668ab3fc47
...
.
std::array is still not supported in CUDA device code on Windows.
2019-11-05 17:17:58 -08:00
Joel Holdsworth
743c925286
test/packetmath: Silence alignment warnings
2019-11-05 19:06:12 +00:00
Rasmus Larsen
0c9745903a
Merged in ezhulenev/eigen-01 (pull request PR-746)
...
Remove internal::smart_copy and replace with std::copy
2019-11-04 20:18:38 +00:00
Hans Johnson
8c8cab1afd
STYLE: Convert CMake-language commands to lower case
...
Ancient CMake versions required upper-case commands. Later command names
became case-insensitive. Now the preferred style is lower-case.
2019-10-31 11:36:37 -05:00
Hans Johnson
6fb3e5f176
STYLE: Remove CMake-language block-end command arguments
...
Ancient versions of CMake required else(), endif(), and similar block
termination commands to have arguments matching the command starting the block.
This is no longer the preferred style.
2019-10-31 11:36:27 -05:00
Rasmus Munk Larsen
f1e8307308
1. Fix a bug in psqrt and make it return 0 for +inf arguments.
...
2. Simplify handling of special cases by taking advantage of the fact that the
builtin vrsqrt approximation handles negative, zero and +inf arguments correctly.
This speeds up the SSE and AVX implementations by ~20%.
3. Make the Newton-Raphson formula used for rsqrt more numerically robust:
Before: y = y * (1.5 - x/2 * y^2)
After: y = y * (1.5 - y * (x/2) * y)
Forming y^2 can overflow for very large or very small (denormalized) values of x, while x*y ~= 1. For AVX512, this makes it possible to compute accurate results for denormal inputs down to ~1e-42 in single precision.
4. Add a faster double precision implementation for Knights Landing using the vrsqrt28 instruction and a single Newton-Raphson iteration.
Benchmark results: https://bitbucket.org/snippets/rmlarsen/5LBq9o
2019-11-15 17:09:46 -08:00
Gael Guennebaud
2cb2915f90
bug #1744 : fix compilation with MSVC 2017 and AVX512, plog1p/pexpm1 require plog/pexp, but the later was disabled on some compilers
2019-11-15 13:39:51 +01:00
Gael Guennebaud
c3f6fcf2c0
bug #1747 : one more fix for MSVC regarding the Bessel implementation.
2019-11-15 11:12:35 +01:00
Gael Guennebaud
b9837ca9ae
bug #1281 : fix AutoDiffScalar's make_coherent for nested expression of constant ADs.
2019-11-14 14:58:08 +01:00
Gael Guennebaud
0fb6e24408
Fix case issue with Lapack unit tests
2019-11-14 14:16:05 +01:00
Gael Guennebaud
8af045a287
bug #1774 : fix VectorwiseOp::begin()/end() return types regarding constness.
2019-11-14 11:45:52 +01:00
Sakshi Goynar
75b4c0a3e0
PR 751: Fixed compilation issue when compiling using MSVC with /arch:AVX512 flag
2019-10-31 16:09:16 -07:00
Gael Guennebaud
8496f86f84
Enable CompleteOrthogonalDecomposition::pseudoInverse with non-square fixed-size matrices.
2019-11-13 21:16:53 +01:00
Gael Guennebaud
002e5b6db6
Move to my.cdash.org
2019-11-13 13:33:49 +01:00
Eugene Zhulenev
13c3327f5c
Remove legacy block evaluation support
2019-11-12 10:12:28 -08:00
Gael Guennebaud
71aa53dd6d
Disable AVX on broken xcode versions. See PR 748.
...
Patch adapted from Hans Johnson's PR 748.
2019-11-12 11:40:38 +01:00
Rasmus Munk Larsen
0ed0338593
Fix a race in async tensor evaluation: Don't run on_done() until after device.deallocate() / evaluator.cleanup() complete, since the device might be destroyed after on_done() runs.
2019-11-11 12:26:41 -08:00
Eugene Zhulenev
c952b8dfda
Break loop dependence in TensorGenerator block access
2019-11-11 10:32:57 -08:00
Rasmus Munk Larsen
ebf04fb3e8
Fix data race in css11_tensor_notification test.
2019-11-08 17:44:50 -08:00
Eugene Zhulenev
73ecb2c57d
Cleanup includes in Tensor module after switch to C++11 and above
2019-10-29 15:49:54 -07:00
Eugene Zhulenev
e7ed4bd388
Remove internal::smart_copy and replace with std::copy
2019-10-29 11:25:24 -07:00
Eugene Zhulenev
fbc0a9a3ec
Fix CXX11Meta compilation with MSVC
2019-10-28 18:30:10 -07:00
Eugene Zhulenev
bd864ab42b
Prevent potential ODR in TensorExecutor
2019-10-28 15:45:09 -07:00
Mehdi Goli
6332aff0b2
This PR fixes:
...
* The specialization of array class in the different namespace for GCC<=6.4
* The implicit call to `std::array` constructor using the initializer list for GCC <=6.1
2019-10-23 15:56:56 +01:00
Rasmus Larsen
8e4e29ae99
Merged in deven-amd/eigen-hip-fix-191018 (pull request PR-738)
...
Fix for the HIP build+test errors.
2019-10-22 22:18:38 +00:00
Rasmus Munk Larsen
97c0c5d485
Add block evaluation V2 to TensorAsyncExecutor.
...
Add async evaluation to a number of ops.
2019-10-22 12:42:44 -07:00
Deven Desai
102cf2a72d
Fix for the HIP build+test errors.
...
The errors were introduced by this commit :
After the above mentioned commit, some of the tests started failing with the following error
```
Built target cxx11_tensor_reduction
Building HIPCC object unsupported/test/CMakeFiles/cxx11_tensor_reduction_gpu_5.dir/cxx11_tensor_reduction_gpu_5_generated_cxx11_tensor_reduction_gpu.cu.o
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:117:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlockV2.h:155:5: error: the field type is not amp-compatible
DestinationBufferKind m_kind;
^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlockV2.h:211:3: error: the field type is not amp-compatible
DestinationBuffer m_destination;
^
```
For some reason HIPCC does not like device code to contain enum types which do not have the base-type explicitly declared. The fix is trivial, explicitly state "int" as the basetype
2019-10-22 19:21:27 +00:00
Rasmus Munk Larsen
668ab3fc47
Drop support for c++03 in Eigen tensor. Get rid of some code used to emulate c++11 functionality with older compilers.
2019-10-18 16:42:00 -07:00
Eugene Zhulenev
df0e8b8137
Propagate block evaluation preference through rvalue tensor expressions
2019-10-17 11:17:33 -07:00
Eugene Zhulenev
0d2a14ce11
Cleanup Tensor block destination and materialized block storage allocation
2019-10-16 17:14:37 -07:00
Eugene Zhulenev
02431cbe71
TensorBroadcasting support for random/uniform blocks
2019-10-16 13:26:28 -07:00
Eugene Zhulenev
d380c23b2c
Block evaluation for TensorGenerator/TensorReverse/TensorShuffling
2019-10-14 14:31:59 -07:00
Gael Guennebaud
39fb9eeccf
bug #1747 : fix compilation with MSVC
2019-10-14 22:50:23 +02:00
Eugene Zhulenev
a411e9f344
Block evaluation for TensorGenerator + TensorReverse + fixed bug in tensor reverse op
2019-10-10 10:56:58 -07:00
Rasmus Larsen
b03eb63d7c
Merged in ezhulenev/eigen-01 (pull request PR-726)
...
Block evaluation for TensorChipping + fixed bugs in TensorPadding and TensorSlicing
2019-10-10 16:58:11 +00:00
Gael Guennebaud
e7d8ba747c
bug #1752 : make is_convertible equivalent to the std c++11 equivalent and fallback to std::is_convertible when c++11 is enabled.
2019-10-10 17:41:47 +02:00
Gael Guennebaud
fb557aec5c
bug #1752 : disable some is_convertible tests for recent compilers.
2019-10-10 11:40:21 +02:00
Eugene Zhulenev
33e1746139
Block evaluation for TensorChipping + fixed bugs in TensorPadding and TensorSlicing
2019-10-09 12:45:31 -07:00
Gael Guennebaud
f0a4642bab
Implement c++03 compatible fix for changeset 7a43af1a33
2019-10-09 16:00:57 +02:00
Gael Guennebaud
196de2efe3
Explicitly bypass resize and memmoves when there is already the exact right number of elements available.
2019-10-08 21:44:33 +02:00
Gael Guennebaud
36da231a41
Disable an expected warning in unit test
2019-10-08 16:28:14 +02:00
Gael Guennebaud
d1def335dc
fix one more possible conflicts with real/imag
2019-10-08 16:19:10 +02:00
Gael Guennebaud
87427d2eaa
PR 719: fix real/imag namespace conflict
2019-10-08 09:15:17 +02:00
Gael Guennebaud
7a43af1a33
Fix compilation of FFTW unit test
2019-10-08 08:58:35 +02:00
Eugene Zhulenev
f74ab8cb8d
Add block evaluation to TensorEvalTo and fix few small bugs
2019-10-07 15:34:26 -07:00
Brian Zhao
3afb640b56
Fixing incorrect size in Tensor documentation.
2019-10-04 21:30:35 -07:00
Rasmus Munk Larsen
20c4a9118f
Use "pdiv" rather than operator/ to support packet types.
2019-10-04 16:54:03 -07:00
Rasmus Larsen
d1dd51cb5f
Merged in ezhulenev/eigen-01 (pull request PR-723)
...
Add block evaluation to TensorReshaping/TensorCasting/TensorPadding/TensorSelect
Approved-by: Rasmus Larsen <rmlarsen@google.com >
2019-10-04 17:19:13 +00:00
Eugene Zhulenev
98bdd7252e
Fix compilation warnings and errors with clang in TensorBlockV2 code and tests
2019-10-04 10:15:33 -07:00
Rasmus Munk Larsen
fab4e3a753
Address comments on Chebyshev evaluation code:
...
1. Use pmadd when possible.
2. Add casts to avoid c++03 warnings.
2019-10-02 12:48:17 -07:00
Eugene Zhulenev
60ae24ee1a
Add block evaluation to TensorReshaping/TensorCasting/TensorPadding/TensorSelect
2019-10-02 12:44:06 -07:00
Eugene Zhulenev
6e40454a6e
Add beta to TensorContractionKernel and make memset optional
2019-10-02 11:06:02 -07:00
Rasmus Munk Larsen
bd0fac456f
Prevent infinite loop in the nvcc compiler while unrolling the recurrent templates for Chebyshev polynomial evaluation.
2019-10-01 13:15:30 -07:00
Gael Guennebaud
9549ba8313
Fix perf issue in SimplicialLDLT::solve for complexes (again, m_diag is real)
2019-10-01 12:54:25 +02:00
Gael Guennebaud
c8b2c603b0
Fix speed issue with SimplicialLDLT for complexes: the diagonal is real!
2019-09-30 16:14:34 +02:00
Rasmus Munk Larsen
13ef08e5ac
Move implementation of vectorized error function erf() to SpecialFunctionsImpl.h.
2019-09-27 13:56:04 -07:00
Eugene Zhulenev
7c8bc0d928
Fix cxx11_tensor_block_io test
2019-09-25 11:48:11 -07:00
Eugene Zhulenev
0c845e28c9
Fix erf in c++03
2019-09-25 11:31:45 -07:00
Eugene Zhulenev
71d5bedf72
Fix compilation warnings and errors with clang in TensorBlockV2
2019-09-25 11:25:22 -07:00
Deven Desai
5e186b1987
Fix for the HIP build+test errors.
...
The errors were introduced by this commit : d38e6fbc27
After the above mentioned commit, some of the tests started failing with the following error
```
Building HIPCC object unsupported/test/CMakeFiles/cxx11_tensor_reduction_gpu_5.dir/cxx11_tensor_reduction_gpu_5_generated_cxx11_tensor_reduction_gpu.cu.o
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:70:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsHalf.h:28:22: error: call to 'erf' is ambiguous
return Eigen::half(Eigen::numext::erf(static_cast<float>(a)));
^~~~~~~~~~~~~~~~~~
/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1600:7: note: candidate function [with T = float]
float erf(const float &x) { return ::erff(x); }
^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = float]
erf(const Scalar& x) {
^
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:75:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/arch/GPU/GpuSpecialFunctions.h:87:23: error: call to 'erf' is ambiguous
return make_double2(erf(a.x), erf(a.y));
^~~
/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1603:8: note: candidate function [with T = double]
double erf(const double &x) { return ::erf(x); }
^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = double]
erf(const Scalar& x) {
^
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:75:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/arch/GPU/GpuSpecialFunctions.h:87:33: error: call to 'erf' is ambiguous
return make_double2(erf(a.x), erf(a.y));
^~~
/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1603:8: note: candidate function [with T = double]
double erf(const double &x) { return ::erf(x); }
^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = double]
erf(const Scalar& x) {
^
3 errors generated.
```
This PR fixes the compile error by removing the "old" implementation for "erf" (assuming that the "new" implementation is what we want going forward. from a GPU point-of-view both implementations are the same).
This PR also fixes what seems like a cut-n-paste error in the aforementioned commit
2019-09-25 15:39:13 +00:00
Eugene Zhulenev
f35b9ab510
Fix a bug in a packed block type in TensorContractionThreadPool
2019-09-24 16:54:36 -07:00
Rasmus Larsen
d38e6fbc27
Merged in rmlarsen/eigen (pull request PR-704)
...
Add generic PacketMath implementation of the Error Function (erf).
2019-09-24 23:40:29 +00:00
Rasmus Munk Larsen
591a554c68
Add TODO to cleanup FMA cost modelling.
2019-09-24 16:39:25 -07:00
Eugene Zhulenev
c64396b4c6
Choose TensorBlock StridedLinearCopy type statically
2019-09-24 16:04:29 -07:00
Eugene Zhulenev
c97b208468
Add new TensorBlock api implementation + tests
2019-09-24 15:17:35 -07:00
Eugene Zhulenev
ef9dfee7bd
Tensor block evaluation V2 support for unary/binary/broadcsting
2019-09-24 12:52:45 -07:00
Christoph Hertzberg
efd9867ff0
bug #1746 : Removed implementation of standard copy-constructor and standard copy-assign-operator from PermutationMatrix and Transpositions to allow malloc-less std::move. Added unit-test to rvalue_types
2019-09-24 11:09:58 +02:00
Christoph Hertzberg
e4c1b3c1d2
Fix implicit conversion warnings and use pnegate to negate packets
2019-09-23 16:07:43 +02:00
Christoph Hertzberg
ba0736fa8e
Fix (or mask away) conversion warnings introduced in 553caeb6a3
...
.
2019-09-23 15:58:05 +02:00
Rasmus Munk Larsen
1d5af0693c
Add support for asynchronous evaluation of tensor casting expressions.
2019-09-19 13:54:49 -07:00
Rasmus Munk Larsen
6de5ed08d8
Add generic PacketMath implementation of the Error Function (erf).
2019-09-19 12:48:30 -07:00
Rasmus Munk Larsen
28b6786498
Fix build on setups without AVX512DQ.
2019-09-19 12:36:09 -07:00
Deven Desai
e02d429637
Fix for the HIP build+test errors.
...
The errors were introduced by this commit : 6e215cf109
The fix is switching to using ::<math_func> instead std::<math_func> when compiling for GPU
2019-09-18 18:44:20 +00:00
Srinivas Vasudevan
df0816b71f
Merging eigen/eigen.
2019-09-16 19:33:29 -04:00
Srinivas Vasudevan
6e215cf109
Add Bessel functions to SpecialFunctions.
...
- Split SpecialFunctions files in to a separate BesselFunctions file.
In particular add:
- Modified bessel functions of the second kind k0, k1, k0e, k1e
- Bessel functions of the first kind j0, j1
- Bessel functions of the second kind y0, y1
2019-09-14 12:16:47 -04:00
Eugene Zhulenev
7c73296849
Revert accidental change to GCC diagnostics
2019-09-13 14:30:58 -07:00
Eugene Zhulenev
bf8866b466
Fix maybe-unitialized warnings in TensorContractionThreadPool
2019-09-13 14:29:55 -07:00
Eugene Zhulenev
553caeb6a3
Use ThreadLocal container in TensorContractionThreadPool
2019-09-13 12:14:44 -07:00
Srinivas Vasudevan
facdec5aa7
Add packetized versions of i0e and i1e special functions.
...
- In particular refactor the i0e and i1e code so scalar and vectorized path share code.
- Move chebevl to GenericPacketMathFunctions.
A brief benchmark with building Eigen with FMA, AVX and AVX2 flags
Before:
CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------------------
BM_eigen_i0e_double/1 57.3 57.3 10000000
BM_eigen_i0e_double/8 398 398 1748554
BM_eigen_i0e_double/64 3184 3184 218961
BM_eigen_i0e_double/512 25579 25579 27330
BM_eigen_i0e_double/4k 205043 205042 3418
BM_eigen_i0e_double/32k 1646038 1646176 422
BM_eigen_i0e_double/256k 13180959 13182613 53
BM_eigen_i0e_double/1M 52684617 52706132 10
BM_eigen_i0e_float/1 28.4 28.4 24636711
BM_eigen_i0e_float/8 75.7 75.7 9207634
BM_eigen_i0e_float/64 512 512 1000000
BM_eigen_i0e_float/512 4194 4194 166359
BM_eigen_i0e_float/4k 32756 32761 21373
BM_eigen_i0e_float/32k 261133 261153 2678
BM_eigen_i0e_float/256k 2087938 2088231 333
BM_eigen_i0e_float/1M 8380409 8381234 84
BM_eigen_i1e_double/1 56.3 56.3 10000000
BM_eigen_i1e_double/8 397 397 1772376
BM_eigen_i1e_double/64 3114 3115 223881
BM_eigen_i1e_double/512 25358 25361 27761
BM_eigen_i1e_double/4k 203543 203593 3462
BM_eigen_i1e_double/32k 1613649 1613803 428
BM_eigen_i1e_double/256k 12910625 12910374 54
BM_eigen_i1e_double/1M 51723824 51723991 10
BM_eigen_i1e_float/1 28.3 28.3 24683049
BM_eigen_i1e_float/8 74.8 74.9 9366216
BM_eigen_i1e_float/64 505 505 1000000
BM_eigen_i1e_float/512 4068 4068 171690
BM_eigen_i1e_float/4k 31803 31806 21948
BM_eigen_i1e_float/32k 253637 253692 2763
BM_eigen_i1e_float/256k 2019711 2019918 346
BM_eigen_i1e_float/1M 8238681 8238713 86
After:
CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------------------
BM_eigen_i0e_double/1 15.8 15.8 44097476
BM_eigen_i0e_double/8 99.3 99.3 7014884
BM_eigen_i0e_double/64 777 777 886612
BM_eigen_i0e_double/512 6180 6181 100000
BM_eigen_i0e_double/4k 48136 48140 14678
BM_eigen_i0e_double/32k 385936 385943 1801
BM_eigen_i0e_double/256k 3293324 3293551 228
BM_eigen_i0e_double/1M 12423600 12424458 57
BM_eigen_i0e_float/1 16.3 16.3 43038042
BM_eigen_i0e_float/8 30.1 30.1 23456931
BM_eigen_i0e_float/64 169 169 4132875
BM_eigen_i0e_float/512 1338 1339 516860
BM_eigen_i0e_float/4k 10191 10191 68513
BM_eigen_i0e_float/32k 81338 81337 8531
BM_eigen_i0e_float/256k 651807 651984 1000
BM_eigen_i0e_float/1M 2633821 2634187 268
BM_eigen_i1e_double/1 16.2 16.2 42352499
BM_eigen_i1e_double/8 110 110 6316524
BM_eigen_i1e_double/64 822 822 851065
BM_eigen_i1e_double/512 6480 6481 100000
BM_eigen_i1e_double/4k 51843 51843 10000
BM_eigen_i1e_double/32k 414854 414852 1680
BM_eigen_i1e_double/256k 3320001 3320568 212
BM_eigen_i1e_double/1M 13442795 13442391 53
BM_eigen_i1e_float/1 17.6 17.6 41025735
BM_eigen_i1e_float/8 35.5 35.5 19597891
BM_eigen_i1e_float/64 240 240 2924237
BM_eigen_i1e_float/512 1424 1424 485953
BM_eigen_i1e_float/4k 10722 10723 65162
BM_eigen_i1e_float/32k 86286 86297 8048
BM_eigen_i1e_float/256k 691821 691868 1000
BM_eigen_i1e_float/1M 2777336 2777747 256
This shows anywhere from a 50% to 75% improvement on these operations.
I've also benchmarked without any of these flags turned on, and got similar
performance to before (if not better).
Also tested packetmath.cpp + special_functions to ensure no regressions.
2019-09-11 18:34:02 -07:00
Srinivas Vasudevan
b052ec6992
Merged eigen/eigen into default
2019-09-11 18:01:54 -07:00
Deven Desai
cdb377d0cb
Fix for the HIP build+test errors introduced by the ndtri support.
...
The fixes needed are
* adding EIGEN_DEVICE_FUNC attribute to a couple of funcs (else HIPCC will error out when non-device funcs are called from global/device funcs)
* switching to using ::<math_func> instead std::<math_func> (only for HIPCC) in cases where the std::<math_func> is not recognized as a device func by HIPCC
* removing an errant "j" from a testcase (don't know how that made it in to begin with!)
2019-09-06 16:03:49 +00:00
Gael Guennebaud
747c6a51ca
bug #1736 : fix compilation issue with A(all,{1,2}).col(j) by implementing true compile-time "if" for block_evaluator<>::coeff(i)/coeffRef(i)
2019-09-11 15:40:07 +02:00
Gael Guennebaud
031f17117d
bug #1741 : fix self-adjoint*matrix, triangular*matrix, and triangular^1*matrix with a destination having a non-trivial inner-stride
2019-09-11 15:04:25 +02:00
Gael Guennebaud
459b2bcc08
Fix compilation of BLAS backend and frontend
2019-09-11 10:02:37 +02:00
Rasmus Larsen
97f1e1d89f
Merged in ezhulenev/eigen-01 (pull request PR-698)
...
ThreadLocal container that does not rely on thread local storage
Approved-by: Rasmus Larsen <rmlarsen@google.com >
2019-09-10 23:19:33 +00:00
Eugene Zhulenev
d918bd9a8b
Update ThreadLocal to use separate Initialize/Release callables
2019-09-10 16:13:32 -07:00
Gael Guennebaud
afa8d13532
Fix some implicit literal to Scalar conversions in SparseCore
2019-09-11 00:03:07 +02:00
Gael Guennebaud
c06e6fd115
bug #1741 : fix SelfAdjointView::rankUpdate and product to triangular part for destination with non-trivial inner stride
2019-09-10 23:29:52 +02:00
Gael Guennebaud
ea0d5dc956
bug #1741 : fix C.noalias() = A*C; with C.innerStride()!=1
2019-09-10 16:25:24 +02:00
Eugene Zhulenev
e3dec4dcc1
ThreadLocal container that does not rely on thread local storage
2019-09-09 15:18:14 -07:00
Gael Guennebaud
17226100c5
Fix a circular dependency regarding pshift* functions and GenericPacketMathFunctions.
...
Another solution would have been to make pshift* fully generic template functions with
partial specialization which is always a mess in c++03.
2019-09-06 09:26:04 +02:00
Gael Guennebaud
55b63d4ea3
Fix compilation without vector engine available (e.g., x86 with SSE disabled):
...
-> ppolevl is required by ndtri even for the scalar path
2019-09-05 18:16:46 +02:00
Srinivas Vasudevan
a9cf823db7
Merged eigen/eigen
2019-09-04 23:50:52 -04:00
Gael Guennebaud
e6c183f8fd
Fix doc issues regarding ndtri
2019-09-04 23:00:21 +02:00
Gael Guennebaud
5702a57926
Fix possible warning regarding strict equality comparisons
2019-09-04 22:57:04 +02:00
Srinivas Vasudevan
99036a3615
Merging from eigen/eigen.
2019-09-03 15:34:47 -04:00
Eugene Zhulenev
a8d264fa9c
Add test for const TensorMap underlying data mutation
2019-09-03 11:38:39 -07:00
Eugene Zhulenev
f68f2bba09
TensorMap constness should not change underlying storage constness
2019-09-03 11:08:09 -07:00
Gael Guennebaud
8e7e3d9bc8
Makes Scalar/RealScalar typedefs public in Pardiso's wrappers (see PR 688)
2019-09-03 13:09:03 +02:00
Srinivas Vasudevan
e38dd48a27
PR 681: Add ndtri function, the inverse of the normal distribution function.
2019-08-12 19:26:29 -04:00
Eugene Zhulenev
f59bed7a13
Change typedefs from private to protected to fix MSVC compilation
2019-09-03 19:11:36 -07:00
Eugene Zhulenev
47fefa235f
Allow move-only done callback in TensorAsyncDevice
2019-09-03 17:20:56 -07:00
Srinivas Vasudevan
18ceb3413d
Add ndtri function, the inverse of the normal distribution function.
2019-08-12 19:26:29 -04:00
Rasmus Munk Larsen
d55d392e7b
Fix bugs in log1p and expm1 where repeated using statements would clobber each other.
...
Add specializations for complex types since std::log1p and std::exp1m do not support complex.
2019-08-08 16:27:32 -07:00
Rasmus Munk Larsen
85928e5f47
Guard against repeated definition of EIGEN_MPL2_ONLY
2019-08-07 14:19:00 -07:00
Rasmus Munk Larsen
facc4e4536
Disable tests for contraction with output kernels when using libxsmm, which does not support this.
2019-08-07 14:11:15 -07:00
Rasmus Munk Larsen
eab7e52db2
[Eigen] Vectorize evaluation of coefficient-wise functions over tensor blocks if the strides are known to be 1. Provides up to 20-25% speedup of the TF cross entropy op with AVX.
...
A few benchmark numbers:
name old time/op new time/op delta
BM_Xent_16_10000_cpu 448µs ± 3% 389µs ± 2% -13.21%
(p=0.008 n=5+5)
BM_Xent_32_10000_cpu 575µs ± 6% 454µs ± 3% -21.00% (p=0.008 n=5+5)
BM_Xent_64_10000_cpu 933µs ± 4% 712µs ± 1% -23.71% (p=0.008 n=5+5)
2019-08-07 12:57:42 -07:00
Rasmus Munk Larsen
0987126165
Clean up unnecessary namespace specifiers in TensorBlock.h.
2019-08-07 12:12:52 -07:00
Gael Guennebaud
0050644b23
Fix doc regarding alignment and c++17
2019-08-04 01:09:41 +02:00
Rasmus Munk Larsen
e2999d4c38
Fix performance regressions due to https://bitbucket.org/eigen/eigen/pull-requests/662 .
...
The change caused the device struct to be copied for each expression evaluation, and caused, e.g., a 10% regression in the TensorFlow multinomial op on GPU:
Benchmark Time(ns) CPU(ns) Iterations
----------------------------------------------------------------------
BM_Multinomial_gpu_1_100000_4 128173 231326 2922 1.610G items/s
VS
Benchmark Time(ns) CPU(ns) Iterations
----------------------------------------------------------------------
BM_Multinomial_gpu_1_100000_4 146683 246914 2719 1.509G items/s
2019-08-02 11:18:13 -07:00
Alberto Luaces
c694be1214
Fixed Tensor documentation formatting.
2019-07-23 09:24:06 +00:00
Gael Guennebaud
15f3d9d272
More colamd cleanup:
...
- Move colamd implementation in its own namespace to avoid polluting the internal namespace with Ok, Status, etc.
- Fix signed/unsigned warning
- move some ugly free functions as member functions
2019-09-03 00:50:51 +02:00
Anshul Jaiswal
a4d1a6cd7d
Eigen_Colamd.h updated to replace constexpr with consts and enums.
2019-08-17 05:29:23 +00:00
Anshul Jaiswal
283558face
Ordering.h edited to fix dependencies on Eigen_Colamd.h
2019-08-15 20:21:56 +00:00
Anshul Jaiswal
39f30923c2
Eigen_Colamd.h edited replacing macros with constexprs and functions.
2019-08-15 20:15:19 +00:00
Anshul Jaiswal
0a6b553ecf
Eigen_Colamd.h edited online with Bitbucket replacing constant #defines with const definitions
2019-07-21 04:53:31 +00:00
Kyle Vedder
f22b7283a3
Added leading asterisk for Doxygen to consume as it was removing asterisk intended to be part of the code.
2019-07-18 18:12:14 +00:00
Michael Grupp
6e17491f45
Fix typo in Umeyama method documentation
2019-07-17 11:20:41 +00:00
Christoph Hertzberg
e0f5a2a456
Remove {} accidentally added in previous commit
2019-07-18 20:22:17 +02:00
Christoph Hertzberg
ea6d7eb32f
Move variadic constructors outside #ifndef EIGEN_PARSED_BY_DOXYGEN block, to make it actually appear in the generated documentation.
2019-07-12 19:46:37 +02:00
Christoph Hertzberg
9237883ff1
Escape \# inside doxygen docu
2019-07-12 19:45:13 +02:00
Christoph Hertzberg
c2671e5315
Build deprecated snippets with -DEIGEN_NO_DEPRECATED_WARNING
...
Also, document LinSpaced only where it is implemented
2019-07-12 19:43:32 +02:00
Eugene Zhulenev
3cd148f983
Fix expression evaluation heuristic for TensorSliceOp
2019-07-09 12:10:26 -07:00
Rasmus Munk Larsen
23b958818e
Fix compiler for unsigned integers.
2019-07-09 11:18:25 -07:00
Eugene Zhulenev
6083014594
Add outer/inner chipping optimization for chipping dimension specified at runtime
2019-07-03 11:35:25 -07:00
Deven Desai
7eb2e0a95b
adding the EIGEN_DEVICE_FUNC attribute to the constCast routine.
...
Not having this attribute results in the following failures in the `--config=rocm` TF build.
```
In file included from tensorflow/core/kernels/cross_op_gpu.cu.cc:20:
In file included from ./tensorflow/core/framework/register_types.h:20:
In file included from ./tensorflow/core/framework/numeric_types.h:20:
In file included from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1:
In file included from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:140:
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h:356:37: error: 'Eigen::constCast': no overloaded function has restriction specifiers that are compatible with the ambient context 'data'
typename Storage::Type result = constCast(m_impl.data());
^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h:356:37: error: 'Eigen::constCast': no overloaded function has restriction specifiers that are compatible with the ambient context 'data'
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h:148:56: note: in instantiation of member function 'Eigen::TensorEvaluator<const Eigen::TensorChippingOp<1, Eigen::TensorMap<Eigen::Tensor<int, 2, 1, long>, 16, MakePointer> >, Eigen::Gpu\
Device>::data' requested here
return m_rightImpl.evalSubExprsIfNeeded(m_leftImpl.data());
```
Adding the EIGEN_DEVICE_FUNC attribute resolves those errors
2019-07-02 20:02:46 +00:00
Gael Guennebaud
ef8aca6a89
Merged in codeplaysoftware/eigen (pull request PR-667)
...
[SYCL] :
Approved-by: Gael Guennebaud <g.gael@free.fr >
Approved-by: Rasmus Larsen <rmlarsen@google.com >
2019-07-02 12:45:23 +00:00
Eugene Zhulenev
4ac93f8edc
Allocate non-const scalar buffer for block evaluation with DefaultDevice
2019-07-01 10:55:19 -07:00
Mehdi Goli
9ea490c82c
[SYCL] :
...
* Modifying TensorDeviceSYCL to use `EIGEN_THROW_X`.
* Modifying TensorMacro to use `EIGEN_TRY/CATCH(X)` macro.
* Modifying TensorReverse.h to use `EIGEN_DEVICE_REF` instead of `&`.
* Fixing the SYCL device macro in SpecialFunctionsImpl.h.
2019-07-01 16:27:28 +01:00
Eugene Zhulenev
81a03bec75
Fix TensorReverse on GPU with m_stride[i]==0
2019-06-28 15:50:39 -07:00
Rasmus Munk Larsen
8053eeb51e
Fix CUDA compilation error for pselect<half>.
2019-06-28 12:07:29 -07:00
Rasmus Munk Larsen
74a9dd1102
Fix preprocessor condition to only generate a warning when calling eigen::GpuDevice::synchronize() from device code, but not when calling from a non-GPU compilation unit.
2019-06-28 11:56:21 -07:00
Rasmus Munk Larsen
70d4020ad9
Remove comma causing warning in c++03 mode.
2019-06-28 11:39:45 -07:00
Eugene Zhulenev
6e7c76481a
Merge with Eigen head
2019-06-28 11:22:46 -07:00
Eugene Zhulenev
878845cb25
Add block access to TensorReverseOp and make sure that TensorForcedEval uses block access when preferred
2019-06-28 11:13:44 -07:00
Rasmus Munk Larsen
1f61aee5ca
[SYCL] This PR adds the minimum modifications to the Eigen unsupported module required to run it on devices supporting SYCL.
...
* Abstracting the pointer type so that both SYCL memory and pointer can be captured.
* Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class.
* Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node.
* Adding SYCL macro for controlling loop unrolling.
* Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.
2019-06-28 10:11:56 -07:00
Mehdi Goli
7d08fa805a
[SYCL] This PR adds the minimum modifications to the Eigen unsupported module required to run it on devices supporting SYCL.
...
* Abstracting the pointer type so that both SYCL memory and pointer can be captured.
* Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class.
* Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node.
* Adding SYCL macro for controlling loop unrolling.
* Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.
2019-06-28 10:08:23 +01:00
Mehdi Goli
16a56b2ddd
[SYCL] This PR adds the minimum modifications to Eigen core required to run Eigen unsupported modules on devices supporting SYCL.
...
* Adding SYCL memory model
* Enabling/Disabling SYCL backend in Core
* Supporting Vectorization
2019-06-27 12:25:09 +01:00
Christoph Hertzberg
adec097c61
Remove extra comma (causes warnings in C++03)
2019-06-26 16:14:28 +02:00
Eugene Zhulenev
229db81572
Optimize evaluation strategy for TensorSlicingOp and TensorChippingOp
2019-06-25 15:41:37 -07:00
Deven Desai
ba506d5bd2
fix for a ROCm/HIP specificcompile errror introduced by a recent commit.
2019-06-22 00:06:05 +00:00
Rasmus Munk Larsen
c9394d7a0e
Remove extra "one" in comment.
2019-06-20 16:23:19 -07:00
Rasmus Munk Larsen
b8f8dac4eb
Update comment as suggested by tra@google.com.
2019-06-20 16:18:37 -07:00
Rasmus Munk Larsen
e5e63c2cad
Fix grammar.
2019-06-20 16:03:59 -07:00
Rasmus Munk Larsen
302a404b7e
Added comment explaining the surprising EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC clause.
2019-06-20 15:59:08 -07:00
Rasmus Munk Larsen
b5237f53b1
Fix CUDA build on Mac.
2019-06-20 15:44:14 -07:00
Rasmus Munk Larsen
988f24b730
Various fixes for packet ops.
...
1. Fix buggy pcmp_eq and unit test for half types.
2. Add unit test for pselect and add specializations for SSE 4.1, AVX512, and half types.
3. Get rid of FIXME: Implement faster pnegate for half by XOR'ing with a sign bit mask.
2019-06-20 11:47:49 -07:00
Christoph Hertzberg
e0be7f30e1
bug #1724 : Mask buggy warnings with g++-7
...
(grafted from 427f2f66d6
)
2019-06-14 14:57:46 +02:00
Anshul Jaiswal
fab51d133e
Updated Eigen_Colamd.h, namespacing macros ALIVE & DEAD as COLAMD_ALIVE & COLAMD_DEAD
...
to prevent conflicts with other libraries / code.
2019-06-08 21:09:06 +00:00
Eugene Zhulenev
79c402e40e
Fix shadow warnings in TensorContractionThreadPool
2019-08-30 15:38:31 -07:00
Eugene Zhulenev
edf2ec28d8
Fix block mapper type name in TensorExecutor
2019-08-30 15:29:25 -07:00
Eugene Zhulenev
f0b36fb9a4
evalSubExprsIfNeededAsync + async TensorContractionThreadPool
2019-08-30 15:13:38 -07:00
Eugene Zhulenev
619cea9491
Revert accidentally removed <memory> header from ThreadPool
2019-08-30 14:51:17 -07:00
Eugene Zhulenev
66665e7e76
Asynchronous expression evaluation with TensorAsyncDevice
2019-08-30 14:49:40 -07:00
Rasmus Munk Larsen
f6c51d9209
Fix missing header inclusion and colliding definitions for half type casting, which broke
...
build with -march=native on Haswell/Skylake.
2019-08-30 14:03:29 -07:00
Eugene Zhulenev
bc40d4522c
Const correctness in TensorMap<const Tensor<T, ...>> expressions
2019-08-28 17:46:05 -07:00
Rasmus Munk Larsen
1187bb65ad
Add more tests for corner cases of log1p and expm1. Add handling of infinite arguments to log1p such that log1p(inf) = inf.
2019-08-28 12:20:21 -07:00
Eugene Zhulenev
6e77f9bef3
Remove shadow warnings in TensorDeviceThreadPool
2019-08-28 10:32:19 -07:00
Rasmus Munk Larsen
9aba527405
Revert changes to std_falback::log1p that broke handling of arguments less than -1. Fix packet op accordingly.
2019-08-27 15:35:29 -07:00
Rasmus Munk Larsen
b021cdea6d
Clean up float16 a.k.a. Eigen::half support in Eigen. Move the definition of half to Core/arch/Default and move arch-specific packet ops to their respective sub-directories.
2019-08-27 11:30:31 -07:00
Rasmus Larsen
84fefdf321
Merged in ezhulenev/eigen-01 (pull request PR-683)
...
Asynchronous parallelFor in Eigen ThreadPoolDevice
2019-08-26 21:49:17 +00:00
maratek
8b5ab0e4dd
Fix get_random_seed on Native Client
...
Newlib in Native Client SDK does not provide ::random function.
Implement get_random_seed for NaCl using ::rand, similarly to Windows version.
2019-08-23 15:25:56 -07:00
Eugene Zhulenev
6901788013
Asynchronous parallelFor in Eigen ThreadPoolDevice
2019-08-22 10:50:51 -07:00
Christoph Hertzberg
2fb24384c9
Merged in jaopaulolc/eigen (pull request PR-679)
...
Fixes for Altivec/VSX and compilation with clang on PowerPC
2019-08-22 15:57:33 +00:00
Rasmus Larsen
57f6b62597
Merged in rmlarsen/eigen (pull request PR-680)
...
Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments.
2019-08-22 00:25:29 +00:00
Eugene Zhulenev
071311821e
Remove XSMM support from Tensor module
2019-08-19 11:44:25 -07:00
João P. L. de Carvalho
5ac7984ffa
Fix debug macros in p{load,store}u
2019-08-14 11:59:12 -06:00
João P. L. de Carvalho
db9147ae40
Add missing pcmp_XX methods for double/Packet2d
...
This actually fixes an issue in unit-test packetmath_2 with pcmp_eq when it is compiled with clang. When pcmp_eq(Packet4f,Packet4f) is used instead of pcmp_eq(Packet2d,Packet2d), the unit-test does not pass due to NaN on ref vector.
2019-08-14 10:37:39 -06:00
Rasmus Munk Larsen
a3298b22ec
Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments.
...
Depending on instruction set, significant speedups are observed for the vectorized path:
log1p wall time is reduced 60-93% (2.5x - 15x speedup)
expm1 wall time is reduced 0-85% (1x - 7x speedup)
The scalar path is slower by 20-30% due to the extra branch needed to handle +infinity correctly.
Full benchmarks measured on Intel(R) Xeon(R) Gold 6154 here: https://bitbucket.org/snippets/rmlarsen/MXBkpM
2019-08-12 13:53:28 -07:00
João P. L. de Carvalho
787f6ef025
Fix packed load/store for PowerPC's VSX
...
The vec_vsx_ld/vec_vsx_st builtins were wrongly used for aligned load/store. In fact, they perform unaligned memory access and, even when the address is 16-byte aligned, they are much slower (at least 2x) than their aligned counterparts.
For double/Packet2d vec_xl/vec_xst should be prefered over vec_ld/vec_st, although the latter works when casted to float/Packet4f.
Silencing some weird warning with throw but some GCC versions. Such warning are not thrown by Clang.
2019-08-09 16:02:55 -06:00
João P. L. de Carvalho
4d29aa0294
Fix offset argument of ploadu/pstoreu for Altivec
...
If no offset is given, them it should be zero.
Also passes full address to vec_vsx_ld/st builtins.
Removes userless _EIGEN_ALIGNED_PTR & _EIGEN_MASK_ALIGNMENT.
Removes unnecessary casts.
2019-08-09 15:59:26 -06:00
João P. L. de Carvalho
66d073c38e
bug #1718 : Add cast to successfully compile with clang on PowerPC
...
Ignoring -Wc11-extensions warnings thrown by clang at Altivec/PacketMath.h
2019-08-09 15:56:26 -06:00
Rasmus Munk Larsen
6d432eae5d
Make is_valid_index_type return false for float and double when EIGEN_HAS_TYPE_TRAITS is off.
2019-06-05 16:42:27 -07:00
Rasmus Munk Larsen
f715f6e816
Add workaround for choosing the right include files with FP16C support with clang.
2019-06-05 13:36:37 -07:00
Justin Carpentier
ffaf658ecd
PR 655: Fix missing Eigen namespace in Macros
2019-06-05 09:51:59 +02:00
Mehdi Goli
0b24e1cb5c
[SYCL] Adding the SYCL memory model. The SYCL memory model provides :
...
* an interface for SYCL buffers to behave as a non-dereferenceable pointer
* an interface for placeholder accessor to behave like a pointer on both host and device
2019-07-01 16:02:30 +01:00
Rasmus Larsen
c1b0aea653
Merged in Artem-B/eigen (pull request PR-654)
...
Minor build improvements
Approved-by: Rasmus Larsen <rmlarsen@google.com >
2019-05-31 22:27:04 +00:00
Rasmus Munk Larsen
b08527b0c1
Clean up CUDA/NVCC version macros and their use in Eigen, and a few other CUDA build failures.
2019-05-31 15:26:06 -07:00
tra
b4c49bf00e
Minor build improvements
...
* Allow specifying multiple GPU architectures. E.g.:
cmake -DEIGEN_CUDA_COMPUTE_ARCH="60;70"
* Pass CUDA SDK path to clang. Without it it will default to /usr/local/cuda
which may not be the right location, if cmake was invoked with
-DCUDA_TOOLKIT_ROOT_DIR=/some/other/CUDA/path
2019-05-31 14:08:34 -07:00
Christoph Hertzberg
5614400581
digits10() needs to return an integer
...
Problem reported on https://stackoverflow.com/questions/56395899
2019-05-31 15:45:41 +02:00
Rasmus Larsen
36e0a2b93f
Merged in deven-amd/eigen-hip-fix-190524 (pull request PR-649)
...
fix for HIP build errors that were introduced by a commit earlier this week
2019-05-24 16:05:31 +00:00
Deven Desai
2c38930161
fix for HIP build errors that were introduced by a commit earlier this week
2019-05-24 14:25:32 +00:00
Gustavo Lima Chaves
56bc4974fb
GEMV: remove double declaration of constant.
...
That was hurting users with compilers that would object to proceed with
that:
"""
./Eigen/src/Core/products/GeneralMatrixVector.h:356:10: error: declaration shadows a static data member of 'general_matrix_vector_product<type-parameter-0-0, type-parameter-0-1, type-parameter-0-2, 1, ConjugateLhs, type-parameter-0-4, type-parameter-0-5, ConjugateRhs, Version>' [-Werror,-Wshadow]
LhsPacketSize = Traits::LhsPacketSize,
^
./Eigen/src/Core/products/GeneralMatrixVector.h:307:22: note: previous declaration is here
static const Index LhsPacketSize = Traits::LhsPacketSize;
"""
2019-05-23 14:50:29 -07:00
Christoph Hertzberg
ac21a08c13
Cast Index to RealScalar
...
This fixes compilation issues with RealScalar types that are not implicitly castable from Index (e.g. ceres Jet types).
Reported by Peter Anderson-Sprecher via eMail
2019-05-23 15:31:12 +02:00
Rasmus Munk Larsen
3eb5ad0ed0
Enable support for F16C with Clang. The required intrinsics were added here: https://reviews.llvm.org/D16177
...
and are part of LLVM 3.8.0.
2019-05-20 17:19:20 -07:00
Rasmus Larsen
e92486b8c3
Merged in rmlarsen/eigen (pull request PR-643)
...
Make Eigen build with cuda 10 and clang.
Approved-by: Justin Lebar <justin.lebar@gmail.com >
2019-05-20 17:02:39 +00:00
Rasmus Munk Larsen
fd595d42a7
Merge
2019-05-20 09:39:11 -07:00
Gael Guennebaud
cc7ecbb124
Merged in scramsby/eigen (pull request PR-646)
...
Eigen: Fix MSVC C++17 language standard detection logic
2019-05-20 07:19:10 +00:00
Eugene Zhulenev
01654d97fa
Prevent potential division by zero in TensorExecutor
2019-05-17 14:02:25 -07:00
Rasmus Larsen
78d3015722
Merged in ezhulenev/eigen-01 (pull request PR-644)
...
Always evaluate Tensor expressions with broadcasting via tiled evaluation code path
2019-05-17 19:44:25 +00:00
Rasmus Larsen
bf9cbed8d0
Merged in glchaves/eigen (pull request PR-635)
...
Speed up GEMV on AVX-512 builds, just as done for GEBP previously.
Approved-by: Rasmus Larsen <rmlarsen@google.com >
2019-05-17 19:40:50 +00:00
Eugene Zhulenev
96a276803c
Always evaluate Tensor expressions with broadcasting via tiled evaluation code path
2019-05-16 16:15:45 -07:00
Rasmus Munk Larsen
ab0a30e429
Make Eigen build with cuda 10 and clang.
2019-05-15 13:32:15 -07:00
Rasmus Munk Larsen
734a50dc60
Make Eigen build with cuda 10 and clang.
2019-05-15 13:32:15 -07:00
Rasmus Larsen
c8d8d5c0fc
Merged in rmlarsen/eigen_threadpool (pull request PR-640)
...
Fix deadlocks in thread pool.
Approved-by: Eugene Zhulenev <ezhulenev@google.com >
2019-05-13 20:04:35 +00:00
Christoph Hertzberg
5f32b79edc
Collapsed revision from PR-641
...
* SparseLU.h - corrected example, it didn't compile
* Changed encoding back to UTF8
2019-05-13 19:02:30 +02:00
Anuj Rawat
ad372084f5
Removing unused API to fix compile error in TensorFlow due to
...
AVX512VL, AVX512BW usage
2019-05-12 14:43:10 +00:00
Christoph Hertzberg
4ccd1ece92
bug #1707 : Fix deprecation warnings, or disable warnings when testing deprecated functions
2019-05-10 14:57:05 +02:00
Rasmus Munk Larsen
d3ef7cf03e
Fix build with clang on Windows.
2019-05-09 11:07:04 -07:00
Rasmus Munk Larsen
e5ac8cbd7a
A) fix deadlocks in thread pool caused by EventCount
...
This fixed 2 deadlocks caused by sloppiness in the EventCount logic.
Both most likely were introduced by cl/236729920 which includes the new EventCount algorithm:
01da8caf00
bug #1 (Prewait):
Prewait must not consume existing signals.
Consider the following scenario.
There are 2 thread pool threads (1 and 2) and 1 external thread (3). RunQueue is empty.
Thread 1 checks the queue, calls Prewait, checks RunQueue again and now is going to call CommitWait.
Thread 2 checks the queue and now is going to call Prewait.
Thread 3 submits 2 tasks, EventCount signals is set to 1 because only 1 waiter is registered the second signal is discarded).
Now thread 2 resumes and calls Prewait and takes away the signal.
Thread 1 resumes and calls CommitWait, there are no pending signals anymore, so it blocks.
As the result we have 2 tasks, but only 1 thread is running.
bug #2 (CancelWait):
CancelWait must not take away a signal if it's not sure that the signal was meant for this thread.
When one thread blocks and another submits a new task concurrently, the EventCount protocol guarantees only the following properties (similar to the Dekker's algorithm):
(a) the registered waiter notices presence of the new task and does not block
(b) the signaler notices presence of the waiters and wakes it
(c) both the waiter notices presence of the new task and signaler notices presence of the waiter
[it's only that both of them do not notice each other must not be possible, because it would lead to a deadlock]
CancelWait is called for cases (a) and (c). For case (c) it is OK to take the notification signal away, but it's not OK for (a) because nobody queued a signals for us and we take away a signal meant for somebody else.
Consider:
Thread 1 calls Prewait, checks RunQueue, it's empty, now it's going to call CommitWait.
Thread 3 submits 2 tasks, EventCount signals is set to 1 because only 1 waiter is registered the second signal is discarded).
Thread 2 calls Prewait, checks RunQueue, discovers the tasks, calls CancelWait and consumes the pending signal (meant for thread 1).
Now Thread 1 resumes and calls CommitWait, since there are no signals it blocks.
As the result we have 2 tasks, but only 1 thread is running.
Both deadlocks are only a problem if the tasks require parallelism. Most computational tasks do not require parallelism, i.e. a single thread will run task 1, finish it and then dequeue and run task 2.
This fix undoes some of the sloppiness in the EventCount that was meant to reduce CPU consumption by idle threads, because we now have more threads running in these corner cases. But we still don't have pthread_yield's and maybe the strictness introduced by this change will actually help to reduce tail latency because we will have threads running when we actually need them running.
B) fix deadlock in thread pool caused by RunQueue
This fixed a deadlock caused by sloppiness in the RunQueue logic.
Most likely this was introduced with the non-blocking thread pool.
The deadlock only affects workloads that require parallelism.
Most computational tasks don't require parallelism.
PopBack must not fail spuriously. If it does, it can effectively lead to single thread consuming several wake up signals.
Consider 2 worker threads are blocked.
External thread submits a task. One of the threads is woken.
It tries to steal the task, but fails due to a spurious failure in PopBack (external thread submits another task and holds the lock).
The thread executes blocking protocol again (it won't block because NonEmptyQueueIndex is precise and the thread will discover pending work, but it has called PrepareWait).
Now external thread submits another task and signals EventCount again.
The signal is consumed by the first thread again. But now we have 2 tasks pending but only 1 worker thread running.
It may be possible to fix this in a different way: make EventCount::CancelWait forward wakeup signal to a blocked thread rather then consuming it. But this looks more complex and I am not 100% that it will fix the bug.
It's also possible to have 2 versions of PopBack: one will do try_to_lock and another won't. Then worker threads could first opportunistically check all queues with try_to_lock, and only use the blocking version before blocking. But let's first fix the bug with the simpler change.
2019-05-08 10:16:46 -07:00
Michael Tesch
c5019f722b
Use pade for matrix exponential also for complex values.
2019-05-08 17:04:55 +02:00
Eugene Zhulenev
45b40d91ca
Fix AVX512 & GCC 6.3 compilation
2019-05-07 16:44:55 -07:00
Christoph Hertzberg
e6667a7060
Fix stupid shadow-warnings (with old clang versions)
2019-05-07 18:32:19 +02:00
Christoph Hertzberg
e54dc24d62
Restore C++03 compatibility
2019-05-07 18:30:44 +02:00
Christoph Hertzberg
cca76c272c
Restore C++03 compatibility
2019-05-06 16:18:22 +02:00
Rasmus Munk Larsen
8e33844fc7
Fix traits for scalar_logistic_op.
2019-05-03 15:49:09 -07:00
Scott Ramsby
ff06ef7584
Eigen: Fix MSVC C++17 language standard detection logic
...
To detect C++17 support, use _MSVC_LANG macro instead of _MSC_VER. _MSC_VER can indicate whether the current compiler version could support the C++17 language standard, but not whether that standard is actually selected (i.e. via /std:c++17).
See these web pages for more details:
https://devblogs.microsoft.com/cppblog/msvc-now-correctly-reports-__cplusplus/
https://docs.microsoft.com/en-us/cpp/preprocessor/predefined-macros
2019-05-03 14:14:09 -07:00
Eugene Zhulenev
e9f0eb8a5e
Add masked_store_available to unpacket_traits
2019-05-02 14:52:58 -07:00
Eugene Zhulenev
96e30e936a
Add masked pstoreu for Packet16h
2019-05-02 14:11:01 -07:00
Eugene Zhulenev
b4010f02f9
Add masked pstoreu to AVX and AVX512 PacketMath
2019-05-02 13:14:18 -07:00
Gael Guennebaud
578407f42f
Fix regression in changeset ae33e866c7
2019-05-02 15:45:21 +02:00
Rasmus Larsen
ac50afaffa
Merged in ezhulenev/eigen-01 (pull request PR-633)
...
Check if gpu_assert was overridden in TensorGpuHipCudaDefines
2019-04-29 16:29:35 +00:00
Gustavo Lima Chaves
d4dcb71bcb
Speed up GEMV on AVX-512 builds, just as done for GEBP previously.
...
We take advantage of smaller SIMD registers as well, in that case.
Gains up to 3x for select input sizes.
2019-04-26 14:12:39 -07:00
Andy May
ae33e866c7
Fix compilation with PGI version 19
2019-04-25 21:23:19 +01:00
Gael Guennebaud
665ac22cc6
Merged in ezhulenev/eigen-01 (pull request PR-632)
...
Fix doxygen warnings
2019-04-25 20:02:20 +00:00
Eugene Zhulenev
01d7e6ee9b
Check if gpu_assert was overridden in TensorGpuHipCudaDefines
2019-04-25 11:19:17 -07:00
Eugene Zhulenev
8ead5bb3d8
Fix doxygen warnings to enable statis code analysis
2019-04-24 12:42:28 -07:00
Eugene Zhulenev
07355d47c6
Get rid of SequentialLinSpacedReturnType deprecation warnings in DenseBase.h
2019-04-24 11:01:35 -07:00
Rasmus Munk Larsen
144ca33321
Remove deprecation annotation from typedef Eigen::Index Index, as it would generate too many build warnings.
2019-04-24 08:50:07 -07:00
Eugene Zhulenev
a7b7f3ca8a
Add missing EIGEN_DEPRECATED annotations to deprecated functions and fix few other doxygen warnings
2019-04-23 17:23:19 -07:00
Eugene Zhulenev
68a2a8c445
Use packet ops instead of AVX2 intrinsics
2019-04-23 11:41:02 -07:00
Anuj Rawat
8c7a6feb8e
Adding lowlevel APIs for optimized RHS packet load in TensorFlow
...
SpatialConvolution
Low-level APIs are added in order to optimized packet load in gemm_pack_rhs
in TensorFlow SpatialConvolution. The optimization is for scenario when a
packet is split across 2 adjacent columns. In this case we read it as two
'partial' packets and then merge these into 1. Currently this only works for
Packet16f (AVX512) and Packet8f (AVX2). We plan to add this for other
packet types (such as Packet8d) also.
This optimization shows significant speedup in SpatialConvolution with
certain parameters. Some examples are below.
Benchmark parameters are specified as:
Batch size, Input dim, Depth, Num of filters, Filter dim
Speedup numbers are specified for number of threads 1, 2, 4, 8, 16.
AVX512:
Parameters | Speedup (Num of threads: 1, 2, 4, 8, 16)
----------------------------|------------------------------------------
128, 24x24, 3, 64, 5x5 |2.18X, 2.13X, 1.73X, 1.64X, 1.66X
128, 24x24, 1, 64, 8x8 |2.00X, 1.98X, 1.93X, 1.91X, 1.91X
32, 24x24, 3, 64, 5x5 |2.26X, 2.14X, 2.17X, 2.22X, 2.33X
128, 24x24, 3, 64, 3x3 |1.51X, 1.45X, 1.45X, 1.67X, 1.57X
32, 14x14, 24, 64, 5x5 |1.21X, 1.19X, 1.16X, 1.70X, 1.17X
128, 128x128, 3, 96, 11x11 |2.17X, 2.18X, 2.19X, 2.20X, 2.18X
AVX2:
Parameters | Speedup (Num of threads: 1, 2, 4, 8, 16)
----------------------------|------------------------------------------
128, 24x24, 3, 64, 5x5 | 1.66X, 1.65X, 1.61X, 1.56X, 1.49X
32, 24x24, 3, 64, 5x5 | 1.71X, 1.63X, 1.77X, 1.58X, 1.68X
128, 24x24, 1, 64, 5x5 | 1.44X, 1.40X, 1.38X, 1.37X, 1.33X
128, 24x24, 3, 64, 3x3 | 1.68X, 1.63X, 1.58X, 1.56X, 1.62X
128, 128x128, 3, 96, 11x11 | 1.36X, 1.36X, 1.37X, 1.37X, 1.37X
In the higher level benchmark cifar10, we observe a runtime improvement
of around 6% for AVX512 on Intel Skylake server (8 cores).
On lower level PackRhs micro-benchmarks specified in TensorFlow
tensorflow/core/kernels/eigen_spatial_convolutions_test.cc, we observe
the following runtime numbers:
AVX512:
Parameters | Runtime without patch (ns) | Runtime with patch (ns) | Speedup
---------------------------------------------------------------|----------------------------|-------------------------|---------
BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56) | 41350 | 15073 | 2.74X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56) | 7277 | 7341 | 0.99X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56) | 8675 | 8681 | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56) | 24155 | 16079 | 1.50X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56) | 25052 | 17152 | 1.46X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56) | 18269 | 18345 | 1.00X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56) | 19468 | 19872 | 0.98X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432) | 156060 | 42432 | 3.68X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432) | 132701 | 36944 | 3.59X
AVX2:
Parameters | Runtime without patch (ns) | Runtime with patch (ns) | Speedup
---------------------------------------------------------------|----------------------------|-------------------------|---------
BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56) | 26233 | 12393 | 2.12X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56) | 6091 | 6062 | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56) | 7427 | 7408 | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56) | 23453 | 20826 | 1.13X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56) | 23167 | 22091 | 1.09X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56) | 23422 | 23682 | 0.99X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56) | 23165 | 23663 | 0.98X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432) | 72689 | 44969 | 1.62X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432) | 61732 | 39779 | 1.55X
All benchmarks on Intel Skylake server with 8 cores.
2019-04-20 06:46:43 +00:00
Christoph Hertzberg
4270c62812
Split the implementation of i?amax/min into two. Based on PR-627 by Sameer Agarwal.
...
Like the Netlib reference implementation, I*AMAX now uses the L1-norm instead of the L2-norm for each element. Changed I*MIN accordingly.
2019-04-15 17:18:03 +02:00
Rasmus Munk Larsen
039ee52125
Tweak cost model for tensor contraction when parallelizing over the inner dimension.
...
https://bitbucket.org/snippets/rmlarsen/MexxLo
2019-04-12 13:35:10 -07:00
Jonathon Koyle
9a3f06d836
Update TheadPoolDevice example to include ThreadPool creation and passing pointer into constructor.
2019-04-10 10:02:33 -06:00
Deven Desai
66a885b61e
adding EIGEN_DEVICE_FUNC to the recently added TensorContractionKernel constructor. Not having the EIGEN_DEVICE_FUNC attribute on it was leading to compiler errors when compiling Eigen in the ROCm/HIP path
2019-04-08 13:45:08 +00:00
Eugene Zhulenev
629ddebd15
Add missing semicolon
2019-04-02 15:04:26 -07:00
Eugene Zhulenev
4e2f6de1a8
Add support for custom packed Lhs/Rhs blocks in tensor contractions
2019-04-01 11:47:31 -07:00
Gael Guennebaud
45e65fbb77
bug #1695 : fix a numerical robustness issue. Computing the secular equation at the middle range without a shift might give a wrong sign.
2019-03-27 20:16:58 +01:00
William D. Irons
8de66719f9
Collapsed revision from PR-619
...
* Add support for pcmp_eq in AltiVec/Complex.h
* Fixed implementation of pcmp_eq for double
The new logic is based on the logic from NEON for double.
2019-03-26 18:14:49 +00:00
Gael Guennebaud
f11364290e
ICC does not support -fno-unsafe-math-optimizations
2019-03-22 09:26:24 +01:00
David Tellenbach
3031d57200
PR 621: Fix documentation of EIGEN_COMP_EMSCRIPTEN
2019-03-21 02:21:04 +01:00
Deven Desai
51e399fc15
updates requested in the PR feedback. Also droping coded within #ifdef EIGEN_HAS_OLD_HIP_FP16
2019-03-19 21:45:25 +00:00
Deven Desai
2dbea5510f
Merged eigen/eigen into default
2019-03-19 16:52:38 -04:00
Rasmus Larsen
5c93b38c5f
Merged in rmlarsen/eigen (pull request PR-618)
...
Make clipping outside [-18:18] consistent for vectorized and non-vectorized paths of scalar_logistic_op<float>.
Approved-by: Gael Guennebaud <g.gael@free.fr >
2019-03-18 15:51:55 +00:00
Gael Guennebaud
48898a988a
fix unit test in c++03: c++03 does not allow passing local or anonymous enum as template param
2019-03-18 11:38:36 +01:00
Gael Guennebaud
cf7e2e277f
bug #1692 : enable enum as sizes of Matrix and Array
2019-03-17 21:59:30 +01:00
Rasmus Munk Larsen
e42f9aa68a
Make clipping outside [-18:18] consistent for vectorized and non-vectorized paths of scalar_logistic_<float>.
2019-03-15 17:15:14 -07:00
Rasmus Larsen
1936aac43f
Merged in tellenbach/eigen/sykline_consistent_include_guards (pull request PR-617)
...
Fix include guard comments for Skyline module
2019-03-15 20:04:56 +00:00
David Tellenbach
bd9c2ae3fd
Fix include guard comments
2019-03-15 15:29:17 +01:00
Rasmus Munk Larsen
8450a6d519
Clean up half packet traits and add a few more missing packet ops.
2019-03-14 15:18:06 -07:00
David Tellenbach
b013176e52
Remove undefined std::complex<int>
2019-03-14 11:40:28 +01:00
David Tellenbach
97f9a46cb9
PR 593: Add variadtic ctor for DiagonalMatrix with unit tests
2019-03-14 10:18:24 +01:00
Gael Guennebaud
45ab514fe2
revert debug stuff
2019-03-14 10:08:12 +01:00
Rasmus Munk Larsen
6a34003141
Remove EIGEN_MPL2_ONLY guard in IncompleteCholesky that is no longer needed after the AMD reordering code was relicensed to MPL2.
2019-03-13 11:52:41 -07:00
Gael Guennebaud
d7d2f0680e
bug #1684 : partially workaround clang's 6/7 bug #40815
2019-03-13 10:40:01 +01:00
Rasmus Larsen
690f0795d0
Merged in rmlarsen/eigen (pull request PR-615)
...
Clean up PacketMathHalf.h and add a few missing logical packet ops.
2019-03-12 16:09:48 +00:00
Thomas Capricelli
1901433674
erm.. use proper id
2019-03-12 13:53:38 +01:00
Thomas Capricelli
90302aa8c9
update tracking code
2019-03-12 13:47:01 +01:00
Rasmus Munk Larsen
77f7d4a894
Clean up PacketMathHalf.h and add a few missing logical packet ops.
2019-03-11 17:51:16 -07:00
Eugene Zhulenev
001f10e3c9
Fix segfaults with cuda compilation
2019-03-11 09:43:33 -07:00
Eugene Zhulenev
899c16fa2c
Fix a bug in TensorGenerator for 1d tensors
2019-03-11 09:42:01 -07:00
Eugene Zhulenev
0f8bfff23d
Fix a data race in NonBlockingThreadPool
2019-03-11 09:38:44 -07:00
Gael Guennebaud
656d9bc66b
Apply SSE's pmin/pmax fix for GCC <= 5 to AVX's pmin/pmax
2019-03-10 21:19:18 +01:00
Gael Guennebaud
2df4f00246
Change license from LGPL to MPL2 with agreement from David Harmon.
2019-03-07 18:17:10 +01:00
Rasmus Munk Larsen
3c3f639fe2
Merge.
2019-03-06 11:54:30 -08:00
Rasmus Munk Larsen
f4ec8edea8
Add macro EIGEN_AVOID_THREAD_LOCAL to make it possible to manually disable the use of thread_local.
2019-03-06 11:52:04 -08:00
Rasmus Munk Larsen
41cdc370d0
Fix placement of "#if defined(EIGEN_GPUCC)" guard region.
...
Found with -Wundefined-func-template.
Author: tkoeppe@google.com
2019-03-06 11:42:22 -08:00
Rasmus Munk Larsen
cc407c9d4d
Fix placement of "#if defined(EIGEN_GPUCC)" guard region.
...
Found with -Wundefined-func-template.
Author: tkoeppe@google.com
2019-03-06 11:40:06 -08:00
Eugene Zhulenev
1bc2a0a57c
Add missing return to NonBlockingThreadPool::LocalSteal
2019-03-06 10:49:49 -08:00
Eugene Zhulenev
4e4dcd9026
Remove redundant steal loop
2019-03-06 10:39:07 -08:00
Rasmus Larsen
4d808e834a
Merged in rmlarsen/eigen_threadpool (pull request PR-606)
...
Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in 2ca1e73239
Approved-by: Sameer Agarwal <sameeragarwal@google.com >
2019-03-06 17:59:03 +00:00
Rasmus Larsen
2ea18e505f
Merged in ezhulenev/eigen-01 (pull request PR-610)
...
Block evaluation for TensorGeneratorOp
2019-03-06 16:49:38 +00:00
Eugene Zhulenev
25abaa2e41
Check that inner block dimension is continuous
2019-03-05 17:34:35 -08:00
Eugene Zhulenev
5d9a6686ed
Block evaluation for TensorGeneratorOp
2019-03-05 16:35:21 -08:00
Rasmus Larsen
b4861f4778
Merged in ezhulenev/eigen-01 (pull request PR-609)
...
Tune tensor contraction threadpool heuristics
2019-03-05 23:54:40 +00:00
Gael Guennebaud
bfbf7da047
bug #1689 fix used-but-marked-unused warning
2019-03-05 23:46:24 +01:00
Eugene Zhulenev
a407e022e6
Tune tensor contraction threadpool heuristics
2019-03-05 14:19:59 -08:00
Eugene Zhulenev
56c6373f82
Add an extra check for the RunQueue size estimate
2019-03-05 11:51:26 -08:00
Eugene Zhulenev
b1a8627493
Do not create Tensor<const T> in cxx11_tensor_forced_eval test
2019-03-05 11:19:25 -08:00
Rasmus Munk Larsen
0318fc7f44
Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in 2ca1e73239
2019-03-05 10:24:54 -08:00
Eugene Zhulenev
efb5080d31
Do not initialize invalid fast_strides in TensorGeneratorOp
2019-03-04 16:58:49 -08:00
Eugene Zhulenev
b95941e5c2
Add tiled evaluation for TensorForcedEvalOp
2019-03-04 16:02:22 -08:00
Eugene Zhulenev
694084ecbd
Use fast divisors in TensorGeneratorOp
2019-03-04 11:10:21 -08:00
Gael Guennebaud
b0d406d91c
Enable construction of Ref<VectorType> from a runtime vector.
2019-03-03 15:25:25 +01:00
Sam Hasinoff
9ba81cf0ff
Fully qualify Eigen::internal::aligned_free
...
This helps avoids a conflict on certain Windows toolchains
(potentially due to some ADL name resolution bug) in the case
where aligned_free is defined in the global namespace. In any
case, tightening this up is harmless.
2019-03-02 17:42:16 +00:00
Gael Guennebaud
22144e949d
bug #1629 : fix compilation of PardisoSupport (regression introduced in changeset a7842daef2
...
)
2019-03-02 22:44:47 +01:00
Bernhard M. Wiedemann
b071672e78
Do not keep latex logs
...
to make package builds more reproducible.
See https://reproducible-builds.org/ for why this is good.
2019-02-27 11:09:00 +01:00
Rasmus Munk Larsen
cf4a1c81fa
Fix specialization for conjugate on non-complex types in TensorBase.h.
2019-03-01 14:21:09 -08:00
Sameer Agarwal
c181dfb8ab
Consistently use EIGEN_BLAS_FUNC in BLAS.
...
Previously, for a few functions, eithe BLASFUNC or, EIGEN_CAT
was being used. This change uses EIGEN_BLAS_FUNC consistently
everywhere.
Also introduce EIGEN_BLAS_FUNC_SUFFIX, which by default is
equal to "_", this allows the user to inject a new suffix as
needed.
2019-02-27 11:30:58 -08:00
Rasmus Larsen
9558f4c25f
Merged in rmlarsen/eigen_threadpool (pull request PR-596)
...
Improve EventCount used by the non-blocking threadpool.
Approved-by: Gael Guennebaud <g.gael@free.fr >
2019-02-26 20:37:26 +00:00
Rasmus Larsen
2ca1e73239
Merged in rmlarsen/eigen (pull request PR-597)
...
Change licensing of OrderingMethods/Amd.h and SparseCholesky/SimplicialCholesky_impl.h from LGPL to MPL2.
Approved-by: Gael Guennebaud <g.gael@free.fr >
2019-02-25 17:02:16 +00:00
Gael Guennebaud
e409dbba14
Enable SSE vectorization of Quaternion and cross3() with AVX
2019-02-23 10:45:40 +01:00
Rasmus Munk Larsen
6560692c67
Improve EventCount used by the non-blocking threadpool.
...
The current algorithm requires threads to commit/cancel waiting in order
they called Prewait. Spinning caused by that serialization can consume
lots of CPU time on some workloads. Restructure the algorithm to not
require that serialization and remove spin waits from Commit/CancelWait.
Note: this reduces max number of threads from 2^16 to 2^14 to leave
more space for ABA counter (which is now 22 bits).
Implementation details are explained in comments.
2019-02-22 13:56:26 -08:00
Gael Guennebaud
0b25a5c431
fix alignment in ploadquad
2019-02-22 21:39:36 +01:00
Rasmus Munk Larsen
1dc1677d52
Change licensing of OrderingMethods/Amd.h and SparseCholesky/SimplicialCholesky_impl.h from LGPL to MPL2. Google LLC executed a license agreement with the author of the code from which these files are derived to allow the Eigen project to distribute the code and derived works under MPL2.
2019-02-22 12:33:57 -08:00
Gael Guennebaud
0cb4ba98e7
update wrt recent changes
2019-02-21 17:19:36 +01:00
Gael Guennebaud
cca6c207f4
AVX512: implement faster ploadquad<Packet16f> thus speeding up GEMM
2019-02-21 17:18:28 +01:00
Gael Guennebaud
1c09ee8541
bug #1674 : workaround clang fast-math aggressive optimizations
2019-02-22 15:48:53 +01:00
Gael Guennebaud
7e3084bb6f
Fix compilation on ARM.
2019-02-22 14:56:12 +01:00
Gael Guennebaud
32502f3c45
bug #1684 : add simplified regression test for respective clang's bug (this also reveal the same bug in Apples's clang)
2019-02-22 10:29:06 +01:00
Gael Guennebaud
42c23f14ac
Speed up col/row-wise reverse for fixed size matrices by propagating compile-time sizes.
2019-02-21 22:44:40 +01:00
Rasmus Munk Larsen
4d7f317102
Add a few missing packet ops: cmp_eq for NEON. pfloor for GPU.
2019-02-21 13:32:13 -08:00
Gael Guennebaud
2a39659d79
Add fully generic Vector<Type,Size> and RowVector<Type,Size> type aliases.
2019-02-20 15:23:23 +01:00
Gael Guennebaud
302377110a
Update documentation of Matrix and Array type aliases.
2019-02-20 15:18:48 +01:00
Gael Guennebaud
475295b5ff
Enable documentation of Array's typedefs
2019-02-20 15:18:07 +01:00
Gael Guennebaud
44b54fa4a3
Protect c++11 type alias with Eigen's macro, and add respective unit test.
2019-02-20 14:43:05 +01:00
Gael Guennebaud
7195f008ce
Merged in ra_bauke/eigen (pull request PR-180)
...
alias template for matrix and array classes, see also bug #864
Approved-by: Heiko Bauke <heiko.bauke@mail.de >
2019-02-20 13:22:39 +00:00
Gael Guennebaud
4e8047cdcf
Fix compilation with gcc and remove TR1 stuff.
2019-02-20 13:59:34 +01:00
Gael Guennebaud
844e5447f8
Update documentation regarding alignment issue.
2019-02-20 13:54:04 +01:00
Gael Guennebaud
edd413c184
bug #1409 : make EIGEN_MAKE_ALIGNED_OPERATOR_NEW* macros empty in c++17 mode:
...
- this helps clang 5 and 6 to support alignas in STL's containers.
- this makes the public API of our (and users) classes cleaner
2019-02-20 13:52:11 +01:00
Gael Guennebaud
3b5deeb546
bug #899 : make sparseqr unit test more stable by 1) trying with larger threshold and 2) relax rank computation for rank-deficient problems.
2019-02-19 22:57:51 +01:00
Gael Guennebaud
482c5fb321
bug #899 : remove "rank-revealing" qualifier for SparseQR and warn that it is not always rank-revealing.
2019-02-19 22:52:15 +01:00
Gael Guennebaud
9ac1634fdf
Fix conversion warnings
2019-02-19 21:59:53 +01:00
Gael Guennebaud
292d61970a
Fix C++17 compilation
2019-02-19 21:59:41 +01:00
Rasmus Munk Larsen
071629a440
Fix incorrect value of NumDimensions in TensorContraction traits.
...
Reported here: #1671
2019-02-19 10:49:54 -08:00
Christoph Hertzberg
a1646fc960
Commas at the end of enumerator lists are not allowed in C++03
2019-02-19 14:32:25 +01:00
Gael Guennebaud
2cfc025bda
fix unit compilation in c++17: std::ptr_fun has been removed.
2019-02-19 14:05:22 +01:00
Gael Guennebaud
ab78cabd39
Add C++17 detection macro, and make sure throw(xpr) is not used if the compiler is in c++17 mode.
2019-02-19 14:04:35 +01:00
Gael Guennebaud
115da6a1ea
Fix conversion warnings
2019-02-19 14:00:15 +01:00
Gael Guennebaud
7d10c78738
bug #1046 : add unit tests for correct propagation of alignment through std::alignment_of
2019-02-19 10:31:56 +01:00
Gael Guennebaud
7580112c31
Fix harmless Scalar vs RealScalar cast.
2019-02-18 22:12:28 +01:00
Gael Guennebaud
e23bf40dc2
Add unit test for LinSpaced and complex numbers.
2019-02-18 22:03:47 +01:00
Gael Guennebaud
796db94e6e
bug #1194 : implement slightly faster and SIMD friendly 4x4 determinant.
2019-02-18 16:21:27 +01:00
Gael Guennebaud
31b6e080a9
Fix regression: .conjugate() was popped out but not re-introduced.
2019-02-18 14:45:55 +01:00
Gael Guennebaud
c69d0d08d0
Set cost of conjugate to 0 (in practice it boils down to a no-op).
...
This is also important to make sure that A.conjugate() * B.conjugate() does not evaluate
its arguments into temporaries (e.g., if A and B are fixed and small, or * fall back to lazyProduct)
2019-02-18 14:43:07 +01:00
Gael Guennebaud
512b74aaa1
GEMM: catch all scalar-multiple variants when falling-back to a coeff-based product.
...
Before only s*A*B was caught which was both inconsistent with GEMM, sub-optimal,
and could even lead to compilation-errors (https://stackoverflow.com/questions/54738495 ).
2019-02-18 11:47:54 +01:00
Christoph Hertzberg
ec032ac03b
Guard C++11-style default constructor. Also, this is only needed for MSVC
2019-02-16 09:44:05 +01:00
Gael Guennebaud
902a7793f7
Add possibility to bench row-major lhs and rhs
2019-02-15 16:52:34 +01:00
Gael Guennebaud
83309068b4
bug #1680 : improve MSVC inlining by declaring many triavial constructors and accessors as STRONG_INLINE.
2019-02-15 16:35:35 +01:00
Gael Guennebaud
0505248f25
bug #1680 : make all "block" methods strong-inline and device-functions (some were missing EIGEN_DEVICE_FUNC)
2019-02-15 16:33:56 +01:00
Gael Guennebaud
559320745e
bug #1678 : Fix lack of __FMA__ macro on MSVC with AVX512
2019-02-15 10:30:28 +01:00
Gael Guennebaud
d85ae650bf
bug #1678 : workaround MSVC compilation issues with AVX512
2019-02-15 10:24:17 +01:00
Gael Guennebaud
f2970819a2
bug #1679 : avoid possible division by 0 in complex-schur
2019-02-15 09:39:25 +01:00
Rasmus Munk Larsen
65e23ca7e9
Revert b55b5c7280
...
.
2019-02-14 13:46:13 -08:00
Rasmus Larsen
efeabee445
Merged in ezhulenev/eigen-01 (pull request PR-590)
...
Do not generate no-op cast() and conjugate() expressions
2019-02-14 21:16:12 +00:00
Eugene Zhulenev
7b837559a7
Fix signed-unsigned return in RuqQueue
2019-02-14 10:40:21 -08:00
Eugene Zhulenev
f0d42d2265
Fix signed-unsigned comparison warning in RunQueue
2019-02-14 10:27:28 -08:00
Eugene Zhulenev
106ba7bb1a
Do not generate no-op cast() and conjugate() expressions
2019-02-14 09:51:51 -08:00
Eugene Zhulenev
8c2f30c790
Speedup Tensor ThreadPool RunQueu::Empty()
2019-02-13 10:20:53 -08:00
Gael Guennebaud
bdcb5f3304
Let's properly use Score instead of std::abs, and remove deprecated FIXME ( a /= b does a/b and not a * (1/b) as it was a long time ago...)
2019-02-11 22:56:19 +01:00
Gael Guennebaud
2edfc6807d
Fix compilation of empty products of the form: Mx0 * 0xN
2019-02-11 18:24:07 +01:00
Gael Guennebaud
eb46f34a8c
Speed up 2x2 LU by a factor 2, and other small fixed sizes by about 10%.
...
Not sure that's so critical, but this does not complexify the code base much.
2019-02-11 17:59:35 +01:00
Gael Guennebaud
dada863d23
Enable unit tests of PartialPivLU on fixed size matrices, and increase tested matrix size (blocking was not tested!)
2019-02-11 17:56:20 +01:00
Gael Guennebaud
ab6e6edc32
Speedup PartialPivLU for small matrices by passing compile-time sizes when available.
...
This change set also makes a better use of Map<>+OuterStride and Ref<> yielding surprising speed up for small dynamic sizes as well.
The table below reports times in micro seconds for 10 random matrices:
| ------ float --------- | ------- double ------- |
size | before after ratio | before after ratio |
fixed 1 | 0.34 0.11 2.93 | 0.35 0.11 3.06 |
fixed 2 | 0.81 0.24 3.38 | 0.91 0.25 3.60 |
fixed 3 | 1.49 0.49 3.04 | 1.68 0.55 3.01 |
fixed 4 | 2.31 0.70 3.28 | 2.45 1.08 2.27 |
fixed 5 | 3.49 1.11 3.13 | 3.84 2.24 1.71 |
fixed 6 | 4.76 1.64 2.88 | 4.87 2.84 1.71 |
dyn 1 | 0.50 0.40 1.23 | 0.51 0.40 1.26 |
dyn 2 | 1.08 0.85 1.27 | 1.04 0.69 1.49 |
dyn 3 | 1.76 1.26 1.40 | 1.84 1.14 1.60 |
dyn 4 | 2.57 1.75 1.46 | 2.67 1.66 1.60 |
dyn 5 | 3.80 2.64 1.43 | 4.00 2.48 1.61 |
dyn 6 | 5.06 3.43 1.47 | 5.15 3.21 1.60 |
2019-02-11 13:58:24 +01:00
Eugene Zhulenev
21eb97d3e0
Add PacketConv implementation for non-vectorizable src expressions
2019-02-08 15:47:25 -08:00
Eugene Zhulenev
1e36166ed1
Optimize TensorConversion evaluator: do not convert same type
2019-02-08 15:13:24 -08:00
Steven Peters
953ca5ba2f
Spline.h: fix spelling "spang" -> "span"
2019-02-08 06:23:24 +00:00
Eugene Zhulenev
59998117bb
Don't do parallel_pack if we can use thread_local memory in tensor contractions
2019-02-07 09:21:25 -08:00
Gael Guennebaud
013cc3a6b3
Make GEMM fallback to GEMV for runtime vectors.
...
This is a more general and simpler version of changeset 4c0fa6ce0f
2019-02-07 16:24:09 +01:00
Gael Guennebaud
fa2fcb4895
Backed out changeset 4c0fa6ce0f
2019-02-07 16:07:08 +01:00
Gael Guennebaud
b3c4344a68
bug #1676 : workaround GCC's bug in c++17 mode.
2019-02-07 15:21:35 +01:00
Rasmus Larsen
3091c03898
Merged in ezhulenev/eigen-01 (pull request PR-581)
...
Parallelize tensor contraction only by sharding dimension and use 'thread-local' memory for packing
Approved-by: Rasmus Larsen <rmlarsen@google.com >
Approved-by: Gael Guennebaud <g.gael@free.fr >
2019-02-05 22:45:20 +00:00
Eugene Zhulenev
8491127082
Do not reduce parallelism too much in contractions with small number of threads
2019-02-04 12:59:33 -08:00
Eugene Zhulenev
eb21bab769
Parallelize tensor contraction only by sharding dimension and use 'thread-local' memory for packing
2019-02-04 10:43:16 -08:00
Eugene Zhulenev
6d0f6265a9
Remove duplicated comment line
2019-02-04 10:30:25 -08:00
Eugene Zhulenev
690b2c45b1
Fix GeneralBlockPanelKernel Android compilation
2019-02-04 10:29:15 -08:00
Gael Guennebaud
871e2e5339
bug #1674 : disable GCC's unsafe-math-optimizations in sin/cos vectorization (results are completely wrong otherwise)
2019-02-03 08:54:47 +01:00
Rasmus Larsen
e7b481ea74
Merged in rmlarsen/eigen (pull request PR-578)
...
Speed up Eigen matrix*vector and vector*matrix multiplication.
Approved-by: Eugene Zhulenev <ezhulenev@google.com >
2019-02-02 01:53:44 +00:00
Sameer Agarwal
b55b5c7280
Speed up row-major matrix-vector product on ARM
...
The row-major matrix-vector multiplication code uses a threshold to
check if processing 8 rows at a time would thrash the cache.
This change introduces two modifications to this logic.
1. A smaller threshold for ARM and ARM64 devices.
The value of this threshold was determined empirically using a Pixel2
phone, by benchmarking a large number of matrix-vector products in the
range [1..4096]x[1..4096] and measuring performance separately on
small and little cores with frequency pinning.
On big (out-of-order) cores, this change has little to no impact. But
on the small (in-order) cores, the matrix-vector products are up to
700% faster. Especially on large matrices.
The motivation for this change was some internal code at Google which
was using hand-written NEON for implementing similar functionality,
processing the matrix one row at a time, which exhibited substantially
better performance than Eigen.
With the current change, Eigen handily beats that code.
2. Make the logic for choosing number of simultaneous rows apply
unifiormly to 8, 4 and 2 rows instead of just 8 rows.
Since the default threshold for non-ARM devices is essentially
unchanged (32000 -> 32 * 1024), this change has no impact on non-ARM
performance. This was verified by running the same set of benchmarks
on a Xeon desktop.
2019-02-01 15:23:53 -08:00
Rasmus Munk Larsen
4c0fa6ce0f
Speed up Eigen matrix*vector and vector*matrix multiplication.
...
This change speeds up Eigen matrix * vector and vector * matrix multiplication for dynamic matrices when it is known at runtime that one of the factors is a vector.
The benchmarks below test
c.noalias()= n_by_n_matrix * n_by_1_matrix;
c.noalias()= 1_by_n_matrix * n_by_n_matrix;
respectively.
Benchmark measurements:
SSE:
Run on *** (72 X 2992 MHz CPUs); 2019-01-28T17:51:44.452697457-08:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------
BM_MatVec/64 1096 312 +71.5%
BM_MatVec/128 4581 1464 +68.0%
BM_MatVec/256 18534 5710 +69.2%
BM_MatVec/512 118083 24162 +79.5%
BM_MatVec/1k 704106 173346 +75.4%
BM_MatVec/2k 3080828 742728 +75.9%
BM_MatVec/4k 25421512 4530117 +82.2%
BM_VecMat/32 352 130 +63.1%
BM_VecMat/64 1213 425 +65.0%
BM_VecMat/128 4640 1564 +66.3%
BM_VecMat/256 17902 5884 +67.1%
BM_VecMat/512 70466 24000 +65.9%
BM_VecMat/1k 340150 161263 +52.6%
BM_VecMat/2k 1420590 645576 +54.6%
BM_VecMat/4k 8083859 4364327 +46.0%
AVX2:
Run on *** (72 X 2993 MHz CPUs); 2019-01-28T17:45:11.508545307-08:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------
BM_MatVec/64 619 120 +80.6%
BM_MatVec/128 9693 752 +92.2%
BM_MatVec/256 38356 2773 +92.8%
BM_MatVec/512 69006 12803 +81.4%
BM_MatVec/1k 443810 160378 +63.9%
BM_MatVec/2k 2633553 646594 +75.4%
BM_MatVec/4k 16211095 4327148 +73.3%
BM_VecMat/64 925 227 +75.5%
BM_VecMat/128 3438 830 +75.9%
BM_VecMat/256 13427 2936 +78.1%
BM_VecMat/512 53944 12473 +76.9%
BM_VecMat/1k 302264 157076 +48.0%
BM_VecMat/2k 1396811 675778 +51.6%
BM_VecMat/4k 8962246 4459010 +50.2%
AVX512:
Run on *** (72 X 2993 MHz CPUs); 2019-01-28T17:35:17.239329863-08:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------
BM_MatVec/64 401 111 +72.3%
BM_MatVec/128 1846 513 +72.2%
BM_MatVec/256 36739 1927 +94.8%
BM_MatVec/512 54490 9227 +83.1%
BM_MatVec/1k 487374 161457 +66.9%
BM_MatVec/2k 2016270 643824 +68.1%
BM_MatVec/4k 13204300 4077412 +69.1%
BM_VecMat/32 324 106 +67.3%
BM_VecMat/64 1034 246 +76.2%
BM_VecMat/128 3576 802 +77.6%
BM_VecMat/256 13411 2561 +80.9%
BM_VecMat/512 58686 10037 +82.9%
BM_VecMat/1k 320862 163750 +49.0%
BM_VecMat/2k 1406719 651397 +53.7%
BM_VecMat/4k 7785179 4124677 +47.0%
Currently watchingStop watching
2019-01-31 14:24:08 -08:00
Gael Guennebaud
7ef879f6bf
GEBP: improves pipelining in the 1pX4 path with FMA.
...
Prior to this change, a product with a LHS having 8 rows was faster with AVX-only than with AVX+FMA.
With AVX+FMA I measured a speed up of about x1.25 in such cases.
2019-01-30 23:45:12 +01:00
Gael Guennebaud
de77bf5d6c
Fix compilation with ARM64.
2019-01-30 16:48:20 +01:00
Gael Guennebaud
d586686924
Workaround lack of support for arbitrary packet-type in Tensor by manually loading half/quarter packets in tensor contraction mapper.
2019-01-30 16:48:01 +01:00
Gael Guennebaud
eb4c6bb22d
Fix conflicts and merge
2019-01-30 15:57:08 +01:00
Gael Guennebaud
e3622a0396
Slightly extend discussions on auto and move the content of the Pit falls wiki page here.
...
http://eigen.tuxfamily.org/index.php?title=Pit_Falls
2019-01-30 13:09:21 +01:00
Gael Guennebaud
df12fae8b8
According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89101 , the previous GCC issue is fixed in GCC trunk (will be gcc 9).
2019-01-30 11:52:28 +01:00
Gael Guennebaud
3775926bba
ARM64 & GEBP: add specialization for double +30% speed up
2019-01-30 11:49:06 +01:00
Gael Guennebaud
be5b0f664a
ARM64 & GEBP: Make use of vfmaq_laneq_f32 and workaround GCC's issue in generating good ASM
2019-01-30 11:48:25 +01:00
Christoph Hertzberg
a7779a9b42
Hide some annoying unused variable warnings in g++8.1
2019-01-29 16:48:21 +01:00
Gael Guennebaud
efe02292a6
Add recent gemm related changesets and various cleanups in perf-monitoring
2019-01-29 11:53:47 +01:00
Gael Guennebaud
8a06c699d0
bug #1669 : fix PartialPivLU/inverse with zero-sized matrices.
2019-01-29 10:27:13 +01:00
Gael Guennebaud
a2a07e62b9
Fix compilation with c++03 (local class cannot be template arguments), and make SparseMatrix::assignDiagonal truly protected.
2019-01-29 10:10:07 +01:00
Gael Guennebaud
f489f44519
bug #1574 : implement "sparse_matrix =,+=,-= diagonal_matrix" with smart insertion strategies of missing diagonal coeffs.
2019-01-28 17:29:50 +01:00
Gael Guennebaud
803fa79767
Move evaluator<SparseCompressedBase>::find(i,j) to a more general and reusable SparseCompressedBase::lower_bound(i,j) functiion
2019-01-28 17:24:44 +01:00
Gael Guennebaud
53560f9186
bug #1672 : fix unit test compilation with MSVC by adding overloads of test_is* for long long (and factorize copy/paste code through a macro)
2019-01-28 13:47:28 +01:00
Christoph Hertzberg
c9825b967e
Renaming even more I identifiers
2019-01-26 13:22:13 +01:00
Christoph Hertzberg
5a52e35f9a
Renaming some more I identifiers
2019-01-26 13:18:21 +01:00
Rasmus Munk Larsen
71429883ee
Fix compilation error in NEON GEBP specializaition of madd.
2019-01-25 17:00:21 -08:00
Christoph Hertzberg
934b8a1304
Avoid I as an identifier, since it may clash with the C-header complex.h
2019-01-25 14:54:39 +01:00
Gael Guennebaud
ec8a387972
cleanup
2019-01-24 10:24:45 +01:00
Gael Guennebaud
6908ce2a15
More thoroughly check variadic template ctor of fixed-size vectors
2019-01-24 10:24:28 +01:00
David Tellenbach
237b03b372
PR 574: use variadic template instead of initializer_list to implement fixed-size vector ctor from coefficients.
2019-01-23 00:07:19 +01:00
Christoph Hertzberg
bd6dadcda8
Tell doxygen that cxx11 math is available
2019-01-24 00:14:02 +01:00
Gael Guennebaud
c64d5d3827
Bypass inline asm for non compatible compilers.
2019-01-23 23:43:13 +01:00
Christoph Hertzberg
e16913a45f
Fix name of tutorial snippet.
2019-01-23 10:35:06 +01:00
Gael Guennebaud
80f81f9c4b
Cleanup SFINAE in Array/Matrix(initializer_list) ctors and minor doc editing.
2019-01-22 17:08:47 +01:00
David Tellenbach
db152b9ee6
PR 572: Add initializer list constructors to Matrix and Array (include unit tests and doc)
...
- {1,2,3,4,5,...} for fixed-size vectors only
- {{1,2,3},{4,5,6}} for the general cases
- {{1,2,3,4,5,....}} is allowed for both row and column-vector
2019-01-21 16:25:57 +01:00
Gael Guennebaud
543529da6a
Add more extensive tests of Array ctors, including {} variants
2019-01-22 15:30:50 +01:00
nluehr
92774f0275
Replace host_define.h with cuda_runtime_api.h
2019-01-18 16:10:09 -06:00
Gael Guennebaud
d18f49cbb3
Fix compilation of unit tests with gcc and c++17
2019-01-18 11:12:42 +01:00
Christoph Hertzberg
da0a41b9ce
Mask unused-parameter warnings, when building with NDEBUG
2019-01-18 10:41:14 +01:00
Rasmus Munk Larsen
2eccbaf3f7
Add missing logical packet ops for GPU and NEON.
2019-01-17 17:45:08 -08:00
Christoph Hertzberg
d575505d25
After fixing bug #1557 , boostmultiprec_7 failed with NumericalIssue instead of NoConvergence (all that matters here is no Success)
2019-01-17 19:14:07 +01:00
Gael Guennebaud
ee3662abc5
Remove some useless const_cast
2019-01-17 18:27:49 +01:00
Gael Guennebaud
0fe6b7d687
Make nestByValue works again (broken since 3.3) and add unit tests.
2019-01-17 18:27:25 +01:00
Gael Guennebaud
4b7cf7ff82
Extend reshaped unit tests and remove useless const_cast
2019-01-17 17:35:32 +01:00
Gael Guennebaud
b57c9787b1
Cleanup useless const_cast and add missing broadcast assignment tests
2019-01-17 16:55:42 +01:00
Gael Guennebaud
be05d0030d
Make FullPivLU use conjugateIf<>
2019-01-17 12:01:00 +01:00
Patrick Peltzer
bba2f05064
Boosttest only available for Boost version >= 1.53.0
2019-01-17 11:54:37 +01:00
Patrick Peltzer
15e53d5d93
PR 567: makes all dense solvers inherit SoverBase (LU,Cholesky,QR,SVD).
...
This changeset also includes:
* add HouseholderSequence::conjugateIf
* define int as the StorageIndex type for all dense solvers
* dedicated unit tests, including assertion checking
* _check_solve_assertion(): this method can be implemented in derived solver classes to implement custom checks
* CompleteOrthogonalDecompositions: add applyZOnTheLeftInPlace, fix scalar type in applyZAdjointOnTheLeftInPlace(), add missing assertions
* Cholesky: add missing assertions
* FullPivHouseholderQR: Corrected Scalar type in _solve_impl()
* BDCSVD: Unambiguous return type for ternary operator
* SVDBase: Corrected Scalar type in _solve_impl()
2019-01-17 01:17:39 +01:00
Gael Guennebaud
7f32109c11
Add conjugateIf<bool> members to DesneBase, TriangularView, SelfadjointView, and make PartialPivLU use it.
2019-01-17 11:33:43 +01:00
Gael Guennebaud
7b35c26b1c
Doc: remove link to porting guide
2019-01-17 10:35:50 +01:00
Gael Guennebaud
4759d9e86d
Doc: add manual page on STL iterators
2019-01-17 10:35:14 +01:00
Gael Guennebaud
562985bac4
bug #1646 : fix false aliasing detection for A.row(0) = A.col(0);
...
This changeset completely disable the detection for vectors for which are current mechanism cannot detect any positive aliasing anyway.
2019-01-17 00:14:27 +01:00
Rasmus Munk Larsen
7401e2541d
Fix compilation error for logical packet ops with older compilers.
2019-01-16 14:43:33 -08:00
Rasmus Munk Larsen
ee550a2ac3
Fix flaky test for tensor fft.
2019-01-16 14:03:12 -08:00
Gael Guennebaud
0f028f61cb
GEBP: fix swapped kernel mode with AVX512 and complex scalars
2019-01-16 22:26:38 +01:00
Gael Guennebaud
e118ce86fd
GEBP: cleanup logic to choose between a 4 packets of 1 packet
2019-01-16 21:47:42 +01:00
Gael Guennebaud
70e133333d
bug #1661 : fix regression in GEBP and AVX512
2019-01-16 21:22:20 +01:00
Gael Guennebaud
ce88e297dc
Add a comment stating this doc page is partly obsolete.
2019-01-16 16:29:02 +01:00
Gael Guennebaud
729d1291c2
bug #1585 : update doc on lazy-evaluation
2019-01-16 16:28:17 +01:00
Gael Guennebaud
c8e40edac9
Remove Eigen2ToEigen3 migration page (obsolete since 3.3)
2019-01-16 16:27:00 +01:00
Gael Guennebaud
aeffdf909e
bug #1617 : add unit tests for empty triangular solve.
2019-01-16 15:24:59 +01:00
Gael Guennebaud
502f717980
bug #1646 : disable aliasing detection for empty and 1x1 expression
2019-01-16 14:33:45 +01:00
Gael Guennebaud
0b466b6933
bug #1633 : use proper type for madd temporaries, factorize RhsPacketx4.
2019-01-16 13:50:13 +01:00
Renjie Liu
dbfcceabf5
Bug: 1633: refactor gebp kernel and optimize for neon
2019-01-16 12:51:36 +08:00
Gael Guennebaud
2b70b2f570
Make Transform::rotation() an alias to Transform::linear() in the case of an Isometry
2019-01-15 22:50:42 +01:00
Gael Guennebaud
2c2c114995
Silent maybe-uninitialized warnings by gcc
2019-01-15 16:53:15 +01:00
Gael Guennebaud
6ec6bf0b0d
Enable visitor on empty matrices (the visitor is left unchanged), and protect min/maxCoeff(Index*,Index*) on empty matrices by an assertion (+ doc & unit tests)
2019-01-15 15:21:14 +01:00
Gael Guennebaud
027e44ed24
bug #1592 : makes partial min/max reductions trigger an assertion on inputs with a zero reduction length (+doc and tests)
2019-01-15 15:13:24 +01:00
Gael Guennebaud
f8bc5cb39e
Fix detection of vector-at-time: use Rows/Cols instead of MaxRow/MaxCols.
...
This fix VectorXd(n).middleCol(0,0).outerSize() which was equal to 1.
2019-01-15 15:09:49 +01:00
Gael Guennebaud
32d7232aec
fix always true warning with gcc 4.7
2019-01-15 11:18:48 +01:00
Gael Guennebaud
6cf7afa3d9
Typo
2019-01-15 11:04:37 +01:00
Gael Guennebaud
e7d4d4f192
cleanup
2019-01-15 10:51:03 +01:00
Rasmus Larsen
7b3aab0936
Merged in rmlarsen/eigen (pull request PR-570)
...
Add support for inverse hyperbolic functions. Fix cost of division.
2019-01-14 21:31:33 +00:00
Rasmus Munk Larsen
8bf00c2baf
Remove extra <tr>.
2019-01-14 13:29:29 -08:00
Rasmus Munk Larsen
ec7fe83554
Merge.
2019-01-14 13:26:58 -08:00
Rasmus Munk Larsen
2ea4efc0c3
Merge.
2019-01-14 13:26:58 -08:00
Rasmus Munk Larsen
2c5843dbbb
Update documentation.
2019-01-14 13:26:34 -08:00
Gael Guennebaud
250dcd1fdb
bug #1652 : fix position of EIGEN_ALIGN16 attributes in Neon and Altivec
2019-01-14 21:45:56 +01:00
Rasmus Larsen
5a59452aae
Merged eigen/eigen into default
2019-01-14 10:23:23 -08:00
Gael Guennebaud
3c9e6d206d
AVX512: fix pgather/pscatter for Packet4cd and unaligned pointers
2019-01-14 17:57:28 +01:00
Gael Guennebaud
61b6eb05fe
AVX512 (r)sqrt(double) was mistakenly disabled with clang and others
2019-01-14 17:28:47 +01:00
Gael Guennebaud
ccddeaad90
fix warning
2019-01-14 16:51:16 +01:00
Gael Guennebaud
d4881751d3
Doc: add Isometry in the list of supported Mode of Transform<>
2019-01-14 16:38:26 +01:00
Greg Coombe
9d988a1e1a
Initialize isometric transforms like affine transforms.
...
The isometric transform, like the affine transform, has an implicit last
row of [0, 0, 0, 1]. This was not being properly initialized, as verified
by a new test function.
2019-01-11 23:14:35 -08:00
Gael Guennebaud
4356a55a61
PR 571: Implements an accurate argument reduction algorithm for huge inputs of sin/cos and call it instead of falling back to std::sin/std::cos.
...
This makes both the small and huge argument cases faster because:
- for small inputs this removes the last pselect
- for large inputs only the reduction part follows a scalar path,
the rest use the same SIMD path as the small-argument case.
2019-01-14 13:54:01 +01:00
Gael Guennebaud
f566724023
Fix StorageIndex FIXME in dense LU solvers
2019-01-13 17:54:30 +01:00
Rasmus Munk Larsen
1c6e6e2c3f
Merge.
2019-01-11 17:47:11 -08:00
Rasmus Larsen
0ba3b45419
Merged eigen/eigen into default
2019-01-11 17:46:04 -08:00
Rasmus Munk Larsen
28ba1b2c32
Add support for inverse hyperbolic functions.
...
Fix cost of division.
2019-01-11 17:45:37 -08:00
Rasmus Munk Larsen
89c4001d6f
Fix warnings in ptrue for complex and half types.
2019-01-11 14:10:57 -08:00
Rasmus Munk Larsen
a49d01edba
Fix warnings in ptrue for complex and half types.
2019-01-11 13:18:17 -08:00
Eugene Zhulenev
1e6d15b55b
Fix shorten-64-to-32 warning in TensorContractionThreadPool
2019-01-11 11:41:53 -08:00
Rasmus Munk Larsen
df29511ac0
Fix merge.
2019-01-11 10:36:36 -08:00
Rasmus Munk Larsen
8e71ed4cc9
Merge.
2019-01-11 10:35:07 -08:00
Rasmus Munk Larsen
fff5a5b579
Resolve.
2019-01-11 10:28:52 -08:00
Rasmus Munk Larsen
9396ace46b
Merge.
2019-01-11 10:28:52 -08:00
Rasmus Larsen
74882471d0
Merged eigen/eigen into default
2019-01-11 10:20:55 -08:00
Rasmus Munk Larsen
e9936cf2b9
Merge.
2019-01-11 09:58:33 -08:00
Gael Guennebaud
9005f0111f
Replace compiler's alignas/alignof extension by respective c++11 keywords when available. This also fix a compilation issue with gcc-4.7.
2019-01-11 17:10:54 +01:00
Mark D Ryan
3c9add6598
Remove reinterpret_cast from AVX512 complex implementation
...
The reinterpret_casts used in ptranspose(PacketBlock<Packet8cf,4>&)
ptranspose(PacketBlock<Packet8cf,8>&) don't appear to be working
correctly. They're used to convert the kernel parameters to
PacketBlock<Packet8d,T>& so that the complex number versions of
ptranspose can be written using the existing double implementations.
Unfortunately, they don't seem to work and are responsible for 9 unit
test failures in the AVX512 build of tensorflow master. This commit
fixes the issue by manually initialising PacketBlock<Packet8d,T>
variables with the contents of the kernel parameter before calling
the double version of ptranspose, and then copying the resulting
values back into the kernel parameter before returning.
2019-01-11 14:02:09 +01:00
Christoph Hertzberg
0522460a0d
bug #1656 : Enable failtests only if BUILD_TESTING is enabled
2019-01-11 11:07:56 +01:00
Eugene Zhulenev
0abe03764c
Fix shorten-64-to-32 warning in TensorContractionThreadPool
2019-01-10 10:27:55 -08:00
Rasmus Munk Larsen
fcfced13ed
Rename pones -> ptrue. Use _CMP_TRUE_UQ where appropriate.
2019-01-09 17:20:33 -08:00
Rasmus Munk Larsen
ce38c342c3
merge.
2019-01-09 17:20:33 -08:00
Rasmus Munk Larsen
a05ec7993e
merge
2019-01-09 17:17:30 -08:00
Rasmus Munk Larsen
e15bb785ad
Collapsed revision
...
* Add packet up "pones". Write pnot(a) as pxor(pones(a), a).
* Collapsed revision
* Simplify a bit.
* Undo useless diffs.
* Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
f6ba6071c5
Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
8f04442526
Collapsed revision
...
* Collapsed revision
* Add packet up "pones". Write pnot(a) as pxor(pones(a), a).
* Collapsed revision
* Simplify a bit.
* Undo useless diffs.
* Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
8f178429b9
Collapsed revision
...
* Collapsed revision
* Add packet up "pones". Write pnot(a) as pxor(pones(a), a).
* Collapsed revision
* Simplify a bit.
* Undo useless diffs.
* Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
1119c73d22
Collapsed revision
...
* Add packet up "pones". Write pnot(a) as pxor(pones(a), a).
* Collapsed revision
* Simplify a bit.
* Undo useless diffs.
* Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
e00521b514
Undo useless diffs.
2019-01-09 16:32:53 -08:00
Rasmus Munk Larsen
f2767112c8
Simplify a bit.
2019-01-09 16:29:18 -08:00
Rasmus Munk Larsen
cb955df9a6
Add packet up "pones". Write pnot(a) as pxor(pones(a), a).
2019-01-09 16:17:08 -08:00
Rasmus Larsen
cb3c059fa4
Merged eigen/eigen into default
2019-01-09 15:04:17 -08:00
Gael Guennebaud
d812f411c3
bug #1654 : fix compilation with cuda and no c++11
2019-01-09 18:00:05 +01:00
Gael Guennebaud
3492a1ca74
fix plog(+inf) with AVX512
2019-01-09 16:53:37 +01:00
Gael Guennebaud
47810cf5b7
Add dedicated implementations of predux_any for AVX512, NEON, and Altivec/VSE
2019-01-09 16:40:42 +01:00
Gael Guennebaud
3f14e0d19e
fix warning
2019-01-09 15:45:21 +01:00
Gael Guennebaud
aeec68f77b
Add missing pcmp_lt and others for AVX512
2019-01-09 15:36:41 +01:00
Gael Guennebaud
e6b217b8dd
bug #1652 : implements a much more accurate version of vectorized sin/cos. This new version achieve same speed for SSE/AVX, and is slightly faster with FMA. Guarantees are as follows:
...
- no FMA: 1ULP up to 3pi, 2ULP up to sin(25966) and cos(18838), fallback to std::sin/cos for larger inputs
- FMA: 1ULP up to sin(117435.992) and cos(71476.0625), fallback to std::sin/cos for larger inputs
2019-01-09 15:25:17 +01:00
Eugene Zhulenev
e70ffef967
Optimize evalShardedByInnerDim
2019-01-08 16:26:31 -08:00
Rasmus Munk Larsen
055f0b73db
Add support for pcmp_eq and pnot, including for complex types.
2019-01-07 16:53:36 -08:00
Eugene Zhulenev
190d053e41
Explicitly set fill character when printing aligned data to ostream
2019-01-03 14:55:28 -08:00
Mark D Ryan
bc5dd4cafd
PR560: Fix the AVX512f only builds
...
Commit c53eececb0
introduced AVX512 support for complex numbers but required
avx512dq to build. Commit 1d683ae2f5
fixed some but not, it would seem all,
of the hard avx512dq dependencies. Build failures are still evident on
Eigen and TensorFlow when compiling with just avx512f and no avx512dq
using gcc 7.3. Looking at the code there does indeed seem to be a problem.
Commit c53eececb0
calls avx512dq intrinsics directly, e.g, _mm512_extractf32x8_ps
and _mm512_and_ps. This commit fixes the issue by replacing the direct
intrinsic calls with the various wrapper functions that are safe to use on
avx512f only builds.
2019-01-03 14:33:04 +01:00
Gael Guennebaud
697fba3bb0
Fix unit test
2018-12-27 11:20:47 +01:00
Gael Guennebaud
60d3fe9a89
One more stupid AVX 512 fix (I don't have direct access to AVX512 machines)
2018-12-24 13:05:03 +01:00
Gael Guennebaud
4aa667b510
Add EIGEN_STRONG_INLINE where required
2018-12-24 10:45:01 +01:00
Gael Guennebaud
961ff567e8
Add missing pcmp_lt_or_nan for AVX512
2018-12-23 22:13:29 +01:00
Gael Guennebaud
0f6f75bd8a
Implement a faster fix for sin/cos of large entries that also correctly handle INF input.
2018-12-23 17:26:21 +01:00
Gael Guennebaud
38d704def8
Make sure that psin/pcos return number in [-1,1] for large inputs (though sin/cos on large entries is quite useless because it's inaccurate)
2018-12-23 16:13:24 +01:00
Gael Guennebaud
5713fb7feb
Fix plog(+INF): it returned ~87 instead of +INF
2018-12-23 15:40:52 +01:00
Christoph Hertzberg
6dd93f7e3b
Make code compile again for older compilers.
...
See https://stackoverflow.com/questions/7411515/
2018-12-22 13:09:07 +01:00
Gustavo Lima Chaves
1024a70e82
gebp: Add new ½ and ¼ packet rows per (peeling) round on the lhs
...
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The patch works by altering the gebp lhs packing routines to also
consider ½ and ¼ packet lenght rows when packing, besides the original
whole package and row-by-row attempts. Finally, gebp itself will try
to fit a fraction of a packet at a time if:
i) ½ and/or ¼ packets are available for the current context (e.g. AVX2
and SSE-sized SIMD register for x86)
ii) The matrix's height is favorable to it (it may be it's too small
in that dimension to take full advantage of the current/maximum
packet width or it may be the case that last rows may take
advantage of smaller packets before gebp goes row-by-row)
This helps mitigate huge slowdowns one had on AVX512 builds when
compared to AVX2 ones, for some dimensions. Gains top at an extra 1x
in throughput. This patch is a complement to changeset 4ad359237a
.
Since packing is changed, Eigen users which would go for very
low-level API usage, like TensorFlow, will have to be adapted to work
fine with the changes.
2018-12-21 11:03:18 -08:00
Gustavo Lima Chaves
e763fcd09e
Introducing "vectorized" byte on unpacket_traits structs
...
This is a preparation to a change on gebp_traits, where a new template
argument will be introduced to dictate the packet size, so it won't be
bound to the current/max packet size only anymore.
By having packet types defined early on gebp_traits, one has now to
act on packet types, not scalars anymore, for the enum values defined
on that class. One approach for reaching the vectorizable/size
properties one needs there could be getting the packet's scalar again
with unpacket_traits<>, then the size/Vectorizable enum entries from
packet_traits<>. It turns out guards like "#ifndef
EIGEN_VECTORIZE_AVX512" at AVX/PacketMath.h will hide smaller packet
variations of packet_traits<> for some types (and it makes sense to
keep that). In other words, one can't go back to the scalar and create
a new PacketType, as this will always lead to the maximum packet type
for the architecture.
The less costly/invasive solution for that, thus, is to add the
vectorizable info on every unpacket_traits struct as well.
2018-12-19 14:24:44 -08:00
Gael Guennebaud
efa4c9c40f
bug #1615 : slightly increase the default unrolling limit to compensate for changeset 101ea26f5e
...
.
This solves a performance regression with clang and 3x3 matrix products.
2018-12-13 10:42:39 +01:00
Gael Guennebaud
f20c991679
add changesets related to matrix product perf.
2018-12-13 10:33:29 +01:00
Rasmus Munk Larsen
dd6d65898a
Fix shorten-64-to-32 warning. Use regular memcpy if num_threads==0.
2018-12-12 14:45:31 -08:00
Gael Guennebaud
f582ea3579
Fix compilation with expression template scalar type.
2018-12-12 22:47:00 +01:00
Gael Guennebaud
cfc70dc13f
Add regression test for bug #1174
2018-12-12 18:03:31 +01:00
Gael Guennebaud
2de8da70fd
bug #1557 : fix RealSchur and EigenSolver for matrices with only zeros on the diagonal.
2018-12-12 17:30:08 +01:00
Gael Guennebaud
72c0bbe2bd
Simplify handling of tests that must fail to compile.
...
Each test is now a normal ctest target, and build properties (compiler+flags) are preserved (instead of starting a new build-dir from scratch).
2018-12-12 15:48:36 +01:00
Gael Guennebaud
37c91e1836
bug #1644 : fix warning
2018-12-11 22:07:20 +01:00
Gael Guennebaud
f159cf3d75
Artificially increase l1-blocking size for AVX512. +10% speedup with current kernels.
...
With a 6pX4 kernel (not committed yet), this provides a +20% speedup.
2018-12-11 15:36:27 +01:00
Gael Guennebaud
0a7e7af6fd
Properly set the number of registers for AVX512
2018-12-11 15:33:17 +01:00
Gael Guennebaud
7166496f70
bug #1643 : fix compilation issue with gcc and no optimizaion
2018-12-11 13:24:42 +01:00
Gael Guennebaud
0d90637838
enable spilling workaround on architectures with SSE/AVX
2018-12-10 23:22:44 +01:00
Gael Guennebaud
cf697272e1
Remove debug code.
2018-12-09 23:05:46 +01:00
Gael Guennebaud
450dc97c6b
Various fixes in polynomial solver and its unit tests:
...
- cleanup noise in imaginary part of real roots
- take into account the magnitude of the derivative to check roots.
- use <= instead of < at appropriate places
2018-12-09 22:54:39 +01:00
Gael Guennebaud
348bb386d1
Enable "old" CMP0026 policy (not perfect, but better than dozens of warning)
2018-12-08 18:59:51 +01:00
Gael Guennebaud
bff90bf270
workaround "may be used uninitialized" warning
2018-12-08 18:58:28 +01:00
Gael Guennebaud
81c27325ae
bug #1641 : fix testing of pandnot and fix pandnot for complex on SSE/AVX/AVX512
2018-12-08 14:27:48 +01:00
Gael Guennebaud
426bce7529
fix EIGEN_GEBP_2PX4_SPILLING_WORKAROUND for non vectorized type, and non x86/64 target
2018-12-08 09:44:21 +01:00
Gael Guennebaud
cd25b538ab
Fix noise in sparse_basic_3 (numerical cancellation)
2018-12-08 00:13:37 +01:00
Gael Guennebaud
efaf03bf96
Fix noise in lu unit test
2018-12-08 00:05:03 +01:00
Gael Guennebaud
956678a4ef
bug #1515 : disable gebp's 3pX4 micro kernel for MSVC<=19.14 because of register spilling.
2018-12-07 18:03:36 +01:00
Gael Guennebaud
7b6d0ff1f6
Enable FMA with MSVC (through /arch:AVX2). To make this possible, I also has to turn the #warning regarding AVX512-FMA to a #error.
2018-12-07 15:14:50 +01:00
Gael Guennebaud
f233c6194d
bug #1637 : workaround register spilling in gebp with clang>=6.0+AVX+FMA
2018-12-07 10:01:09 +01:00
Gael Guennebaud
ae59a7652b
bug #1638 : add a warning if avx512 is enabled without SSE/AVX FMA
2018-12-07 09:23:28 +01:00
Gael Guennebaud
4e7746fe22
bug #1636 : fix gemm performance issue with gcc>=6 and no FMA
2018-12-07 09:15:46 +01:00
Gael Guennebaud
cbf2f4b7a0
AVX512f includes FMA but GCC does not define __FMA__ with -mavx512f only
2018-12-06 18:21:56 +01:00
Gael Guennebaud
1d683ae2f5
Fix compilation with avx512f only, i.e., no AVX512DQ
2018-12-06 18:11:07 +01:00
Gael Guennebaud
aab749b1c3
fix test regarding AVX512 vectorization of complexes.
2018-12-06 16:55:00 +01:00
Gael Guennebaud
c53eececb0
Implement AVX512 vectorization of std::complex<float/double>
2018-12-06 15:58:06 +01:00
Gael Guennebaud
3fba59ea59
temporarily re-disable SSE/AVX vectorization of complex<> on AVX512 -> this needs to be fixed though!
2018-12-06 00:13:26 +01:00
Gael Guennebaud
1ac2695ef7
bug #1636 : fix compilation with some ABI versions.
2018-12-06 00:05:10 +01:00
Rasmus Munk Larsen
47d8b741b2
#elif -> #else to fix GPU build.
2018-12-05 13:19:31 -08:00
Rasmus Munk Larsen
8a02883d58
Merged in markdryan/eigen/avx512-contraction-2 (pull request PR-554)
...
Fix tensor contraction on AVX512 builds
Approved-by: Rasmus Munk Larsen <rmlarsen@google.com >
2018-12-05 18:19:32 +00:00
Gael Guennebaud
acc3459a49
Add help messages in the quick ref/ascii docs regarding slicing, indexing, and reshaping.
2018-12-05 17:17:23 +01:00
Gael Guennebaud
e2e897298a
Fix page nesting
2018-12-05 17:13:46 +01:00
Christoph Hertzberg
c1d356e8b4
bug #1635 : Use infinity from Numtraits instead of creating it manually.
2018-12-05 15:01:04 +01:00
Mark D Ryan
36f8f6d0be
Fix evalShardedByInnerDim for AVX512 builds
...
evalShardedByInnerDim ensures that the values it passes for start_k and
end_k to evalGemmPartialWithoutOutputKernel are multiples of 8 as the kernel
does not work correctly when the values of k are not multiples of the
packet_size. While this precaution works for AVX builds, it is insufficient
for AVX512 builds where the maximum packet size is 16. The result is slightly
incorrect float32 contractions on AVX512 builds.
This commit fixes the problem by ensuring that k is always a multiple of
the packet_size if the packet_size is > 8.
2018-12-05 12:29:03 +01:00
Rasmus Munk Larsen
b57b31cce9
Merged in ezhulenev/eigen-01 (pull request PR-553)
...
Do not disable alignment with EIGEN_GPUCC
Approved-by: Rasmus Munk Larsen <rmlarsen@google.com >
2018-12-04 23:47:19 +00:00
Eugene Zhulenev
0bb15bb6d6
Update checks in ConfigureVectorization.h
2018-12-03 17:10:40 -08:00
Eugene Zhulenev
fd0fbfa9b5
Do not disable alignment with EIGEN_GPUCC
2018-12-03 15:54:10 -08:00
Christoph Hertzberg
919414b9fe
bug #785 : Make Cholesky decomposition work for empty matrices
2018-12-03 16:18:15 +01:00
Gael Guennebaud
0ea7ae7213
Add missing padd for Packet8i (it was implicitly generated by clang and gcc)
2018-11-30 21:52:25 +01:00
Gael Guennebaud
ab4df3e6ff
bug #1634 : remove double copy in move-ctor of non movable Matrix/Array
2018-11-30 21:25:51 +01:00
Gael Guennebaud
c785464430
Add packet sin and cos to Altivec/VSX and NEON
2018-11-30 16:21:33 +01:00
Gael Guennebaud
69ace742be
Several improvements regarding packet-bitwise operations:
...
- add unit tests
- optimize their AVX512f implementation
- add missing implementations (half, Packet4f, ...)
2018-11-30 15:56:08 +01:00
Gael Guennebaud
fa87f9d876
Add psin/pcos on AVX512 -> almost for free, at last!
2018-11-30 14:33:13 +01:00
Gael Guennebaud
c68bd2fa7a
Cleanup
2018-11-30 14:32:31 +01:00
Gael Guennebaud
f91500d303
Fix pandnot order in AVX512
2018-11-30 14:32:06 +01:00
Gael Guennebaud
b477d60bc6
Extend the generic psin_float code to handle cosine and make SSE and AVX use it (-> this adds pcos for AVX)
2018-11-30 11:26:30 +01:00
Gael Guennebaud
e19ece822d
Disable fma gcc's workaround for gcc >= 8 (based on GEMM benchmarks)
2018-11-28 17:56:24 +01:00
Gael Guennebaud
41052f63b7
same for pmax
2018-11-28 17:17:28 +01:00
Gael Guennebaud
3e95e398b6
pmin/pmax o SSE: make sure to use AVX instruction with AVX enabled, and disable gcc workaround for fixed gcc versions
2018-11-28 17:14:20 +01:00
Gael Guennebaud
aa6097395b
Add missing SSE/AVX type-casting in AVX512 mode
2018-11-28 16:09:08 +01:00
Gael Guennebaud
48fe78c375
bug #1630 : fix linspaced when requesting smaller packet size than default one.
2018-11-28 13:15:06 +01:00
Eugene Zhulenev
80f1651f35
Use explicit packet type in SSE/PacketMath pldexp
2018-11-27 17:25:49 -08:00
Benoit Jacob
a4159dba08
do not read buffers out of bounds -- load only the 4 bytes we know exist here. Could also have done a vld1_lane_f32 but doing so here, without the overhead of initializing the unused lane, would have triggered used-of-uninitialized-value errors in tools such as ASan. Note that this code is sub-optimal before or after this change: we should be reading either 2 or 4 float32 values per load-instruction (2 for ARM in-order cores with an affinity for 8-byte loads; 4 for ARM out-of-order cores able to dual-issue 16-byte load instructions with arithmetic instructions). Before or after this patch, we are only loading 4 bytes of useful data here (even if before this patch, we were technically loading 8, only to use only the 4 first).
2018-11-27 16:53:14 -05:00
Gael Guennebaud
b131a4db24
bug #1631 : fix compilation with ARM NEON and clang, and cleanup the weird pshiftright_and_cast and pcast_and_shiftleft functions.
2018-11-27 23:45:00 +01:00
Gael Guennebaud
a1a5fbbd21
Update pshiftleft to pass the shift as a true compile-time integer.
2018-11-27 22:57:30 +01:00
Gael Guennebaud
fa7fd61eda
Unify SSE/AVX psin functions.
...
It is based on the SSE version which is much more accurate, though very slightly slower.
This changeset also includes the following required changes:
- add packet-float to packet-int type traits
- add packet float<->int reinterpret casts
- add faster pselect for AVX based on blendv
2018-11-27 22:41:51 +01:00
Rasmus Munk Larsen
08edbc8cfe
Merged in bjacob/eigen/fixbuild (pull request PR-549)
...
fix the build on 64-bit ARM when NEON is disabled
2018-11-27 20:14:12 +00:00
Benoit Jacob
7b1cb8a440
fix the build on 64-bit ARM when NEON is disabled
2018-11-27 11:11:02 -05:00
Gael Guennebaud
b5695a6008
Unify Altivec/VSX pexp(double) with default implementation
2018-11-27 13:53:05 +01:00
Gael Guennebaud
7655a8af6e
cleanup
2018-11-26 23:21:29 +01:00
Gael Guennebaud
502f92fa10
Unify SSE and AVX pexp for double.
2018-11-26 23:12:44 +01:00
Gael Guennebaud
4a347a0054
Unify NEON's pexp with generic implementation
2018-11-26 22:15:44 +01:00
Gael Guennebaud
5c8406babc
Unify Altivec/VSX's pexp with generic implementation
2018-11-26 16:47:13 +01:00
Gael Guennebaud
cf8b85d5c5
Unify SSE and AVX implementation of pexp
2018-11-26 16:36:19 +01:00
Gael Guennebaud
c2f35b1b47
Unify Altivec/VSX's plog with generic implementation, and enable it!
2018-11-26 15:58:11 +01:00
Gael Guennebaud
c24e98e6a8
Unify NEON's plog with generic implementation
2018-11-26 15:02:16 +01:00
Gael Guennebaud
2c44c40114
First step toward a unification of packet log implementation, currently only SSE and AVX are unified.
...
To this end, I added the following functions: pzero, pcmp_*, pfrexp, pset1frombits functions.
2018-11-26 14:21:24 +01:00
Gael Guennebaud
5f6045077c
Make SSE/AVX pandnot(A,B) consistent with generic version, i.e., "A and not B"
2018-11-26 14:14:07 +01:00
Gael Guennebaud
382279eb7f
Extend unit test to recursively check half-packet types and non packet types
2018-11-26 14:10:07 +01:00
Gael Guennebaud
0836a715d6
bug #1611 : fix plog(0) on NEON
2018-11-26 09:08:38 +01:00
Patrik Huber
95566eeed4
Fix typos
2018-11-23 22:22:14 +00:00
Gael Guennebaud
e3b22a6bd0
merge
2018-11-23 16:06:21 +01:00
Gael Guennebaud
ccabdd88c9
Fix reserved usage of double __ in macro names
2018-11-23 16:01:47 +01:00
Gael Guennebaud
572d62697d
check two ctors
2018-11-23 15:37:09 +01:00
Gael Guennebaud
354f14293b
Fix double = bool !
2018-11-23 15:12:06 +01:00
Gael Guennebaud
a7842daef2
Fix several uninitialized member from ctor
2018-11-23 15:10:28 +01:00
Christoph Hertzberg
ea60a172cf
Add default constructor to Bar to make test compile again with clang-3.8
2018-11-23 14:24:22 +01:00
Christoph Hertzberg
806352d844
Small typo found be Patrick Huber (pull request PR-547)
2018-11-23 12:34:27 +00:00
Gael Guennebaud
a476054879
bug #1624 : improve matrix-matrix product on ARM 64, 20% speedup
2018-11-23 10:25:19 +01:00
Gael Guennebaud
c685fe9838
Move regression test to right unit test file
2018-11-21 15:59:47 +01:00
Gael Guennebaud
4b2cebade8
Workaround weird MSVC bug
2018-11-21 15:53:37 +01:00
Christoph Hertzberg
0ec8afde57
Fixed most conversion warnings in MatrixFunctions module
2018-11-20 16:23:28 +01:00
Deven Desai
e7e6809e6b
ROCm/HIP specfic fixes + updates
...
1. Eigen/src/Core/arch/GPU/Half.h
Updating the HIPCC implementation half so that it can declared as a __shared__ variable
2. Eigen/src/Core/util/Macros.h, Eigen/src/Core/util/Memory.h
introducing a EIGEN_USE_STD(func) macro that calls
- std::func be default
- ::func when eigen is being compiled with HIPCC
This change was requested in the previous HIP PR
(https://bitbucket.org/eigen/eigen/pull-requests/518/pr-with-hip-specific-fixes-for-the-eigen/diff )
3. unsupported/Eigen/CXX11/src/Tensor/TensorDeviceThreadPool.h
Removing EIGEN_DEVICE_FUNC attribute from pure virtual methods as it is not supported by HIPCC
4. unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h
Disabling the template specializations of InnerMostDimReducer as they run into HIPCC link errors
2018-11-19 18:13:59 +00:00
Gael Guennebaud
6a510fe69c
Make MaxPacketSize a true upper bound, even for fixed-size inputs
2018-11-16 11:25:32 +01:00
Gael Guennebaud
43c987b1c1
Add explicit regression test for bug #1622
2018-11-16 11:24:51 +01:00
Mark D Ryan
670d56441c
PR 544: Set requestedAlignment correctly for SliceVectorizedTraversals
...
Commit aa110e681b
optimised the multiplication of small dyanmically
sized matrices by restricting the packet size to a maximum of 4, increasing
the chances that SIMD instructions are used in the computation. However, it
introduced a mismatch between the packet size and the requestedAlignment. This
mismatch can lead to crashes when the destination is not aligned. This patch
fixes the issue by ensuring that the AssignmentTraits are correctly computed
when using a restricted packet size.
* * *
Bind LinearPacketType to MaxPacketSize
This commit applies any packet size limit specified when instantiating
copy_using_evaluator_traits to the LinearPacketType, providing that the
size of the destination is not known at compile time.
* * *
Add unit test for restricted packet assignment
A new unit test is added to check that multiplication of small dynamically
sized matrices works correctly when the packet size is restricted to 4 and
the destination is unaligned.
2018-11-13 16:15:08 +01:00
Nikolaus Demmel
3dc0845046
Fix typo in comment on EIGEN_MAX_STATIC_ALIGN_BYTES
2018-11-14 18:11:30 +01:00
Gael Guennebaud
7fddc6a51f
typo
2018-11-14 14:43:18 +01:00
Gael Guennebaud
449f948b2a
help doxygen linking to DenseBase::NulllaryExpr
2018-11-14 14:42:59 +01:00
Gael Guennebaud
4263f23c28
Improve doc on multi-threading and warn about hyper-threading
2018-11-14 14:42:29 +01:00
Gael Guennebaud
db529ae4ec
doxygen does not like \addtogroup and \ingroup in the same line
2018-11-14 14:42:06 +01:00
Rasmus Munk Larsen
72928a2c8a
Merged in rmlarsen/eigen2 (pull request PR-543)
...
Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth.
Approved-by: Eugene Zhulenev <ezhulenev@google.com >
2018-11-13 17:10:30 +00:00
Rasmus Munk Larsen
cda479d626
Remove accidental changes.
2018-11-12 18:34:04 -08:00
Rasmus Munk Larsen
719d9aee65
Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth.
2018-11-12 17:46:02 -08:00
Rasmus Munk Larsen
77b447c24e
Add optimized version of logistic function for float. As an example, this is about 50% faster than the existing version on Haswell using AVX.
2018-11-12 13:42:24 -08:00
Gael Guennebaud
c81bdbdadc
Add manual doc on STL-compatible iterators
2018-11-12 22:06:33 +01:00
Gael Guennebaud
0105146915
Fix warning in c++03
2018-11-10 09:11:38 +01:00
Rasmus Munk Larsen
93f9988a7e
A few small fixes to a) prevent throwing in ctors and dtors of the threading code, and b) supporting matrix exponential on platforms with 113 bits of mantissa for long doubles.
2018-11-09 14:15:32 -08:00
Gael Guennebaud
784a3f13cf
bug #1619 : fix mixing of const and non-const generic iterators
2018-11-09 21:45:10 +01:00
Gael Guennebaud
db9a9a12ba
bug #1619 : make const and non-const iterators compatible
2018-11-09 16:49:19 +01:00
Gael Guennebaud
fbd6e7b025
add missing ref to a.zeta(b)
2018-11-09 13:53:42 +01:00
Gael Guennebaud
dffd1e11de
Limit the size of the toc
2018-11-09 13:52:34 +01:00
Gael Guennebaud
a88e0a0e95
Update doxy hacks wrt doxygen 1.8.13/14
2018-11-09 13:52:10 +01:00
Gael Guennebaud
bd9a00718f
Let doxygen sees lastN
2018-11-09 11:35:48 +01:00
Gael Guennebaud
d7c644213c
Add and update manual pages for slicing, indexing, and reshaping.
2018-11-09 11:35:27 +01:00
Gael Guennebaud
a368848473
Recent xcode versions does support EIGEN_HAS_STATIC_ARRAY_TEMPLATE
2018-11-09 10:33:17 +01:00
Gael Guennebaud
f62a0f69c6
Fix max-size in indexed-view
2018-11-08 18:40:22 +01:00
Gael Guennebaud
bf495859ff
Merged in glchaves/eigen (pull request PR-539)
...
Vectorize row-by-row gebp loop iterations on 16 packets as well
2018-11-07 07:21:15 +00:00
Gael Guennebaud
995730fc6c
Add option to disable plot generation
2018-11-07 00:41:16 +01:00
Gustavo Lima Chaves
4ad359237a
Vectorize row-by-row gebp loop iterations on 16 packets as well
...
Signed-off-by: Gustavo Lima Chaves <gustavo.lima.chaves@intel.com >
Signed-off-by: Mark D. Ryan <mark.d.ryan@intel.com >
2018-11-06 10:48:42 -08:00
Gael Guennebaud
9d318b92c6
add unit tests for bug #1619
2018-11-01 15:14:50 +01:00
Matthieu Vigne
8d7a73e48e
bug #1617 : Fix SolveTriangular.solveInPlace crashing for empty matrix.
...
This made FullPivLU.kernel() crash when used on the zero matrix.
Add unit test for FullPivLU.kernel() on the zero matrix.
2018-10-31 20:28:18 +01:00
Christoph Hertzberg
66b28e290d
bug #1618 : Use different power-of-2 check to avoid MSVC warning
2018-11-01 13:23:19 +01:00
Rasmus Munk Larsen
07fcdd1438
Merged in ezhulenev/eigen-02 (pull request PR-534)
...
Fix cxx11_tensor_{block_access, reduction} tests
2018-10-25 18:34:35 +00:00
Eugene Zhulenev
8a977c1f46
Fix cxx11_tensor_{block_access, reduction} tests
2018-10-25 11:31:29 -07:00
Halie Murray-Davis
fb62d6d96e
Fix typo in tutorial documentation.
2018-10-25 04:55:34 +00:00
Christoph Hertzberg
b5f077d22c
Document EIGEN_NO_IO preprocessor directive
2018-10-25 16:49:25 +02:00
Christian von Schultz
4a40b3785d
Collapsed revision (based on pull request PR-325)
...
* Support compiling without IO streams
Add the preprocessor definition EIGEN_NO_IO which, if defined,
disables all use of the IO streams part of the standard library.
2018-10-22 21:14:40 +02:00
Rasmus Munk Larsen
14054e217f
Do not rely on the compiler generating __device__ functions for constexpr in Cuda (via EIGEN_CONSTEXPR_ARE_DEVICE_FUNC. This breaks several target in the TensorFlow Cuda build, e.g.,
...
INFO: From Compiling tensorflow/core/kernels/maxpooling_op_gpu.cu.cc:
/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: calling a __host__ function("std::equal_to<float> ::operator () const") from a __global__ function("tensorflow::_NV_ANON_NAMESPACE::MaxPoolGradBackwardNoMaskNHWC< ::Eigen::half> ") is not allowed
/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: identifier "std::equal_to<float> ::operator () const" is undefined in device code"
/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: calling a __host__ function("std::equal_to<float> ::operator () const") from a __global__ function("tensorflow::_NV_ANON_NAMESPACE::MaxPoolGradBackwardNoMaskNCHW< ::Eigen::half> ") is not allowed
/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: identifier "std::equal_to<float> ::operator () const" is undefined in device code
4 errors detected in the compilation of "/tmp/tmpxft_00000011_00000000-6_maxpooling_op_gpu.cu.cpp1.ii".
ERROR: /tmpfs/tensor_flow/tensorflow/core/kernels/BUILD:3753:1: output 'tensorflow/core/kernels/_objs/pooling_ops_gpu/maxpooling_op_gpu.cu.pic.o' was not created
ERROR: /tmpfs/tensor_flow/tensorflow/core/kernels/BUILD:3753:1: Couldn't build file tensorflow/core/kernels/_objs/pooling_ops_gpu/maxpooling_op_gpu.cu.pic.o: not all outputs were created or valid
2018-10-22 16:18:24 -07:00
Rasmus Munk Larsen
954b4ca9d0
Suppress compiler warning about unused global variable.
2018-10-22 13:48:56 -07:00
Rasmus Munk Larsen
9caafca550
Merged in rmlarsen/eigen (pull request PR-532)
...
Only set EIGEN_CONSTEXPR_ARE_DEVICE_FUNC for clang++ if cxx_relaxed_constexpr is available.
2018-10-19 21:37:14 +00:00
Christoph Hertzberg
449ff74672
Fix most Doxygen warnings. Also add links to stable documentation from unsupported modules (by using the corresponding Doxytags file).
...
Manually grafted from d107a371c6
2018-10-19 21:10:28 +02:00
Rasmus Munk Larsen
39fec15d5c
Merged eigen/eigen into default
2018-10-19 09:48:19 -07:00
Christoph Hertzberg
40fa6f98bf
bug #1606 : Explicitly set the standard before find_package(StandardMathLibrary). Also replace EIGEN_COMPILER_SUPPORT_CXX11 in favor of EIGEN_COMPILER_SUPPORT_CPP11.
...
Grafted manually from a4afa90d16
2018-10-19 17:20:51 +02:00
Rasmus Munk Larsen
d8f285852b
Only set EIGEN_CONSTEXPR_ARE_DEVICE_FUNC for clang++ if cxx_relaxed_constexpr is available.
2018-10-18 16:55:02 -07:00
Rasmus Munk Larsen
dda68f56ec
Fix GPU build due to gpu_assert not always being defined.
2018-10-18 16:29:29 -07:00
Gael Guennebaud
1dcf5a6ed8
fix typo in doc
2018-10-17 09:29:36 +02:00
Eugene Zhulenev
9e96e91936
Move from rvalue arguments in ThreadPool enqueue* methods
2018-10-16 16:48:32 -07:00
Eugene Zhulenev
217d839816
Reduce thread scheduling overhead in parallelFor
2018-10-16 14:53:06 -07:00
Rasmus Munk Larsen
d52763bb4f
Merged in ezhulenev/eigen-02 (pull request PR-528)
...
[TensorBlockIO] Check if it's allowed to squeeze inner dimensions
Approved-by: Rasmus Munk Larsen <rmlarsen@google.com >
2018-10-16 15:39:40 +00:00
Gael Guennebaud
0f780bb0b4
Fix float-to-double warning
2018-10-16 09:19:45 +02:00
Eugene Zhulenev
900c7c61bb
Check if it's allowed to squueze inner dimensions in TensorBlockIO
2018-10-15 16:52:33 -07:00
Gael Guennebaud
a39e0f7438
bug #1612 : fix regression in "outer-vectorization" of partial reductions for PacketSize==1 (aka complex<double>)
2018-10-16 01:04:25 +02:00
Gael Guennebaud
e3b85771d7
Show call stack in case of failing sparse solving.
2018-10-16 00:43:44 +02:00
Gael Guennebaud
d2d570c116
Remove useless (and broken) resize
2018-10-16 00:42:48 +02:00
Gael Guennebaud
f0fb95135d
Iterative solvers: unify and fix handling of multiple rhs.
...
m_info was not properly computed and the logic was repeated in several places.
2018-10-15 23:47:46 +02:00
Gael Guennebaud
2747b98cfc
DGMRES: fix null rhs, fix restart, fix m_isDeflInitialized for multiple solve
2018-10-15 23:46:00 +02:00
Gael Guennebaud
d835a0bf53
relax number of iterations checks to avoid false negatives
2018-10-15 10:23:32 +02:00
Gael Guennebaud
3a33db4de5
merge
2018-10-15 09:22:27 +02:00
Rasmus Munk Larsen
0ed811a9c1
Suppress unused variable compiler warning in sparse subtest 3.
2018-10-12 13:41:57 -07:00
Mark D Ryan
aa110e681b
PR 526: Speed up multiplication of small, dynamically sized matrices
...
The Packet16f, Packet8f and Packet8d types are too large to use with dynamically
sized matrices typically processed by the SliceVectorizedTraversal specialization of
the dense_assignment_loop. Using these types is likely to lead to little or no
vectorization. Significant slowdown in the multiplication of these small matrices can
be observed when building with AVX and AVX512 enabled.
This patch introduces a new dense_assignment_kernel that is used when
computing small products whose operands have dynamic dimensions. It ensures that the
PacketSize used is no larger than 4, thereby increasing the chance that vectorized
instructions will be used when computing the product.
I tested all 969 possible combinations of M, K, and N that are handled by the
dense_assignment_loop on x86 builds. Although a few combinations are slowed down
by this patch they are far outnumbered by the cases that are sped up, as the
following results demonstrate.
Disabling Packed8d on AVX512 builds:
Total Cases: 969
Better: 511
Worse: 85
Same: 373
Max Improvement: 169.00% (4 8 6)
Max Degradation: 36.50% (8 5 3)
Median Improvement: 35.46%
Median Degradation: 17.41%
Total FLOPs Improvement: 19.42%
Disabling Packet16f and Packed8f on AVX512 builds:
Total Cases: 969
Better: 658
Worse: 5
Same: 306
Max Improvement: 214.05% (8 6 5)
Max Degradation: 22.26% (16 2 1)
Median Improvement: 60.05%
Median Degradation: 13.32%
Total FLOPs Improvement: 59.58%
Disabling Packed8f on AVX builds:
Total Cases: 969
Better: 663
Worse: 96
Same: 210
Max Improvement: 155.29% (4 10 5)
Max Degradation: 35.12% (8 3 2)
Median Improvement: 34.28%
Median Degradation: 15.05%
Total FLOPs Improvement: 26.02%
2018-10-12 15:20:21 +02:00
Eugene Zhulenev
d9392f9e55
Fix code format
2018-11-02 14:51:35 -07:00
Eugene Zhulenev
118520f04a
Workaround nbcc+msvc compiler bug
2018-11-02 14:48:28 -07:00
Christoph Hertzberg
24dc076519
Explicitly convert 0 to Scalar for custom types
2018-10-12 10:22:19 +02:00
Gael Guennebaud
8214cf1896
Make sparse_basic includable from sparse_extra, but disable it since sparse_basic(DynamicSparseMatrix) does not compile at all anyways
2018-10-11 10:27:23 +02:00
Gael Guennebaud
43633fbaba
Fix warning with AVX512f
2018-10-11 10:13:48 +02:00
Gael Guennebaud
97e2c808e9
Fix avx512 plog(NaN) to return NaN instead of +inf
2018-10-11 10:13:13 +02:00
Gael Guennebaud
b3f66d29a5
Enable avx512 plog with clang
2018-10-11 10:12:21 +02:00
Gael Guennebaud
2ef1b39674
Relaxed fastmath unit test: if std::foo fails, then let's only trigger a warning is numext::foo fails too.
...
A true error will triggered only if std::foo works but our numext::foo fails.
2018-10-11 09:45:30 +02:00
Gael Guennebaud
1d5a6363ea
relax numerical tests from equal to approx (x87)
2018-10-11 09:29:56 +02:00
Gael Guennebaud
f0aa7e40fc
Fix regression in changeset 5335659c47
2018-10-10 23:47:30 +02:00
Gael Guennebaud
ce243ee45b
bug #520 : add diagmat +/- diagmat operators.
2018-10-10 23:38:22 +02:00
Gael Guennebaud
5335659c47
Merged in ezhulenev/eigen-02 (pull request PR-525)
...
Fix bug in partial reduction of expressions requiring evaluation
2018-10-10 20:59:00 +00:00
Gael Guennebaud
eec0dfd688
bug #632 : add specializations for res ?= dense +/- sparse and res ?= sparse +/- dense.
...
They are rewritten as two compound assignment to by-pass hybrid dense-sparse iterator.
2018-10-10 22:50:15 +02:00
Eugene Zhulenev
8e6dc2c81d
Fix bug in partial reduction of expressions requiring evaluation
2018-10-10 13:23:52 -07:00
Gael Guennebaud
76ceae49c1
bug #1609 : add inplace transposition unit test
2018-10-10 21:48:58 +02:00
Eugene Zhulenev
2bf1a31d81
Use void type if stl-style iterators are not supported
2018-10-10 10:31:40 -07:00
Christoph Hertzberg
f3130ee1ba
Avoid empty macro arguments
2018-10-10 08:23:40 +02:00
Rasmus Munk Larsen
e8918743c1
Merged in ezhulenev/eigen-01 (pull request PR-523)
...
Compile time detection for unimplemented stl-style iterators
2018-10-09 23:42:01 +00:00
Eugene Zhulenev
befcac883d
Hide stl-container detection test under #if
2018-10-09 15:36:01 -07:00
Eugene Zhulenev
c0ca8a9fa3
Compile time detection for unimplemented stl-style iterators
2018-10-09 15:28:23 -07:00
Gael Guennebaud
1dd1f8e454
bug #65 : add vectorization of partial reductions along the outer-dimension, for instance: colmajor_mat.rowwise().mean()
2018-10-09 23:36:50 +02:00
Gael Guennebaud
bfa2a81a50
Make redux_vec_unroller more flexible regarding packet-type
2018-10-09 23:30:41 +02:00
Gael Guennebaud
c0c3be26ed
Extend unit tests for partial reductions
2018-10-09 22:54:54 +02:00
Christoph Hertzberg
3f2c8b7ff0
Fix a lot of Doxygen warnings in Tensor module
2018-10-09 20:22:47 +02:00
Christoph Hertzberg
f6359ad795
Small Doxygen fixes
2018-10-09 19:33:35 +02:00
Gael Guennebaud
7a882c05ab
Fix compilation on CUDA
2018-10-09 17:02:16 +02:00
Gael Guennebaud
93a6192e98
fix mpreal for mpfr<4.0.0
2018-10-09 09:15:22 +02:00
Rasmus Munk Larsen
d16634c4d4
Fix out-of bounds access in TensorArgMax.h.
2018-10-08 16:41:36 -07:00
Rasmus Munk Larsen
1a737e1d6a
Fix contraction test.
2018-10-08 16:37:07 -07:00
Gael Guennebaud
e00487f7d2
bug #1603 : add parenthesis around ternary operator in function body as well as a harmless attempt to make MSVC happy.
2018-10-08 22:27:04 +02:00
Gael Guennebaud
2eda9783de
typo
2018-10-08 21:37:46 +02:00
Gael Guennebaud
c6e2dde714
fix c++11 deprecated warning
2018-10-08 18:26:05 +02:00
Gael Guennebaud
6cc9b2c831
fix warning in mpreal.h
2018-10-08 18:25:37 +02:00
Gael Guennebaud
649d4758a6
merge
2018-10-08 17:35:18 +02:00
Gael Guennebaud
aa5820056e
Unify c++11 usage in doc's examples and snippets
2018-10-08 17:32:54 +02:00
Gael Guennebaud
e29bfe8479
Update included mpreal header to 3.6.5 and fix deprecated warnings.
2018-10-08 17:09:23 +02:00
Gael Guennebaud
64b1a15318
Workaround stupid warning
2018-10-08 12:01:18 +02:00
Gael Guennebaud
c9643f4a6f
Disable C++11 deprecated warning when limiting Eigen to C++98
2018-10-08 10:43:43 +02:00
Gael Guennebaud
774bb9d6f7
fix a doxygen issue
2018-10-08 09:30:15 +02:00
Gael Guennebaud
6c3f6cd52b
Fix maybe-uninitialized warning
2018-10-07 23:29:51 +02:00
Gael Guennebaud
bcb7c66b53
Workaround gcc's alloc-size-larger-than= warning
2018-10-07 21:55:59 +02:00
Gael Guennebaud
16b2001ece
Fix gcc 8.1 warning: "maybe use uninitialized"
2018-10-07 21:54:49 +02:00
Gael Guennebaud
6512c5e136
Implement a better workaround for GCC's bug #87544
2018-10-07 15:00:05 +02:00
Gael Guennebaud
409132bb81
Workaround gcc bug making it trigger an invalid warning
2018-10-07 09:23:15 +02:00
Gael Guennebaud
c6a1ab4036
Workaround MSVC compilation issue
2018-10-06 13:49:17 +02:00
Gael Guennebaud
e21766c6f5
Clarify doc of rowwise/colwise/vectorwise.
2018-10-05 23:12:09 +02:00
Gael Guennebaud
d92f004ab7
Simplify API by removing allCols/allRows and reusing rowwise/colwise to define iterators over rows/columns
2018-10-05 23:11:21 +02:00
Gael Guennebaud
91613bf2c2
Add support for c++11 snippets
2018-10-05 23:08:39 +02:00
Gael Guennebaud
3e64b1fc86
Move iterators to internal, improve doc, make unit test c++03 friendly
2018-10-03 15:13:15 +02:00
Gael Guennebaud
2b2b4d0580
fix unused warning
2018-10-03 14:16:21 +02:00
Gael Guennebaud
8a1e98240e
add unit tests
2018-10-03 11:56:27 +02:00
Gael Guennebaud
5f26f57598
Change the logic of A.reshaped<Order>() to be a simple alias to A.reshaped<Order>(AutoSize,fix<1>).
...
This means that now AutoOrder is allowed, and it always return a column-vector.
2018-10-03 11:41:47 +02:00
Gael Guennebaud
0481900e25
Add pointer-based iterator for direct-access expressions
2018-10-02 23:44:36 +02:00
Christoph Hertzberg
c5f1d0a72a
Fix shadow warning
2018-10-02 19:01:08 +02:00
Christoph Hertzberg
b92c71235d
Move struct outside of method for C++03 compatibility.
2018-10-02 18:59:10 +02:00
Christoph Hertzberg
051f9c1aff
Make code compile in C++03 mode again
2018-10-02 18:36:30 +02:00
Christoph Hertzberg
b786ce8c72
Fix conversion warning ... again
2018-10-02 18:35:25 +02:00
Gael Guennebaud
8c38528168
Factorize RowsProxy/ColsProxy and related iterators using subVector<>(Index)
2018-10-02 14:03:26 +02:00
Gael Guennebaud
12487531ce
Add templated subVector<Vertical/Horizonal>(Index) aliases to col/row(Index) methods (plus subVectors<>() to retrieve the number of rows/columns)
2018-10-02 14:02:34 +02:00
Gael Guennebaud
37e29fc893
Use Index instead of ptrdiff_t or int, fix random-accessors.
2018-10-02 13:29:32 +02:00
Gael Guennebaud
de2efbc43c
bug #1605 : workaround ABI issue with vector types (aka __m128) versus scalar types (aka float)
2018-10-01 23:45:55 +02:00
Gael Guennebaud
b0c66adfb1
bug #231 : initial implementation of STL iterators for dense expressions
2018-10-01 23:21:37 +02:00
Christoph Hertzberg
564ca71e39
Merged in deven-amd/eigen/HIP_fixes (pull request PR-518)
...
PR with HIP specific fixes (for the eigen nightly regression failures in HIP mode)
2018-10-01 16:51:04 +00:00
Deven Desai
94898488a6
This commit contains the following (HIP specific) updates:
...
- unsupported/Eigen/CXX11/src/Tensor/TensorReductionGpu.h
Changing "pass-by-reference" argument to be "pass-by-value" instead
(in a __global__ function decl).
"pass-by-reference" arguments to __global__ functions are unwise,
and will be explicitly flagged as errors by the newer versions of HIP.
- Eigen/src/Core/util/Memory.h
- unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h
Changes introduced in recent commits breaks the HIP compile.
Adding EIGEN_DEVICE_FUNC attribute to some functions and
calling ::malloc/free instead of the corresponding std:: versions
to get the HIP compile working again
- unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h
Change introduced a recent commit breaks the HIP compile
(link stage errors out due to failure to inline a function).
Disabling the recently introduced code (only for HIP compile), to get
the eigen nightly testing going again.
Will submit another PR once we have te proper fix.
- Eigen/src/Core/util/ConfigureVectorization.h
Enabling GPU VECTOR support when HIP compiler is in use
(for both the host and device compile phases)
2018-10-01 14:28:37 +00:00
Rasmus Munk Larsen
2088c0897f
Merged eigen/eigen into default
2018-09-28 16:00:46 -07:00
Rasmus Munk Larsen
31629bb964
Get rid of unused variable warning.
2018-09-28 16:00:09 -07:00
Eugene Zhulenev
bb13d5d917
Fix bug in copy optimization in Tensor slicing.
2018-09-28 14:34:42 -07:00
Rasmus Munk Larsen
104e8fa074
Fix a few warnings and rename a variable to not shadow "last".
2018-09-28 12:00:08 -07:00
Rasmus Munk Larsen
7c1b47840a
Merged in ezhulenev/eigen-01 (pull request PR-514)
...
Add tests for evalShardedByInnerDim contraction + fix bugs
2018-09-28 18:37:54 +00:00
Eugene Zhulenev
524c81f3fa
Add tests for evalShardedByInnerDim contraction + fix bugs
2018-09-28 11:24:08 -07:00
Christoph Hertzberg
86ba50be39
Fix integer conversion warnings
2018-09-28 19:33:39 +02:00
Eugene Zhulenev
e95696acb3
Optimize TensorBlockCopyOp
2018-09-27 14:49:26 -07:00
Eugene Zhulenev
9f33e71e9d
Revert code lost in merge
2018-09-27 12:08:17 -07:00
Eugene Zhulenev
a7a3e9f2b6
Merge with eigen/eigen default
2018-09-27 12:05:06 -07:00
Eugene Zhulenev
9f4988959f
Remove explicit mkldnn support and redundant TensorContractionKernelBlocking
2018-09-27 11:49:19 -07:00
Rasmus Munk Larsen
1e5750a5b8
Merged in rmlarsen/eigen4 (pull request PR-511)
...
Parallelize tensor contraction over the inner dimension.
2018-09-27 17:18:32 +00:00
Gael Guennebaud
af3ad4b513
oops, I've been too fast in previous copy/paste
2018-09-27 09:28:57 +02:00
Gael Guennebaud
24b163a877
#pragma GCC diagnostic push/pop is not supported prioro to gcc 4.6
2018-09-27 09:23:54 +02:00
Eugene Zhulenev
b314376f9c
Test mkldnn pack for doubles
2018-09-26 18:22:24 -07:00
Eugene Zhulenev
22ed98a331
Conditionally add mkldnn test
2018-09-26 17:57:37 -07:00
Rasmus Munk Larsen
d956204ab2
Remove "false &&" left over from test.
2018-09-26 17:03:30 -07:00
Rasmus Munk Larsen
3815aeed7a
Parallelize tensor contraction over the inner dimension in cases where where one or both of the outer dimensions (m and n) are small but k is large. This speeds up individual matmul microbenchmarks by up to 85%.
...
Naming below is BM_Matmul_M_K_N_THREADS, measured on a 2-socket Intel Broadwell-based server.
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------
BM_Matmul_1_80_13522_1 387457 396013 -2.2%
BM_Matmul_1_80_13522_2 406487 230789 +43.2%
BM_Matmul_1_80_13522_4 395821 123211 +68.9%
BM_Matmul_1_80_13522_6 391625 97002 +75.2%
BM_Matmul_1_80_13522_8 408986 113828 +72.2%
BM_Matmul_1_80_13522_16 399988 67600 +83.1%
BM_Matmul_1_80_13522_22 411546 60044 +85.4%
BM_Matmul_1_80_13522_32 393528 57312 +85.4%
BM_Matmul_1_80_13522_44 390047 63525 +83.7%
BM_Matmul_1_80_13522_88 387876 63592 +83.6%
BM_Matmul_1_1500_500_1 245359 248119 -1.1%
BM_Matmul_1_1500_500_2 401833 143271 +64.3%
BM_Matmul_1_1500_500_4 210519 100231 +52.4%
BM_Matmul_1_1500_500_6 251582 86575 +65.6%
BM_Matmul_1_1500_500_8 211499 80444 +62.0%
BM_Matmul_3_250_512_1 70297 68551 +2.5%
BM_Matmul_3_250_512_2 70141 52450 +25.2%
BM_Matmul_3_250_512_4 67872 58204 +14.2%
BM_Matmul_3_250_512_6 71378 63340 +11.3%
BM_Matmul_3_250_512_8 69595 41652 +40.2%
BM_Matmul_3_250_512_16 72055 42549 +40.9%
BM_Matmul_3_250_512_22 70158 54023 +23.0%
BM_Matmul_3_250_512_32 71541 56042 +21.7%
BM_Matmul_3_250_512_44 71843 57019 +20.6%
BM_Matmul_3_250_512_88 69951 54045 +22.7%
BM_Matmul_3_1500_512_1 369328 374284 -1.4%
BM_Matmul_3_1500_512_2 428656 223603 +47.8%
BM_Matmul_3_1500_512_4 205599 139508 +32.1%
BM_Matmul_3_1500_512_6 214278 139071 +35.1%
BM_Matmul_3_1500_512_8 184149 142338 +22.7%
BM_Matmul_3_1500_512_16 156462 156983 -0.3%
BM_Matmul_3_1500_512_22 163905 158259 +3.4%
BM_Matmul_3_1500_512_32 155314 157662 -1.5%
BM_Matmul_3_1500_512_44 235434 158657 +32.6%
BM_Matmul_3_1500_512_88 156779 160275 -2.2%
BM_Matmul_1500_4_512_1 363358 349528 +3.8%
BM_Matmul_1500_4_512_2 303134 263319 +13.1%
BM_Matmul_1500_4_512_4 176208 130086 +26.2%
BM_Matmul_1500_4_512_6 148026 115449 +22.0%
BM_Matmul_1500_4_512_8 131656 98421 +25.2%
BM_Matmul_1500_4_512_16 134011 82861 +38.2%
BM_Matmul_1500_4_512_22 134950 85685 +36.5%
BM_Matmul_1500_4_512_32 133165 90081 +32.4%
BM_Matmul_1500_4_512_44 133203 90644 +32.0%
BM_Matmul_1500_4_512_88 134106 100566 +25.0%
BM_Matmul_4_1500_512_1 439243 435058 +1.0%
BM_Matmul_4_1500_512_2 451830 257032 +43.1%
BM_Matmul_4_1500_512_4 276434 164513 +40.5%
BM_Matmul_4_1500_512_6 182542 144827 +20.7%
BM_Matmul_4_1500_512_8 179411 166256 +7.3%
BM_Matmul_4_1500_512_16 158101 155560 +1.6%
BM_Matmul_4_1500_512_22 152435 155448 -1.9%
BM_Matmul_4_1500_512_32 155150 149538 +3.6%
BM_Matmul_4_1500_512_44 193842 149777 +22.7%
BM_Matmul_4_1500_512_88 149544 154468 -3.3%
2018-09-26 16:47:13 -07:00
Eugene Zhulenev
71cd3fbd6a
Support multiple contraction kernel types in TensorContractionThreadPool
2018-09-26 11:08:47 -07:00
Christoph Hertzberg
0a3356f4ec
Don't deactivate BVH test for clang (probably, this was failing for very old versions of clang)
2018-09-25 20:26:16 +02:00
Gael Guennebaud
41c3a2ffc1
Fix documentation of reshape to vectors.
2018-09-25 16:35:44 +02:00
Christoph Hertzberg
2c083ace3e
Provide EIGEN_OVERRIDE and EIGEN_FINAL macros to mark virtual function overrides
2018-09-24 18:01:17 +02:00
Gael Guennebaud
626942d9dd
fix alignment issue in ploaddup for AVX512
2018-09-28 16:57:32 +02:00
Gael Guennebaud
84a1101b36
Merge with default.
2018-09-23 21:52:58 +02:00
Gael Guennebaud
795e12393b
Fix logic in diagonal*dense product in a corner case.
...
The problem was for: diag(1x1) * mat(1,n)
2018-09-22 16:44:33 +02:00
Gael Guennebaud
bac36d0996
Demangle Travseral and Unrolling in Redux
2018-09-21 23:03:45 +02:00
Gael Guennebaud
c696dbcaa6
Fiw shadowing of last and all
2018-09-21 23:02:33 +02:00
Christoph Hertzberg
e3c8289047
Replace unused PREDICATE by corresponding STATIC_ASSERT
2018-09-21 21:15:51 +02:00
Gael Guennebaud
1bf12880ae
Add reshaped<>() shortcuts when returning vectors and remove the reshaping version of operator()(all)
2018-09-21 16:50:04 +02:00
Gael Guennebaud
4291f167ee
Add missing plugins to DynamicSparseMatrix -- fix sparse_extra_3
2018-09-21 14:53:43 +02:00
Gael Guennebaud
03a0cb2b72
fix unalignedcount for avx512
2018-09-21 14:40:26 +02:00
Gael Guennebaud
371068992a
Add more debug output
2018-09-21 14:32:39 +02:00
Gael Guennebaud
91716f03a7
Fix vectorization logic unit test for AVX512
2018-09-21 14:32:24 +02:00
Gael Guennebaud
b00e48a867
Improve slice-vectorization logic for redux (significant speed-up for reduxion of blocks)
2018-09-21 13:45:56 +02:00
Gael Guennebaud
a488d59787
merge with default Eigen
2018-09-21 11:51:49 +02:00
Gael Guennebaud
47720e7970
Doc fixes
2018-09-21 11:48:22 +02:00
Gael Guennebaud
3ec2985914
Merged indexing cleanup (pull request PR-506)
2018-09-21 09:36:05 +00:00
Gael Guennebaud
651e5d4866
Fix EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE for AVX512 or AVX with malloc aligned on 8 bytes only.
...
This change also make it future proof for AVX1024
2018-09-21 10:33:22 +02:00
Eugene Zhulenev
719e438a20
Collapsed revision
...
* Split cxx11_tensor_executor test
* Register test parts with EIGEN_SUFFIXES
* Fix EIGEN_SUFFIXES in cxx11_tensor_executor test
2018-09-20 15:19:12 -07:00
Gael Guennebaud
f0ef3467de
Fix doc
2018-09-20 22:57:28 +02:00
Gael Guennebaud
617f75f117
Add indexing namespace
2018-09-20 22:57:10 +02:00
Gael Guennebaud
0c56d22e2e
Fix shadowing
2018-09-20 22:56:21 +02:00
Rasmus Munk Larsen
8e2be7777e
Merged eigen/eigen into default
2018-09-20 11:41:15 -07:00
Rasmus Munk Larsen
5d2e759329
Initialize BlockIteratorState in a C++03 compatible way.
2018-09-20 11:40:43 -07:00
Gael Guennebaud
e04faca930
merge
2018-09-20 18:33:54 +02:00
Gael Guennebaud
d37188b9c1
Fix MPrealSupport
2018-09-20 18:30:10 +02:00
Gael Guennebaud
3c6dc93f99
Fix GPU support.
2018-09-20 18:29:21 +02:00
Gael Guennebaud
e0f6d352fb
Rename test/array.cpp to test/array_cwise.cpp to avoid conflicts with the array header.
2018-09-20 18:07:32 +02:00
Gael Guennebaud
eeeb18814f
Fix warning
2018-09-20 17:48:56 +02:00
Gael Guennebaud
9419f506d0
Fix regression introduced by the previous fix for AVX512.
...
It brokes the complex-complex case on SSE.
2018-09-20 17:32:34 +02:00
Christoph Hertzberg
a0166ab651
Workaround for spurious "array subscript is above array bounds" warnings with g++4.x
2018-09-20 17:08:43 +02:00
Gael Guennebaud
e38d1ab4d1
Workaround increases required alignment warning
2018-09-20 17:07:33 +02:00
Christoph Hertzberg
c50250cb24
Avoid warning "suggest braces around initialization of subobject".
...
This test is not run in C++03 mode, so no compatibility is lost.
2018-09-20 17:03:42 +02:00
Gael Guennebaud
71496b0e25
Fix gebp kernel for real+complex in case only reals are vectorized (e.g., AVX512).
...
This commit also removes "half-packet" from data-mappers: it was not used and conceptually broken anyways.
2018-09-20 17:01:24 +02:00
Gael Guennebaud
5a30eed17e
Fix warnings in AVX512
2018-09-20 16:58:51 +02:00
Gael Guennebaud
2cf6d3050c
Disable ignoring attributes warning
2018-09-20 11:38:19 +02:00
Rasmus Munk Larsen
44d8274383
Cast to longer type.
2018-09-19 13:31:42 -07:00
Rasmus Munk Larsen
d638b62dda
Silence compiler warning.
2018-09-19 13:27:55 -07:00
Rasmus Munk Larsen
db9c9df59a
Silence more compiler warnings.
2018-09-19 11:50:27 -07:00
Rasmus Munk Larsen
febd09dcc0
Silence compiler warnings in ThreadPoolInterface.h.
2018-09-19 11:11:04 -07:00
Gael Guennebaud
c3a19527a2
Fix doc wrt previous change
2018-09-19 11:49:26 +02:00
Gael Guennebaud
dfa8439e4d
Update reshaped API to use RowMajor/ColMajor directly as integral values instead of introducing RowOrder/ColOrder types.
...
The API changed from A.respahed(rows,cols,RowOrder) to A.template reshaped<RowOrder>(rows,cols).
2018-09-19 11:49:26 +02:00
luz.paz"
f67b19a884
[PATCH 1/2] Misc. typos
...
From 68d431b4c14ad60a778ee93c1f59ecc4b931950e Mon Sep 17 00:00:00 2001
Found via `codespell -q 3 -I ../eigen-word-whitelist.txt` where the whitelists consists of:
```
als
ans
cas
dum
lastr
lowd
nd
overfl
pres
preverse
substraction
te
uint
whch
```
---
CMakeLists.txt | 26 +++++++++----------
Eigen/src/Core/GenericPacketMath.h | 2 +-
Eigen/src/SparseLU/SparseLU.h | 2 +-
bench/bench_norm.cpp | 2 +-
doc/HiPerformance.dox | 2 +-
doc/QuickStartGuide.dox | 2 +-
.../Eigen/CXX11/src/Tensor/TensorChipping.h | 6 ++---
.../Eigen/CXX11/src/Tensor/TensorDeviceGpu.h | 2 +-
.../src/Tensor/TensorForwardDeclarations.h | 4 +--
.../src/Tensor/TensorGpuHipCudaDefines.h | 2 +-
.../Eigen/CXX11/src/Tensor/TensorReduction.h | 2 +-
.../CXX11/src/Tensor/TensorReductionGpu.h | 2 +-
.../test/cxx11_tensor_concatenation.cpp | 2 +-
unsupported/test/cxx11_tensor_executor.cpp | 2 +-
14 files changed, 29 insertions(+), 29 deletions(-)
2018-09-18 04:15:01 -04:00
Gael Guennebaud
297ca62319
ease transition by adding placeholders::all/last/and as deprecated
2018-09-17 16:24:52 +02:00
Gael Guennebaud
2014c7ae28
Move all, last, end from Eigen::placeholders namespace to Eigen::, and rename end to lastp1 to avoid conflicts with std::end.
2018-09-15 14:35:10 +02:00
Gael Guennebaud
82772e8d9d
Rename Symbolic namespace to symbolic to be consistent with numext namespace
2018-09-15 14:16:20 +02:00
Rasmus Munk Larsen
400512bfad
Merged in ezhulenev/eigen-02 (pull request PR-501)
...
Enable DSizes type promotion with c++03
2018-09-19 00:50:04 +00:00
Eugene Zhulenev
c4627039ac
Support static dimensions (aka IndexList) in Tensor::resize(...)
2018-09-18 14:25:21 -07:00
Gael Guennebaud
3e8188fc77
bug #1600 : initialize m_info to InvalidInput by default, even though m_info is not accessible until it has been initialized (assert)
2018-09-18 21:24:48 +02:00
Eugene Zhulenev
218a7b9840
Enable DSizes type promotion with c++03 compilers
2018-09-18 10:57:00 -07:00
Ravi Kiran
1f0c941c3d
Collapsed revision
...
* Merged eigen/eigen into default
2018-09-17 18:29:12 -07:00
Rasmus Munk Larsen
03a88c57e1
Merged in ezhulenev/eigen-02 (pull request PR-498)
...
Add DSizes index type promotion
2018-09-17 21:58:38 +00:00
Rasmus Munk Larsen
5ca0e4a245
Merged in ezhulenev/eigen-01 (pull request PR-497)
...
Fix warnings in IndexList array_prod
2018-09-17 20:15:06 +00:00
Eugene Zhulenev
a5cd4e9ad1
Replace deprecated Eigen::DenseIndex with Eigen::Index in TensorIndexList
2018-09-17 10:58:07 -07:00
Gael Guennebaud
b311bfb752
bug #1596 : fix inclusion of Eigen's header within unsupported modules.
2018-09-17 09:54:29 +02:00
Gael Guennebaud
72f19c827a
typo
2018-09-16 22:10:34 +02:00
Eugene Zhulenev
66f056776f
Add DSizes index type promotion
2018-09-15 15:17:38 -07:00
Eugene Zhulenev
f313126dab
Fix warnings in IndexList array_prod
2018-09-15 13:47:54 -07:00
Christoph Hertzberg
42705ba574
Fix weird error for building with g++-4.7 in C++03 mode.
2018-09-15 12:43:41 +02:00
Rasmus Munk Larsen
c2383f95af
Merged in ezhulenev/eigen/fix_dsizes (pull request PR-494)
...
Fix DSizes IndexList constructor
2018-09-15 02:36:19 +00:00
Rasmus Munk Larsen
30290cdd56
Merged in ezhulenev/eigen/moar_eigen_fixes_3 (pull request PR-493)
...
Const cast scalar pointer in TensorSlicingOp evaluator
Approved-by: Sameer Agarwal <sameeragarwal@google.com >
2018-09-15 02:35:07 +00:00
Eugene Zhulenev
f7d0053cf0
Fix DSizes IndexList constructor
2018-09-14 19:19:13 -07:00
Rasmus Munk Larsen
601e289d27
Merged in ezhulenev/eigen/moar_eigen_fixes_1 (pull request PR-492)
...
Explicitly construct tensor block dimensions from evaluator dimensions
2018-09-15 01:36:21 +00:00
Eugene Zhulenev
71070a1e84
Const cast scalar pointer in TensorSlicingOp evaluator
2018-09-14 17:17:50 -07:00
Eugene Zhulenev
4863375723
Explicitly construct tensor block dimensions from evaluator dimensions
2018-09-14 16:55:05 -07:00
Rasmus Munk Larsen
14e35855e1
Merged in chtz/eigen-maxsizevector (pull request PR-490)
...
Let MaxSizeVector respect alignment of objects
Approved-by: Rasmus Munk Larsen <rmlarsen@google.com >
2018-09-14 23:29:24 +00:00
Rasmus Munk Larsen
281e631839
Merged in ezhulenev/eigen/indexlist_to_dsize (pull request PR-491)
...
Support reshaping with static shapes and dimensions conversion in tensor broadcasting
Approved-by: Rasmus Munk Larsen <rmlarsen@google.com >
2018-09-14 22:45:52 +00:00
Eugene Zhulenev
1b8d70a22b
Support reshaping with static shapes and dimensions conversion in tensor broadcasting
2018-09-14 15:25:27 -07:00
Christoph Hertzberg
007f165c69
bug #1598 : Let MaxSizeVector respect alignment of objects and add a unit test
...
Also revert 8b3d9ed081
2018-09-14 20:21:56 +02:00
Christoph Hertzberg
d7378aae8e
Provide EIGEN_ALIGNOF macro, and give handmade_aligned_malloc the possibility for alignments larger than the standard alignment.
2018-09-14 20:17:47 +02:00
Rasmus Munk Larsen
9b864cdb37
Merged in rmlarsen/eigen3 (pull request PR-480)
...
Avoid compilation error in C++11 test when EIGEN_AVOID_STL_ARRAY is set.
2018-09-14 00:05:09 +00:00
Rasmus Munk Larsen
d0eef5fe6c
Don't use bracket syntax in ctor.
2018-09-13 17:04:05 -07:00
Rasmus Munk Larsen
6313dde390
Fix merge error.
2018-09-13 16:42:05 -07:00
Rasmus Munk Larsen
0db590d22d
Backed out changeset 01197e4452
2018-09-13 16:20:57 -07:00
Rasmus Munk Larsen
b3f4c067d9
Merge
2018-09-13 16:18:52 -07:00
Rasmus Munk Larsen
2b07018140
Enable vectorized version on GPUs. The underlying bug has been fixed.
2018-09-13 16:12:22 -07:00
Rasmus Munk Larsen
53568e3549
Merged in ezhulenev/eigen/tiled_evalution_support (pull request PR-444)
...
Tiled evaluation for Tensor ops
Approved-by: Rasmus Munk Larsen <rmlarsen@google.com >
Approved-by: Gael Guennebaud <g.gael@free.fr >
2018-09-13 22:05:47 +00:00
Eugene Zhulenev
01197e4452
Fix warnings
2018-09-13 15:03:36 -07:00
Gael Guennebaud
1141bcf794
Fix conjugate-gradient for very small rhs
2018-09-13 23:53:28 +02:00
Gael Guennebaud
7f3b17e403
MSVC 2015 supports c++11 thread-local-storage
2018-09-13 18:15:07 +02:00
Eugene Zhulenev
d138fe341d
Fis static_assert in test to conform c++11 standard
2018-09-11 17:23:18 -07:00
Rasmus Munk Larsen
e289f44c56
Don't vectorize the MeanReducer unless pdiv is available.
2018-09-11 14:09:00 -07:00
Eugene Zhulenev
55bb7e7935
Merge with upstream eigen/default
2018-09-11 13:33:06 -07:00
Eugene Zhulenev
81b38a155a
Fix compilation of tiled evaluation code with c++03
2018-09-11 13:32:32 -07:00
Rasmus Munk Larsen
5da960702f
Merged eigen/eigen into default
2018-09-11 10:08:46 -07:00
Rasmus Munk Larsen
46f88fc454
Use numerically stable tree reduction in TensorReduction.
2018-09-11 10:08:10 -07:00
Justin Carpentier
4827bec776
LLT: correct doc and add missing reference for the return type of rankUpdate
...
---
Eigen/src/Cholesky/LLT.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
2018-09-11 09:33:21 +02:00
Rasmus Munk Larsen
3d057e0453
Avoid compilation error in C++11 test when EIGEN_AVOID_STL_ARRAY is set.
2018-09-06 12:59:36 -07:00
cgs1019
c6066ac411
Make param name and docs constistent for JacobiRotation::makeGivens
...
Previously the rendered math in the doc string called the optional return value
'r', while the actual parameter and the doc string text referred to the
parameter as 'z'. This changeset renames all the z's to r's to match the math.
2018-09-06 11:04:17 -04:00
Alexey Frunze
edeee16a16
Fix build failures in matrix_power and matrix_exponential tests.
...
This fixes the static assertion complaining about double being
used in place of long double. This happened on MIPS32, where
double and long double have the same type representation.
This can be simulated on x86 as well if we pass -mlong-double-64
to g++.
2018-08-31 14:11:10 -07:00
Deven Desai
c64fe9ea1f
Updates to fix HIP-clang specific compile errors.
...
Compiling the eigen unittests with hip-clang (HIP with clang as the underlying compiler instead of hcc or nvcc), results in compile errors. The changes in this commit fix those compile errors. The main change is to convert a few instances of "__device__" to "EIGEN_DEVICE_FUNC"
2018-08-30 20:22:16 +00:00
Rasmus Munk Larsen
8b3d9ed081
Use padding instead of alignment attribute, which MaxSizeVector does not respect. This leads to undefined behavior and hard-to-trace bugs.
2018-09-05 11:20:06 -07:00
Gael Guennebaud
5927eef612
Enable std::result_of for msvc 2015 and later
2018-09-13 09:44:46 +02:00
Christoph Hertzberg
3adece4827
Fix misleading indentation of errorCode and make it loop-local
2018-09-12 14:41:38 +02:00
Christoph Hertzberg
7e9c9fbb2d
Disable type-limits warnings for g++ < 4.8
2018-09-12 14:40:39 +02:00
Christoph Hertzberg
ba2c8efdcf
EIGEN_UNUSED is not supported by g++4.7 (and not portable)
2018-09-12 11:49:10 +02:00
Christoph Hertzberg
ff4e835d6b
"sparse_product.cpp" must be included before "sparse_basic.cpp", otherwise EIGEN_SPARSE_CREATE_TEMPORARY_PLUGIN has no effect
2018-08-30 20:10:11 +02:00
Christoph Hertzberg
023ed6b9a8
Product of empty array must be 1 and not 0.
2018-08-30 17:14:52 +02:00
Christoph Hertzberg
c2f4e8c08e
Fix integer conversion warning
2018-08-30 17:12:53 +02:00
Christoph Hertzberg
ddbc564386
Fixed a few more shadowing warnings when compiling with g++ (and c++03)
2018-08-30 16:33:03 +02:00
Deven Desai
946c3e2544
adding EIGEN_DEVICE_FUNC attribute to fix some GPU unit tests that are broken in HIP mode
2018-08-27 23:04:08 +00:00
Mehdi Goli
7ec8b40ad9
Collapsed revision
...
* Separating SYCL math function.
* Converting function overload to function specialisation.
* Applying the suggested design.
2018-08-28 14:20:48 +01:00
Christoph Hertzberg
20ba2eee6d
gcc thinks this may not be initialized
2018-08-28 18:33:24 +02:00
Christoph Hertzberg
73ca600bca
Fix numerous shadow-warnings for GCC<=4.8
2018-08-28 18:32:39 +02:00
Christoph Hertzberg
ef4d79fed8
Disable/ReenableStupidWarnings did not work properly, when included recursively
2018-08-28 18:26:22 +02:00
Gael Guennebaud
befaf83f5f
bug #1590 : fix collision with some system headers defining the macro FP32
2018-08-28 13:21:28 +02:00
Christoph Hertzberg
42f3ee4fb8
Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop
...
Workaround: Don't include "DisableStupidWarnings.h" before including other main-headers
2018-08-28 11:44:15 +02:00
Eugene Zhulenev
c144bb355b
Merge with upstream eigen/default
2018-08-27 14:34:07 -07:00
Gael Guennebaud
5747288676
Disable a bonus unit-test which is broken with gcc 4.7
2018-08-27 13:07:34 +02:00
Gael Guennebaud
d5ed64512f
bug #1573 : workaround gcc 4.7 and 4.8 bug
2018-08-27 10:38:20 +02:00
Christoph Hertzberg
b1653d1599
Fix some trivial C++11 vs C++03 compatibility warnings
2018-08-25 12:21:00 +02:00
Christoph Hertzberg
42123ff38b
Make unit test C++03 compatible
2018-08-25 11:53:28 +02:00
Christoph Hertzberg
4b1ad086b5
Fix shadow warnings in doc-snippets
2018-08-25 10:07:17 +02:00
Christoph Hertzberg
117bc5d505
Fix some shadow warnings
2018-08-25 09:06:08 +02:00
Christoph Hertzberg
f155e97adb
Previous fix broke compilation for clang
2018-08-25 00:10:46 +02:00
Christoph Hertzberg
209b4972ec
Fix conversion warning
2018-08-25 00:02:46 +02:00
Christoph Hertzberg
495f6c3c3a
Fix missing-braces warnings
2018-08-24 23:56:13 +02:00
Christoph Hertzberg
5aaedbeced
Fixed more sign-compare and type-limits warnings
2018-08-24 23:54:12 +02:00
Christoph Hertzberg
8295f02b36
Hide "maybe uninitialized" warning on gcc
2018-08-24 23:22:20 +02:00
Christoph Hertzberg
f7675b826b
Fix several integer conversion and sign-compare warnings
2018-08-24 22:58:55 +02:00
Christoph Hertzberg
949b0ad9cb
Merged in rmlarsen/eigen3 (pull request PR-468)
...
Add support for emulating thread local.
2018-08-24 17:29:03 +00:00
Rasmus Munk Larsen
744e2fe0de
Address comments about EIGEN_THREAD_LOCAL.
2018-08-24 10:24:54 -07:00
Christoph Hertzberg
ad4a08fb68
Use Intel cast intrinsics, since MSVC does not allow direct casting.
...
Reported by David Winkler.
2018-08-24 19:04:33 +02:00
Rasmus Munk Larsen
8d9bc5cc02
Fix g++ compilation.
2018-08-23 13:06:39 -07:00
Rasmus Munk Larsen
e9f9d70611
Don't rely on __had_feature for g++.
...
Don't use __thread.
Only use thread_local for gcc 4.8 or newer.
2018-08-23 12:59:46 -07:00
Rasmus Munk Larsen
668690978f
Pad PerThread when we emulate thread_local to prevent false sharing.
2018-08-23 12:54:33 -07:00
Rasmus Munk Larsen
6cedc5a9b3
rename mu.
2018-08-23 12:11:58 -07:00
Rasmus Munk Larsen
6e0464004a
Store std::unique_ptr instead of raw pointers in per_thread_map_.
2018-08-23 12:10:08 -07:00
Rasmus Munk Larsen
e51d9e473a
Protect #undef max with #ifdef max.
2018-08-23 11:42:05 -07:00
Rasmus Munk Larsen
d35880ed91
merge
2018-08-23 11:36:49 -07:00
Christoph Hertzberg
a709c8efb4
Replace pointers by values or unique_ptr for better leak-safety
2018-08-23 19:41:59 +02:00
Christoph Hertzberg
39335cf51e
Make MaxSizeVector leak-safe
2018-08-23 19:37:56 +02:00
Benoit Steiner
ff8e0ecc2f
Updated one more line of code to avoid making the test dependent on cxx11 features.
2018-08-17 15:15:52 -07:00
Benoit Steiner
43d9dd9b28
Removed more dependencies on cxx11.
2018-08-17 08:49:32 -07:00
Gael Guennebaud
f76c802973
Add missing empty line
2018-08-17 17:16:12 +02:00
Christoph Hertzberg
41f1cc67b8
Assertion depended on a not yet initialized value
2018-08-17 16:42:53 +02:00
Christoph Hertzberg
4713465eef
Silence double-promotion warning
2018-08-17 16:39:43 +02:00
Christoph Hertzberg
595cae9b09
Silence logical-op-parentheses warning
2018-08-17 16:30:32 +02:00
Christoph Hertzberg
c9b25fbefa
Silence unused parameter warning
2018-08-17 16:28:28 +02:00
Christoph Hertzberg
dbdeceabdd
Silence double-promotion warning (when converting double to complex<long double>)
2018-08-17 16:26:11 +02:00
Benoit Steiner
19df4d5752
Merged in codeplaysoftware/eigen-upstream-pure/Pointer_type_creation (pull request PR-461)
...
Creating a pointer type in TensorCustomOp.h
2018-08-16 18:28:33 +00:00
Benoit Steiner
f641cf1253
Adding missing at method in Eigen::array
2018-08-16 11:24:37 -07:00
Benoit Steiner
ede580ccda
Avoid using the auto keyword to make the tensor block access test more portable
2018-08-16 10:49:47 -07:00
Benoit Steiner
e23c8c294e
Use actual types instead of the auto keyword to make the code more portable
2018-08-16 10:41:01 -07:00
Mehdi Goli
80f1a76dec
removing the noises.
2018-08-16 13:33:24 +01:00
Mehdi Goli
d0b01ebbf6
Reverting the unitended delete from the code.
2018-08-16 13:21:36 +01:00
Mehdi Goli
161dcbae9b
Using PointerType struct and specializing it per device for TensorCustomOp.h
2018-08-16 00:07:02 +01:00
Sameer Agarwal
f197c3f55b
Removed an used variable (PacketSize) from TensorExecutor
2018-08-15 11:24:57 -07:00
Benoit Steiner
4181556907
Fixed the tensor contraction code.
2018-08-15 09:34:47 -07:00
Benoit Steiner
b6f96cf7dd
Removed dependencies on cxx11 language features from the tensor_block_access test
2018-08-15 08:54:31 -07:00
Benoit Steiner
fbb834144d
Fixed more compilation errors
2018-08-15 08:52:58 -07:00
Benoit Steiner
6bb3f1b43e
Made the tensor_block_access test compile again
2018-08-14 14:26:59 -07:00
Benoit Steiner
43ec0082a6
Made the kronecker_product test compile again
2018-08-14 14:08:36 -07:00
Benoit Steiner
ab3f481141
Cleaned up the code and make it compile with more compilers
2018-08-14 14:05:46 -07:00
Rasmus Munk Larsen
fa0bcbf230
merge
2018-08-14 12:18:31 -07:00
Rasmus Munk Larsen
15d4f515e2
Use plain_assert in destructors to avoid throwing in CXX11 tests where main.h owerwrites eigen_assert with a throwing version.
2018-08-14 12:17:46 -07:00
Rasmus Munk Larsen
aebdb06424
Fix a few compiler warnings in CXX11 tests.
2018-08-14 12:06:39 -07:00
Rasmus Munk Larsen
2a98bd9c8e
Merged eigen/eigen into default
2018-08-14 12:02:09 -07:00
Benoit Steiner
59bba77ead
Fixed compilation errors with gcc 4.7 and 4.8
2018-08-14 10:54:48 -07:00
Mehdi Goli
a97aaa2bcf
Merge with upstream.
2018-08-14 17:49:29 +01:00
Mehdi Goli
8ba799805b
Merge with upstream
2018-08-14 09:43:45 +01:00
Rasmus Munk Larsen
6d6e7b7027
merge
2018-08-13 15:34:50 -07:00
Rasmus Munk Larsen
9bb75d8d31
Add Barrier.h.
2018-08-13 15:34:03 -07:00
Rasmus Munk Larsen
2e1adc0324
Merged eigen/eigen into default
2018-08-13 15:32:00 -07:00
Rasmus Munk Larsen
8278ae6313
Add support for thread local support on platforms that do not support it through emulation using a hash map.
2018-08-13 15:31:23 -07:00
Benoit Steiner
501be70b27
Code cleanup
2018-08-13 15:16:40 -07:00
Benoit Steiner
3d3711f22f
Fixed compilation errors.
2018-08-13 15:16:06 -07:00
Gael Guennebaud
3ec60215df
Merged in rmlarsen/eigen2 (pull request PR-466)
...
Move sigmoid functor to core and rename it to 'logistic'.
2018-08-13 21:28:20 +00:00
Rasmus Munk Larsen
0f1b2e08a5
Call logistic functor from Tensor::sigmoid.
2018-08-13 11:52:58 -07:00
Rasmus Munk Larsen
d6e283ba96
sigmoid -> logistic
2018-08-13 11:14:50 -07:00
Benoit Steiner
26239ee580
Use NULL instead of nullptr to avoid adding a cxx11 requirement.
2018-08-13 11:05:51 -07:00
Benoit Steiner
3810ec228f
Don't use the auto keyword since it's not always supported properly.
2018-08-13 10:46:09 -07:00
Benoit Steiner
e6d5be811d
Fixed syntax of nested templates chevrons to make it compatible with c++97 mode.
2018-08-13 10:29:21 -07:00
Mehdi Goli
1aa86aad14
Merge with upstream.
2018-08-13 15:40:31 +01:00
Eugene Zhulenev
35d90e8960
Fix BlockAccess enum in CwiseUnaryOp evaluator
2018-08-10 17:37:58 -07:00
Eugene Zhulenev
855b68896b
Merge with eigen/default
2018-08-10 17:18:42 -07:00
Eugene Zhulenev
f2209d06e4
Add block evaluationto CwiseUnaryOp and add PreferBlockAccess enum to all evaluators
2018-08-10 16:53:36 -07:00
Benoit Steiner
c8ea398675
Avoided language features that are only available in cxx11 mode.
2018-08-10 13:02:41 -07:00
Benoit Steiner
4be4286224
Made the code compile with gcc 5.4.
2018-08-10 11:32:58 -07:00
Justin Carpentier
eabc7a4031
PR 465: Fix issue in RowMajor assignment in plain_matrix_type_row_major::type
...
The type should be RowMajor
2018-08-10 14:30:06 +02:00
Rasmus Munk Larsen
c49e93440f
SuiteSparse defines the macro SuiteSparse_long to control what type is used for 64bit integers. The default value of this macro on non-MSVC platforms is long and __int64 on MSVC. CholmodSupport defaults to using long for the long variants of CHOLMOD functions. This creates problems when SuiteSparse_long is different than long. So the correct thing to do here is
...
to use SuiteSparse_long as the type instead of long.
2018-08-13 15:53:31 -07:00
Mehdi Goli
3a2e1b1fc6
Merge with upstream.
2018-08-10 12:28:38 +01:00
Rasmus Munk Larsen
bfc5091dd5
Cast to diagonalSize to RealScalar instead Scalar.
2018-08-09 14:46:17 -07:00
Rasmus Munk Larsen
8603d80029
Cast diagonalSize() to Scalar before multiplication. Without this, automatic differentiation in Ceres breaks because Scalar is a custom type that does not support multiplication by Index.
2018-08-09 11:09:10 -07:00
Eugene Zhulenev
cfaedb38cd
Fix bug in a test + compilation errors
2018-08-09 09:44:07 -07:00
Mehdi Goli
ea8fa5e86f
Merge with upstream
2018-08-09 14:07:56 +01:00
Mehdi Goli
8c083bfd0e
Properly fixing the PointerType for TensorCustomOp.h. As the output type here should be based on CoeffreturnType not the Scalar type. Therefore, Similar to reduction and evalTo function, it should have its own MakePointer class. In this case, for other device the type is defaulted to CoeffReturnType and no changes is required on users' code. However, in SYCL, on the device, we can recunstruct the device Type.
2018-08-09 13:57:43 +01:00
Alexey Frunze
050bcf6126
bug #1584 : Improve random (avoid undefined behavior).
2018-08-08 20:19:32 -07:00
Eugene Zhulenev
1c8b9e10a7
Merged with upstream eigen
2018-08-08 16:57:58 -07:00
Benoit Steiner
131ed1191f
Merged in codeplaysoftware/eigen-upstream-pure/Fixing_compiler_warning (pull request PR-462)
...
Fixing compiler warning in TensorBlock.h as it was creating a lot of noise at compilation.
2018-08-08 18:14:15 +00:00
Benoit Steiner
1285c080b3
Merged in codeplaysoftware/eigen-upstream-pure/disabling_assert_in_sycl (pull request PR-459)
...
Disabling assert inside SYCL kernel.
2018-08-08 18:12:42 +00:00
Benoit Steiner
c4b2845be9
Merged in rmlarsen/eigen3 (pull request PR-458)
...
Fix init order.
2018-08-08 18:11:49 +00:00
Benoit Steiner
7124172b83
Merged in codeplaysoftware/eigen-upstream-pure/EIGEN_UNROLL_LOOP (pull request PR-460)
...
Adding EIGEN_UNROLL_LOOP macro.
2018-08-08 18:10:54 +00:00
Mehdi Goli
532a0be05c
Fixing compiler warning in TensorBlock.h as it was creating a lot of noise at compilation.
2018-08-08 12:12:26 +01:00
Mehdi Goli
67711eaa31
Fixing typo.
2018-08-08 11:38:10 +01:00
Mehdi Goli
3055e3a7c2
Creating a pointer type in TensorCustomOp.h
2018-08-08 11:19:02 +01:00
Mehdi Goli
22031ab59a
Adding EIGEN_UNROLL_LOOP macro.
2018-08-08 11:07:27 +01:00
Mehdi Goli
908b906d79
Disabling assert inside SYCL kernel.
2018-08-08 10:01:10 +01:00
Rasmus Munk Larsen
693fb1d41e
Fix init order.
2018-08-07 17:18:51 -07:00
Benoit Steiner
10d286f55b
Silenced a couple of compilation warnings.
2018-08-06 16:00:29 -07:00
Benoit Steiner
d011d05fd6
Fixed compilation errors.
2018-08-06 13:40:51 -07:00
Rasmus Munk Larsen
36e7e7dd8f
Forward declare NoOpOutputKernel as struct rather than class to be consistent with implementation.
2018-08-06 13:16:32 -07:00
Rasmus Munk Larsen
fa68342ef8
Move sigmoid functor to core.
2018-08-03 17:31:23 -07:00
Gael Guennebaud
09c81ac033
bug #1451 : fix numeric_limits<AutoDiffScalar<Der>> with a reference as derivative type
2018-08-04 00:17:37 +02:00
luz.paz"
43fd42a33b
Fix doxy and misc. typos
...
Found via `codespell -q 3 -I ../eigen-word-whitelist.txt`
---
Eigen/src/Core/ProductEvaluators.h | 4 ++--
Eigen/src/Core/arch/GPU/Half.h | 2 +-
Eigen/src/Core/util/Memory.h | 2 +-
Eigen/src/Geometry/Hyperplane.h | 2 +-
Eigen/src/Geometry/Transform.h | 2 +-
Eigen/src/Geometry/Translation.h | 12 ++++++------
doc/PreprocessorDirectives.dox | 2 +-
doc/TutorialGeometry.dox | 2 +-
test/boostmultiprec.cpp | 2 +-
test/triangular.cpp | 2 +-
10 files changed, 16 insertions(+), 16 deletions(-)
2018-08-01 21:34:47 -04:00
Jean-Christophe Fillion-Robin
2cbd9dd498
[PATCH] cmake: Support source include with add_subdirectory and
...
find_package use
This commit allows the sources of the project to be included in a parent
project CMakeLists.txt and support use of "find_package(Eigen3 CONFIG REQUIRED)"
Here is an example allowing to test the changes. It is not particularly
useful in itself. This change will allow to support one of the scenario
allowing to create custom 3D Slicer application bundling associated plugins.
/tmp/eigen-git-mirror # Eigen sources
/tmp/test/CMakeLists.txt:
cmake_minimum_required(VERSION 3.12)
project(test)
add_subdirectory("/tmp/eigen-git-mirror" "eigen-git-mirror")
find_package(Eigen3 CONFIG REQUIRED)
and configuring it using:
mkdir /tmp/test-build && cd $_
cmake \
-DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY:BOOL=1 \
-DEigen3_DIR:PATH=/tmp/test-build/eigen-git-mirror \
/tmp/test
Co-authored-by: Pablo Hernandez <pablo.hernandez@kitware.com >
---
CMakeLists.txt | 1 +
cmake/Eigen3Config.cmake.in | 4 +++-
2 files changed, 4 insertions(+), 1 deletion(-)
2018-09-07 15:50:19 -04:00
Christoph Hertzberg
a80a290079
Fix 'template argument uses local type'-warnings (when compiled in C++03 mode)
2018-09-10 18:57:28 +02:00
Jiandong Ruan
6dcd2642aa
bug #1526 - CUDA compilation fails on CUDA 9.x SDK when arch is set to compute_60 and/or above
2018-09-08 12:05:33 -07:00
Christoph Hertzberg
edfb7962fd
Use static const int instead of enum to avoid numerous local-type-template-args warnings in C++03 mode
2018-09-07 14:08:39 +02:00
Alexey Frunze
ec38f07b79
bug #1595 : Don't use C++11's std::isnan() in MIPS/MSA packet math.
...
This removes reliance on C++11 and improves generated code.
2018-09-06 15:40:09 -07:00
Eugene Zhulenev
1b0373ae10
Replace all using declarations with typedefs in Tensor ops
2018-08-01 15:55:46 -07:00
Rasmus Munk Larsen
7f8b53fd0e
bug #1580 : Fix cuda clang build. STL is not supported, so std::equal_to and std::not_equal breaks compilation.
...
Update the definition of EIGEN_CONSTEXPR_ARE_DEVICE_FUNC to exclude clang.
See also PR 450.
2018-08-01 12:36:24 -07:00
Rasmus Munk Larsen
bcb29f890c
Fix initialization order.
2018-08-03 10:18:53 -07:00
Benoit Steiner
cf17794ef4
Merged in codeplaysoftware/eigen-upstream-pure/SYCL-required-changes (pull request PR-454)
...
SYCL required changes
2018-08-03 16:17:30 +00:00
Mehdi Goli
3074b1ff9e
Fixing the compilation error.
2018-08-03 17:13:44 +01:00
Mehdi Goli
225fa112aa
Merge with upstream.
2018-08-03 17:04:08 +01:00
Mehdi Goli
01358300d5
Creating separate SYCL required PR for uncontroversial files.
2018-08-03 16:59:15 +01:00
Gustavo Lima Chaves
2bf1cc8cf7
Fix 256 bit packet size assumptions in unit tests.
...
Like in change 2606abed53
, we have hit the threshould again. With
AVX512 builds we would never have Vector8f packets aligned at 64
bytes (the new value of EIGEN_MAX_ALIGN_BYTES after change 405859f18d
,
for AVX512-enabled builds).
This makes test/dynalloc.cpp pass for those builds.
2018-08-02 15:55:36 -07:00
Benoit Steiner
dd5875e30d
Merged in codeplaysoftware/eigen-upstream-pure/constructor_error_clang (pull request PR-451)
...
Fixing ambigous constructor error for Clang compiler.
2018-08-02 20:46:03 +00:00
Benoit Steiner
113d8343d6
Merged in codeplaysoftware/eigen-upstream-pure/Fixing_visual_studio_error_For_tensor_trace (pull request PR-452)
...
Fixing compilation error for cxx11_tensor_trace.cpp on Microsoft Visual Studio.
2018-08-02 17:54:26 +00:00
Mehdi Goli
516d2621b9
fixing compilation error for cxx11_tensor_trace.cpp error on Microsoft Visual Studio.
2018-08-02 14:30:48 +01:00
Mehdi Goli
40d6d020a0
Fixing ambigous constructor error for Clang compiler.
2018-08-02 13:34:53 +01:00
Gael Guennebaud
62169419ab
Fix two regressions introduced in previous merges: bad usage of EIGEN_HAS_VARIADIC_TEMPLATES and linking issue.
2018-08-01 23:35:34 +02:00
Eugene Zhulenev
64abdf1d7e
Fix typo + get rid of redundant member variables for block sizes
2018-08-01 12:35:19 -07:00
Benoit Steiner
93b9e36e10
Merged in paultucker/eigen (pull request PR-431)
...
Optional ThreadPoolDevice allocator
Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com >
2018-08-01 19:14:34 +00:00
Eugene Zhulenev
385b3ff12f
Merged latest changes from upstream/eigen
2018-08-01 11:59:04 -07:00
Benoit Steiner
17221115c9
Merged in codeplaysoftware/eigen-upstream-pure/eigen_variadic_assert (pull request PR-447)
...
Adding variadic version of assert which can take a parameter pack as its input.
2018-08-01 16:41:54 +00:00
Benoit Steiner
0360c36170
Merged in codeplaysoftware/eigen-upstream-pure/separating_internal_memory_allocation (pull request PR-446)
...
Distinguishing between internal memory allocation/deallocation from explicit user memory allocation/deallocation.
2018-08-01 16:13:15 +00:00
Mehdi Goli
c6a5c70712
Correcting the position of allocate_temp/deallocate_temp in TensorDeviceGpu.h
2018-08-01 16:56:26 +01:00
Benoit Steiner
9ca1c09131
Merged in codeplaysoftware/eigen-upstream-pure/new-arch-SYCL-headers (pull request PR-448)
...
Adding new arch/SYCL headers, used for SYCL vectorization.
2018-08-01 15:50:54 +00:00
Benoit Steiner
45f75f1ace
Merged in codeplaysoftware/eigen-upstream-pure/using_PacketType_class (pull request PR-449)
...
Enabling per device specialisation of packetSize.
2018-08-01 15:43:03 +00:00
Benoit Steiner
90e632fd66
Merged in codeplaysoftware/eigen-upstream-pure/EIGEN_STRONG_INLINE_MACRO (pull request PR-445)
...
Replacing ad-hoc inline keyword with EIGEN_STRONG_INLINE MACRO.
2018-08-01 15:41:06 +00:00
Mehdi Goli
af96018b49
Using the suggested modification.
2018-08-01 16:04:44 +01:00
Mehdi Goli
b512a9536f
Enabling per device specialisation of packetsize.
2018-08-01 13:39:13 +01:00
Mehdi Goli
c84509d7cc
Adding new arch/SYCL headers, used for SYCL vectorization.
2018-08-01 12:40:54 +01:00
Mehdi Goli
3a197a60e6
variadic version of assert which can take a parameter pack as its input.
2018-08-01 12:19:14 +01:00
Mehdi Goli
d7a8414848
Distinguishing between internal memory allocation/deallocation from explicit user memory allocation/deallocation.
2018-08-01 11:56:30 +01:00
Mehdi Goli
9e219bb3d3
Converting ad-hoc inline keyword to EIGEN_STRONG_INLINE MACRO.
2018-08-01 10:47:49 +01:00
Eugene Zhulenev
83c0a16baf
Add block evaluation support to TensorOps
2018-07-31 15:56:31 -07:00
Benoit Steiner
edf46bd7a2
Merged in yuefengz/eigen (pull request PR-370)
...
Use device's allocate function instead of internal::aligned_malloc.
2018-07-31 22:38:28 +00:00
Paul Tucker
385f7b8d0c
Change getAllocator() to allocator() in ThreadPoolDevice.
2018-07-31 13:52:18 -07:00
Mark D Ryan
6f5b126e6d
Fix tensor contraction for AVX512 machines
...
This patch modifies the TensorContraction class to ensure that the kc_ field is
always a multiple of the packet_size, if the packet_size is > 8. Without this
change spatial convolutions in Tensorflow do not work properly as the code that
re-arranges the input matrices can assert if kc_ is not a multiple of the
packet_size. This leads to a unit test failure,
//tensorflow/python/kernel_tests:conv_ops_test, on AVX512 builds of tensorflow.
2018-07-31 09:33:37 +01:00
Gael Guennebaud
d6568425f8
Close branch tiling_3.
2018-07-31 08:13:43 +00:00
Gael Guennebaud
678a0dcb12
Merged in ezhulenev/eigen/tiling_3 (pull request PR-438)
...
Tiled tensor executor
2018-07-31 08:13:00 +00:00
Gael Guennebaud
679eece876
Speedup trivial tensor broadcasting on GPU by enforcing unaligned loads. See PR 437.
2018-07-31 10:10:14 +02:00
Gael Guennebaud
723856dec1
bug #1577 : fix msvc compilation of unit test, msvc defines ptrdiff_t as long long
2018-07-30 14:52:15 +02:00
Eugene Zhulenev
966c2a7bb6
Rename Index to StorageIndex + use Eigen::Array and Eigen::Map when possible
2018-07-27 12:45:17 -07:00
Eugene Zhulenev
6913221c43
Add tiled evaluation support to TensorExecutor
2018-07-25 13:51:10 -07:00
Alexey Frunze
7b91c11207
bug #1578 : Improve prefetching in matrix multiplication on MIPS.
2018-07-24 18:36:44 -07:00
Patrik Huber
f5cace5e9f
Fix two small typos in the documentation
2018-07-26 19:55:19 +00:00
Gael Guennebaud
34539c4af4
Merged in rmlarsen/eigen1 (pull request PR-441)
...
Reduce the number of template specializations of classes related to tensor contraction to reduce binary size.
2018-07-30 11:26:24 +00:00
Mark D Ryan
bc615e4585
Re-enable FMA for fast sqrt functions
2018-07-30 13:21:00 +02:00
Mark D Ryan
96b030a8e4
Re-enable FMA for fast sqrt functions
...
This commit re-enables the use of FMA for the FAST sqrt functions.
Doing so improves the performance of both algorithms. The float32
version is now 88% the speed of the original function, while the
double version is 90%.
2018-07-30 10:19:51 +01:00
Rasmus Munk Larsen
e478532625
Reduce the number of template specializations of classes related to tensor contraction to reduce binary size.
2018-07-27 12:36:34 -07:00
Rasmus Munk Larsen
2ebcb911b2
Add pcast packet op for NEON.
2018-07-26 14:28:48 -07:00
Christoph Hertzberg
397b0547e1
DIsable static assertions only when necessary and disable double-promotion warnings in that case as well
2018-07-26 00:01:24 +02:00
Christoph Hertzberg
5e79402b4a
fix warnings for doc-eigen-prerequisites
2018-07-24 21:59:15 +02:00
Christoph Hertzberg
5f79b7f9a9
Removed several shadowing types and use global Index typedef everywhere
2018-07-25 21:47:45 +02:00
Christoph Hertzberg
44ee201337
Rename variable which shadows class name
2018-07-25 20:26:15 +02:00
Gustavo Lima Chaves
705f66a9ca
Account for missing change on commit "Remove SimpleThreadPool and..."
...
"... always use {NonBlocking}ThreadPool". It seems the non-blocking
implementation was me the default/only one, but a reference to the old
name was left unmodified. Fix that.
2018-07-23 16:29:09 -07:00
Christoph Hertzberg
fd4fe7cbc5
Fixed issue which made documentation not getting built anymore
2018-07-24 22:56:15 +02:00
Christoph Hertzberg
636126ef40
Allow to filter out build-error messages
2018-07-24 20:12:49 +02:00
Eugene Zhulenev
d55efa6f0f
TensorBlockIO
2018-07-23 15:50:55 -07:00
Eugene Zhulenev
34a75c3c5c
Initial support of TensorBlock
2018-07-20 17:37:20 -07:00
Gael Guennebaud
2c2de9da7d
Merged in glchaves/eigen (pull request PR-433)
...
Move cxx11_tensor_uint128 test under an EIGEN_TEST_CXX11 guarded block
2018-07-23 19:38:55 +00:00
Gael Guennebaud
4ca3e48f42
fix typo
2018-07-23 16:51:57 +02:00
Gael Guennebaud
c747cde69a
Add lastN shorcuts to seq/seqN.
2018-07-23 16:20:25 +02:00
Gustavo Lima Chaves
02eaaacbc5
Move cxx11_tensor_uint128 test under an EIGEN_TEST_CXX11 guarded
...
block
Builds configured without the -DEIGEN_TEST_CXX11=ON flag would fail
right away without this, as this test seems to rely on those language
features. The skip under compilation with MSVC was kept.
2018-07-20 16:08:40 -07:00
Eugene Zhulenev
2bf864f1eb
Disable type traits for stdlibc++ <= 4.9.3
2018-07-20 10:11:44 -07:00
Gael Guennebaud
de70671937
Oopps, EIGEN_COMP_MSVC is not available before including Eigen.
2018-07-20 17:51:17 +02:00
Gael Guennebaud
56a750b6cc
Disable optimization for sparse_product unit test with MSVC 2013, otherwise it takes several hours to build.
2018-07-20 08:36:38 -07:00
Paul Tucker
d4afccde5a
Add test coverage for ThreadPoolDevice optional allocator.
2018-07-19 17:43:44 -07:00
Eugene Zhulenev
c58b874727
PR430: Convert count to the reducer type in MeanReducer
...
Without explicit conversion Tensorflow fails to compile, pset1 template deduction fails.
cannot convert '((const Eigen::internal::MeanReducer<Eigen::half>*)this)
->Eigen::internal::MeanReducer<Eigen::half>::packetCount_'
(type 'const DenseIndex {aka const long int}')
to type 'const type& {aka const Eigen::half&}'
return pdiv(vaccum, pset1<Packet>(packetCount_));
Honestly I’m not sure why it works in Eigen tests, because Eigen::half constructor is explicit, and why it stopped working in TF, I didn’t find any relevant changes since previous Eigen upgrade.
static_cast<T>(packetCount_) - breaks cxx11_tensor_reductions test for Eigen::half, also quite surprising.
2018-07-19 17:37:03 -07:00
Gael Guennebaud
2424e3b7ac
Pass by const ref.
2018-07-19 18:48:19 +02:00
Gael Guennebaud
509a5fa77f
Fix IsRelocatable without C++11
2018-07-19 18:47:38 +02:00
Gael Guennebaud
2ca2592009
Fix determination of EIGEN_HAS_TYPE_TRAITS
2018-07-19 18:47:18 +02:00
Gael Guennebaud
5e5987996f
Fix stupid error in Quaternion move ctor
2018-07-19 18:33:53 +02:00
Paul Tucker
4e9848fa86
Actually add optional Allocator* arg to ThreadPoolDevice().
2018-07-16 17:53:36 -07:00
Paul Tucker
b3e7c9132d
Add optional Allocator argument to ThreadPoolDevice constructor.
...
When supplied, this allocator will be used in place of
internal::aligned_malloc. This permits e.g. use of a NUMA-node specific
allocator where the thread-pool is also restricted a single NUMA-node.
2018-07-16 17:26:05 -07:00
Gael Guennebaud
40797dbea3
bug #1572 : use c++11 atomic instead of volatile if c++11 is available, and disable multi-threaded GEMM on non-x86 without c++11.
2018-07-17 00:11:20 +02:00
Gael Guennebaud
add5757488
Simplify handling and non-splitted tests and include split_test_helper.h instead of re-generating it. This also allows us to modify it without breaking existing build folder.
2018-07-16 18:55:40 +02:00
Gael Guennebaud
901c7d31f0
Fix usage of EIGEN_SPLIT_LARGE_TESTS=ON: some unit tests, such as indexed_view have to be split unconditionally.
2018-07-16 18:35:05 +02:00
Gael Guennebaud
f2b52f9946
Add the cmake option "EIGEN_DASHBOARD_BUILD_TARGET" to control the build target in dashboard mode (e.g., ctest -D Experimental)
2018-07-16 17:59:30 +02:00
Gael Guennebaud
23d82c1ac5
Merged in rmlarsen/eigen2 (pull request PR-422)
...
Optimize the case where broadcasting is a no-op.
2018-07-14 11:42:58 +00:00
Gael Guennebaud
a87cff20df
Fix GeneralizedEigenSolver when requesting for eigenvalues only.
2018-07-14 09:38:49 +02:00
Rasmus Munk Larsen
3a9cf4e290
Get rid of alias for m_broadcast.
2018-07-13 16:24:48 -07:00
Rasmus Munk Larsen
4222550e17
Optimize the case where broadcasting is a no-op.
2018-07-13 16:12:38 -07:00
Rasmus Munk Larsen
4a3952fd55
Relax the condition to not only work on Android.
2018-07-13 11:24:07 -07:00
Rasmus Munk Larsen
02a9443db9
Clang produces incorrect Thumb2 assembler when using alloca.
...
Don't define EIGEN_ALLOCA when generating Thumb with clang.
2018-07-13 11:03:04 -07:00
Gael Guennebaud
20991c3203
bug #1571 : fix is_convertible<from,to> with "from" a reference.
2018-07-13 17:47:28 +02:00
Gael Guennebaud
1920129d71
Remove clang warning
2018-07-13 16:05:35 +02:00
Gael Guennebaud
195c9c054b
Print more debug info in gpu_basic
2018-07-13 16:05:07 +02:00
Gael Guennebaud
06eb24cf4d
Introduce gpu_assert for assertion in device-code, and disable them with clang-cuda.
2018-07-13 16:04:27 +02:00
Gael Guennebaud
5fd03ddbfb
Make EIGEN_TEST_CUDA_CLANG more friendly with OSX
2018-07-13 16:03:14 +02:00
Gael Guennebaud
86d9c0255c
Forward declaring std::array does not work with all std libs, so let's just include <array>
2018-07-13 13:06:44 +02:00
David Hyde
d908afe35f
bug #1558 : fix a corner case in MINRES when both v_new and w_new vanish.
2018-07-08 22:06:38 -07:00
Eugene Zhulenev
6e654f3379
Reduce number of allocations in TensorContractionThreadPool.
2018-07-16 14:26:39 -07:00
Gael Guennebaud
7ccb623746
bug #1569 : fix Tensor<half>::mean() on AVX with respective unit test.
2018-07-19 13:15:40 +02:00
Alexey Frunze
1f523e7304
Add MIPS changes missing from previous merge.
2018-07-18 12:27:50 -07:00
Eugene Zhulenev
e3c2d61739
Assert that no output kernel is defined for GPU contraction
2018-07-18 14:34:22 -07:00
Eugene Zhulenev
086ded5c85
Disable type traits for GCC < 5.1.0
2018-07-18 16:32:55 -07:00
Eugene Zhulenev
79d4129cce
Specify default output kernel for TensorContractionOp
2018-07-18 14:21:01 -07:00
Gael Guennebaud
6e5a3b898f
Add regression for bugs #1573 and #1575
2018-07-18 23:34:34 +02:00
Gael Guennebaud
863580fe88
bug #1432 : fix conservativeResize for non-relocatable scalar types. For those we need to by-pass realloc routines and fall-back to allocate as new - copy - delete. The remaining problem is that we don't have any mechanism to accurately determine whether a type is relocatable or not, so currently let's be super conservative using either RequireInitialization or std::is_trivially_copyable
2018-07-18 23:33:07 +02:00
Gael Guennebaud
053ed97c72
Generalize ScalarWithExceptions to a full non-copyable and trowing scalar type to be used in other unit tests.
2018-07-18 23:27:37 +02:00
Gael Guennebaud
a503fc8725
bug #1575 : fix regression introduced in bug #1573 patch. Move ctor/assignment should not be defaulted.
2018-07-18 23:26:13 +02:00
Gael Guennebaud
308725c3c9
More clearly disable the inclusion of src/Core/arch/CUDA/Complex.h without CUDA
2018-07-18 13:51:36 +02:00
Alexey Frunze
3875fb05aa
Add support for MIPS SIMD (MSA)
2018-07-06 16:04:30 -07:00
Gael Guennebaud
44ea5f7623
Add unit test for -Tensor<complex> on GPU
2018-07-12 17:19:38 +02:00
Gael Guennebaud
12e1ebb68b
Remove local Index typedef from unit-tests
2018-07-12 17:16:40 +02:00
Gael Guennebaud
63185be8b2
Disable eigenvalues test for clang-cuda
2018-07-12 17:03:14 +02:00
Gael Guennebaud
bec013b2c9
fix unused warning
2018-07-12 17:02:18 +02:00
Gael Guennebaud
5c73c9223a
Fix shadowing typedefs
2018-07-12 17:01:07 +02:00
Gael Guennebaud
98728312c8
Fix compilation regarding std::array
2018-07-12 17:00:37 +02:00
Gael Guennebaud
eb3d8f68bb
fix unused warning
2018-07-12 16:59:47 +02:00
Gael Guennebaud
006e18e52b
Cleanup the mess in Eigen/Core by moving CUDA/HIP stuff at more appropriate places (Macros.h),
...
and alignment/vectorization logic is now in util/ConfigureVectorization.h
2018-07-12 16:57:41 +02:00
Thales Sabino
9a6a43319f
Fix cxx11_tensor_fft not building on Windows.
...
The type used in Eigen::DSizes needs to be at least 8 bytes long. Internally Tensor tries to convert this to an __int64 on Windows and this fails to build. On Linux, long and long long are both 8 byte integer types.
* * *
Changing from "long long" to "std::int64_t".
2018-07-12 11:20:59 +01:00
Gael Guennebaud
b347eb0b1c
Fix doc
2018-07-12 11:56:18 +02:00
Mark D Ryan
e79c5149bf
Fix AVX512 implementations of psqrt
...
This commit fixes the AVX512 implementations of psqrt in the same
way that 3ed67cb0bb
fixed the AVX2 version of this function. The
AVX512 versions of psqrt incorrectly return -0.0 for negative
values, instead of NaN. Fixing the issues requires adding
some additional instructions that slow down the algorithms. A
similar test to the one used in 3ed67cb0bb
shows that the
corrected Packet16f code runs at 73% of the speed of the existing code,
while the corrected Packed8d function runs at 68% of the original.
2018-06-25 05:05:02 -07:00
Yuefeng Zhou
1eff6cf8a7
Use device's allocate function instead of internal::aligned_malloc. This would make it easier to track memory usage in device instances.
2018-02-20 16:50:05 -08:00
Gael Guennebaud
adb134d47e
Fix implicit conversion from 0.0 to scalar
2018-02-16 22:26:01 +04:00
Gael Guennebaud
937ad18221
add unit test for SimplicialCholesky and Boost multiprec.
2018-02-16 22:25:11 +04:00
Julian Kent
6d451cf2b6
Add missing consts for rows and cols functions in SparseLU
2018-02-10 13:44:05 +01:00
Daniele E. Domenichelli
a12b8a8c75
FindEigen3: Set Eigen3_FOUND variable
2018-07-11 16:31:50 +02:00
Gael Guennebaud
8bdb214fd0
remove double ;;
2018-07-12 11:17:53 +02:00
Gael Guennebaud
a9060378d3
bug #1570 : fix warning
2018-07-12 11:07:09 +02:00
Gael Guennebaud
6cd6551b26
Add deprecated header files for TensorFlow
2018-07-12 10:50:53 +02:00
Gael Guennebaud
da0c604078
Merged in deven-amd/eigen (pull request PR-402)
...
Adding support for using Eigen in HIP kernels.
2018-07-12 08:07:16 +00:00
Gael Guennebaud
a4ea611ca7
Remove useless specialization thanks to is_convertible being more robust.
2018-07-12 09:59:44 +02:00
Gael Guennebaud
8a40dda5a6
Add some basic unit-tests
2018-07-12 09:59:00 +02:00
Gael Guennebaud
8ef267ccbd
spellcheck
2018-07-12 09:58:29 +02:00
Gael Guennebaud
21cf4a1a8b
Make is_convertible more robust and conformant to std::is_convertible
2018-07-12 09:57:19 +02:00
Gael Guennebaud
8a5955a052
Optimize the product of a householder-sequence with the identity, and optimize the evaluation of a HouseholderSequence to a dense matrix using faster blocked product.
2018-07-11 17:16:50 +02:00
Gael Guennebaud
d193cc87f4
Fix regression in 9357838f94
2018-07-11 17:09:23 +02:00
Gael Guennebaud
fb33687736
Fix double ;;
2018-07-11 17:08:30 +02:00
Deven Desai
876f392c39
Updates corresponding to the latest round of PR feedback
...
The major changes are
1. Moving CUDA/PacketMath.h to GPU/PacketMath.h
2. Moving CUDA/MathFunctions.h to GPU/MathFunction.h
3. Moving CUDA/CudaSpecialFunctions.h to GPU/GpuSpecialFunctions.h
The above three changes effectively enable the Eigen "Packet" layer for the HIP platform
4. Merging the "hip_basic" and "cuda_basic" unit tests into one ("gpu_basic")
5. Updating the "EIGEN_DEVICE_FUNC" marking in some places
The change has been tested on the HIP and CUDA platforms.
2018-07-11 10:39:54 -04:00
Deven Desai
1fe0b74904
deleting hip specific files that are no longer required
2018-07-11 09:28:44 -04:00
Deven Desai
dec47a6493
renaming CUDA* to GPU* for some header files
2018-07-11 09:26:54 -04:00
Deven Desai
471cfe5ff7
renaming CUDA* to GPU* for some header files
2018-07-11 09:22:04 -04:00
Deven Desai
38807a2575
merging updates from upstream
2018-07-11 09:17:33 -04:00
Gael Guennebaud
f00d08cc0a
Optimize extraction of Q in SparseQR by exploiting the structure of the identity matrix.
2018-07-11 14:01:47 +02:00
Gael Guennebaud
1625476091
Add internall::is_identity compile-time helper
2018-07-11 14:00:24 +02:00
Gael Guennebaud
fe723d6129
Fix conversion warning
2018-07-10 09:10:32 +02:00
Gael Guennebaud
9357838f94
bug #1543 : improve linear indexing for general block expressions
2018-07-10 09:10:15 +02:00
Gael Guennebaud
de9e31a06d
Introduce the macro ei_declare_local_nested_eval to help allocating on the stack local temporaries via alloca, and let outer-products makes a good use of it.
...
If successful, we should use it everywhere nested_eval is used to declare local dense temporaries.
2018-07-09 15:41:14 +02:00
Gael Guennebaud
6190aa5632
bug #1567 : add optimized path for tensor broadcasting and 'Channel First' shape
2018-07-09 11:23:16 +02:00
Gael Guennebaud
ec323b7e66
Skip null numerators in triangular-vector-solve (as in BLAS TRSV).
2018-07-09 11:13:19 +02:00
Gael Guennebaud
359dd77ec3
Fix legitimate "declaration shadows a typedef" warning
2018-07-09 11:03:39 +02:00
Deven Desai
e2b2c61533
merging from master
2018-06-20 16:47:45 -04:00
Deven Desai
1bb6fa99a3
merging the CUDA and HIP implementation for the Tensor directory and the unit tests
2018-06-20 16:44:58 -04:00
Deven Desai
cfdabbcc8f
removing the *Hip files from the unsupported/Eigen/CXX11/src/Tensor and unsupported/test directories
2018-06-20 12:57:02 -04:00
Deven Desai
7e41c8f1a9
renaming *Cuda files to *Gpu in the unsupported/Eigen/CXX11/src/Tensor and unsupported/test directories
2018-06-20 12:52:30 -04:00
Deven Desai
ee73ae0a80
Merged eigen/eigen into default
2018-06-20 12:37:11 -04:00
Mark D Ryan
90a53ca6fd
Fix the Packet16h version of ptranspose
...
The AVX512 version of ptranpose for PacketBlock<Packet16h,16> was
reordering the PacketBlock argument incorrectly. This lead to errors in
the multiplication of matrices composed of 16 bit floats on AVX512
machines, if at least of the matrices was using RowMajor order. This
error is responsible for one tensorflow unit test failure on AVX512
machines:
//tensorflow/python/kernel_tests:batch_matmul_op_test
2018-06-16 15:13:06 -07:00
Gael Guennebaud
1f54164eca
Fix a few issues with Packet16h
2018-07-07 00:15:07 +02:00
Gael Guennebaud
f2dc048df9
complete implementation of Packet16h (AVX512)
2018-07-06 17:43:11 +02:00
Gael Guennebaud
a937c50208
palign is not used anymore, so let's relax the unit test
2018-07-06 17:41:52 +02:00
Gael Guennebaud
56a33ae57d
test product kernel with half-floats.
2018-07-06 17:14:04 +02:00
Gael Guennebaud
f4d623ffa7
Complete Packet8h implementation and test it in packetmath unit test
2018-07-06 17:13:36 +02:00
Gael Guennebaud
a8ab6060df
Add unitests for inverse and selfadjoint-eigenvalues on CUDA
2018-07-06 09:58:45 +02:00
Gael Guennebaud
b8271bb368
fix md5sum of lapack_addons
2018-06-15 14:21:29 +02:00
Deven Desai
b6cc0961b1
updates based on PR feedback
...
There are two major changes (and a few minor ones which are not listed here...see PR discussion for details)
1. Eigen::half implementations for HIP and CUDA have been merged.
This means that
- `CUDA/Half.h` and `HIP/hcc/Half.h` got merged to a new file `GPU/Half.h`
- `CUDA/PacketMathHalf.h` and `HIP/hcc/PacketMathHalf.h` got merged to a new file `GPU/PacketMathHalf.h`
- `CUDA/TypeCasting.h` and `HIP/hcc/TypeCasting.h` got merged to a new file `GPU/TypeCasting.h`
After this change the `HIP/hcc` directory only contains one file `math_constants.h`. That will go away too once that file becomes a part of the HIP install.
2. new macros EIGEN_GPUCC, EIGEN_GPU_COMPILE_PHASE and EIGEN_HAS_GPU_FP16 have been added and the code has been updated to use them where appropriate.
- `EIGEN_GPUCC` is the same as `(EIGEN_CUDACC || EIGEN_HIPCC)`
- `EIGEN_GPU_DEVICE_COMPILE` is the same as `(EIGEN_CUDA_ARCH || EIGEN_HIP_DEVICE_COMPILE)`
- `EIGEN_HAS_GPU_FP16` is the same as `(EIGEN_HAS_CUDA_FP16 or EIGEN_HAS_HIP_FP16)`
2018-06-14 10:21:54 -04:00
Deven Desai
ba972fb6b4
moving Half headers from CUDA dir to GPU dir, removing the HIP versions
2018-06-13 12:26:18 -04:00
Deven Desai
d1d22ef0f4
syncing this fork with upstream
2018-06-13 12:09:52 -04:00
Benoit Steiner
d3a380af4d
Merged in mfigurnov/eigen/gamma-der-a (pull request PR-403)
...
Derivative of the incomplete Gamma function and the sample of a Gamma random variable
Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com >
2018-06-11 17:57:47 +00:00
Andrea Bocci
f7124b3e46
Extend CUDA support to matrix inversion and selfadjointeigensolver
2018-06-11 18:33:24 +02:00
Gael Guennebaud
0537123953
bug #1565 : help MSVC to generatenot too bad ASM in reductions.
2018-07-05 09:21:26 +02:00
Gael Guennebaud
6a241bd8ee
Implement custom inplace triangular product to avoid a temporary
2018-07-03 14:02:46 +02:00
Gael Guennebaud
3ae2083e23
Make is_same_dense compatible with different scalar types.
2018-07-03 13:21:43 +02:00
Gael Guennebaud
67ec37f7b0
Activate dgmres unit test
2018-07-02 12:54:14 +02:00
Gael Guennebaud
047677a08d
Fix regression in changeset f05dea6b23
...
: computeFromHessenberg can take any expression for matrixQ, not only an HouseholderSequence.
2018-07-02 12:18:25 +02:00
Gael Guennebaud
d625564936
Simplify redux_evaluator using inheritance, and properly rename parameters in reducers.
2018-07-02 11:50:41 +02:00
Gael Guennebaud
d428a199ab
bug #1562 : optimize evaluation of small products of the form s*A*B by rewriting them as: s*(A.lazyProduct(B)) to save a costly temporary. Measured speedup from 2x to 5x...
2018-07-02 11:41:09 +02:00
Gael Guennebaud
a7b313a16c
Fix unit test
2018-07-01 22:45:47 +02:00
Gael Guennebaud
0cdacf3fa4
update comment
2018-06-29 11:28:36 +02:00
Gael Guennebaud
54f6eeda90
Merged in net147/eigen (pull request PR-411)
...
Use std::complex constructor instead of assignment from scalar
2018-06-28 21:01:04 +00:00
Gael Guennebaud
9a81de1d35
Fix order of EIGEN_DEVICE_FUNC and returned type
2018-06-28 00:20:59 +02:00
Jonathan Liu
b7689bded9
Use std::complex constructor instead of assignment from scalar
...
Fixes GCC conversion to non-scalar type requested compile error when
using boost::multiprecision::cpp_dec_float_50 as scalar type.
2018-06-28 00:32:37 +10:00
Gael Guennebaud
f9d337780d
First step towards a generic vectorised quaternion product
2018-06-25 14:26:51 +02:00
Gael Guennebaud
ee5864f72e
bug #1560 fix product with a 1x1 diagonal matrix
2018-06-25 10:30:12 +02:00
Rasmus Munk Larsen
2f62cc68cd
merge
2018-06-22 15:09:44 -07:00
Rasmus Munk Larsen
bda71ad394
Fix typo in pbend for AltiVec.
2018-06-22 15:04:35 -07:00
Benoit Steiner
b6ffcd22e3
Merged in rmlarsen/eigen2 (pull request PR-409)
...
Fix oversharding bug in parallelFor.
2018-06-21 18:34:57 +00:00
Gael Guennebaud
4cc32d80fd
bug #1555 : compilation fix with XLC
2018-06-21 10:28:38 +02:00
Rasmus Munk Larsen
5418154a45
Fix oversharding bug in parallelFor.
2018-06-20 17:51:48 -07:00
Gael Guennebaud
cb4c9a6a94
bug #1531 : make dedicatd unit testing for NumDimensions
2018-06-08 17:11:45 +02:00
Gael Guennebaud
d6813fb1c5
bug #1531 : expose NumDimensions for solve and sparse expressions.
2018-06-08 16:55:10 +02:00
Gael Guennebaud
89d65bb9d6
bug #1531 : expose NumDimensions for compatibility with Tensor
2018-06-08 16:50:17 +02:00
Gael Guennebaud
f05dea6b23
bug #1550 : prevent avoidable memory allocation in RealSchur
2018-06-08 10:14:57 +02:00
Gael Guennebaud
7933267c67
fix prototype
2018-06-08 09:56:01 +02:00
Gael Guennebaud
f4d1461874
Fix the way matrix folder is passed to the tests.
2018-06-08 09:55:46 +02:00
Benoit Steiner
522d3ca54d
Don't use std::equal_to inside cuda kernels since it's not supported.
2018-06-07 13:02:07 -07:00
Christoph Hertzberg
7d7bb91537
Missing line during manual rebase of PR-374
2018-06-07 20:30:09 +02:00
Michael Figurnov
30fa3d0454
Merge from eigen/eigen
2018-06-07 17:57:56 +01:00
Benoit Steiner
d2b0a4a59b
Merged in mfigurnov/eigen/fix-bessel (pull request PR-404)
...
Fix compilation of special functions without C99 math.
2018-06-07 16:12:42 +00:00
Michael Figurnov
6c71c7d360
Merge from eigen/eigen.
2018-06-07 15:54:18 +01:00
Gael Guennebaud
c25034710e
Fiw some warnings in dox examples
2018-06-07 16:09:22 +02:00
Gael Guennebaud
37348d03ae
Fix int versus Index
2018-06-07 15:56:43 +02:00
Gael Guennebaud
c723ffd763
Fix warning
2018-06-07 15:56:20 +02:00
Gael Guennebaud
af7c83b9a2
Fix warning
2018-06-07 15:45:24 +02:00
Gael Guennebaud
7fe29aceeb
Fix MSVC warning C4290: C++ exception specification ignored except to indicate a function is not __declspec(nothrow)
2018-06-07 15:36:20 +02:00
Michael Figurnov
aa813d417b
Fix compilation of special functions without C99 math.
...
The commit with Bessel functions i0e and i1e placed the ifdef/endif incorrectly,
causing i0e/i1e to be undefined when EIGEN_HAS_C99_MATH=0. These functions do not
actually require C99 math, so now they are always available.
2018-06-07 14:35:07 +01:00
Gael Guennebaud
55774b48e4
Fix short vs long
2018-06-07 15:26:25 +02:00
Christoph Hertzberg
e5f9f4768f
Avoid unnecessary C++11 dependency
2018-06-07 15:03:50 +02:00
Gael Guennebaud
b3fd93207b
Fix typos found using codespell
2018-06-07 14:43:02 +02:00
Michael Figurnov
5172a32849
Updated the stopping criteria in igammac_cf_impl.
...
Previously, when computing the derivative, it used a relative error threshold. Now it uses an absolute error threshold. The behavior for computing the value is unchanged. This makes more sense since we do not expect the derivative to often be close to zero. This change makes the derivatives about 30% faster across the board. The error for the igamma_der_a is almost unchanged, while for gamma_sample_der_alpha it is a bit worse for float32 and unchanged for float64.
2018-06-07 12:03:58 +01:00
Michael Figurnov
4bd158fa37
Derivative of the incomplete Gamma function and the sample of a Gamma random variable.
...
In addition to igamma(a, x), this code implements:
* igamma_der_a(a, x) = d igamma(a, x) / da -- derivative of igamma with respect to the parameter
* gamma_sample_der_alpha(alpha, sample) -- reparameterization derivative of a Gamma(alpha, 1) random variable sample with respect to the alpha parameter
The derivatives are computed by forward mode differentiation of the igamma(a, x) code. Although gamma_sample_der_alpha can be implemented via igamma_der_a, a separate function is more accurate and efficient due to analytical cancellation of some terms. All three functions are implemented by a method parameterized with "mode" that always computes the derivatives, but does not return them unless required by the mode. The compiler is expected to (and, based on benchmarks, does) skip the unnecessary computations depending on the mode.
2018-06-06 18:49:26 +01:00
Deven Desai
8fbd47052b
Adding support for using Eigen in HIP kernels.
...
This commit enables the use of Eigen on HIP kernels / AMD GPUs. Support has been added along the same lines as what already exists for using Eigen in CUDA kernels / NVidia GPUs.
Application code needs to explicitly define EIGEN_USE_HIP when using Eigen in HIP kernels. This is because some of the CUDA headers get picked up by default during Eigen compile (irrespective of whether or not the underlying compiler is CUDACC/NVCC, for e.g. Eigen/src/Core/arch/CUDA/Half.h). In order to maintain this behavior, the EIGEN_USE_HIP macro is used to switch to using the HIP version of those header files (see Eigen/Core and unsupported/Eigen/CXX11/Tensor)
Use the "-DEIGEN_TEST_HIP" cmake option to enable the HIP specific unit tests.
2018-06-06 10:12:58 -04:00
Benoit Steiner
e206f8d4a4
Merged in mfigurnov/eigen (pull request PR-400)
...
Exponentially scaled modified Bessel functions of order zero and one.
Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com >
2018-06-05 17:05:21 +00:00
Penporn Koanantakool
e2ed0cf8ab
Add a ThreadPoolInterface* getter for ThreadPoolDevice.
2018-06-02 12:07:49 -07:00
Gael Guennebaud
84868da904
Don't run hg on non mercurial clone
2018-05-31 21:21:57 +02:00
Michael Figurnov
f216854453
Exponentially scaled modified Bessel functions of order zero and one.
...
The functions are conventionally called i0e and i1e. The exponentially scaled version is more numerically stable. The standard Bessel functions can be obtained as i0(x) = exp(|x|) i0e(x)
The code is ported from Cephes and tested against SciPy.
2018-05-31 15:34:53 +01:00
Gael Guennebaud
6af1433cb5
Doc: add aliasing in common pitfaffs.
2018-05-29 22:37:47 +02:00
Katrin Leinweber
ea94543190
Hyperlink DOIs against preferred resolver
2018-05-24 18:55:40 +02:00
Gael Guennebaud
999b552c16
Search for sequential Pastix.
2018-05-29 20:49:25 +02:00
Gael Guennebaud
eef4b7bd87
Fix handling of path names containing spaces and the likes.
2018-05-29 20:49:06 +02:00
Gael Guennebaud
647b724a36
Define pcast<> for SSE types even when AVX is enabled. (otherwise float are silently reinterpreted as int instead of being converted)
2018-05-29 20:46:46 +02:00
Gael Guennebaud
49262dfee6
Fix compilation and SSE support with PGI compiler
2018-05-29 15:09:31 +02:00
Christoph Hertzberg
750af06362
Add an option to test with external BLAS library
2018-05-22 21:04:32 +02:00
Christoph Hertzberg
d06a753d10
Make qr_fullpivoting unit test run for fixed-sized matrices
2018-05-22 20:29:17 +02:00
Gael Guennebaud
f0862b062f
Fix internal::is_integral<size_t/ptrdiff_t> with MSVC 2013 and older.
2018-05-22 19:29:51 +02:00
Gael Guennebaud
36e413a534
Workaround a MSVC 2013 compilation issue with MatrixBase(Index,int)
2018-05-22 18:51:35 +02:00
Gael Guennebaud
725bd92903
fix stupid typo
2018-05-18 17:46:43 +02:00
Gael Guennebaud
a382bc9364
is_convertible<T,Index> does not seems to work well with MSVC 2013, so let's rather use __is_enum(T) for old MSVC versions
2018-05-18 17:02:27 +02:00
Gael Guennebaud
4dd767f455
add some internal checks
2018-05-18 13:59:55 +02:00
Gael Guennebaud
345c0ab450
check that all integer types are properly handled by mat(i,j)
2018-05-18 13:46:46 +02:00
Mark D Ryan
405859f18d
Set EIGEN_IDEAL_MAX_ALIGN_BYTES correctly for AVX512 builds
...
bug #1548
The macro EIGEN_IDEAL_MAX_ALIGN_BYTES is being incorrectly set to 32
on AVX512 builds. It should be set to 64. In the current code it is
only set to 64 if the macro EIGEN_VECTORIZE_AVX512 is defined. This
macro does get defined in AVX512 builds in Core, but only after Macros.h,
the file that defines EIGEN_IDEAL_MAX_ALIGN_BYTES, has been included.
This commit fixes the issue by setting EIGEN_IDEAL_MAX_ALIGN_BYTES to
64 if __AVX512F__ is defined.
2018-05-17 17:04:00 +01:00
Vamsi Sripathi
6293ad3f39
Performance improvements to tensor broadcast operation
...
1. Added new packet functions using SIMD for NByOne, OneByN cases
2. Modified existing packet functions to reduce index calculations when input stride is non-SIMD
3. Added 4 test cases to cover the new packet functions
2018-05-23 14:02:05 -07:00
Gael Guennebaud
7134fa7a2e
Fix compilation with MSVC by reverting to char* for _mm_prefetch except for PGI (the later being the one that has the wrong prototype).
2018-06-07 09:33:10 +02:00
Jeff Trull
e7147f69ae
Add tests for sparseQR results (value and size) covering bugs #1522 and #1544
2018-04-21 10:26:30 -07:00
Robert Lukierski
b2053990d0
Adding EIGEN_DEVICE_FUNC to Products, especially Dense2Dense Assignment
...
specializations. Otherwise causes problems with small fixed size matrix multiplication (call to
0x00 in call_assignment_no_alias in debug mode or trap in release with CUDA 9.1).
2018-03-14 16:19:43 +00:00
Jeff Trull
9f0c5c3669
Make sparse QR result sizes consistent with dense QR, with the following rules:
...
1) Q is always square
2) Q*R*P' is valid and recovers the original matrix
This implies that the size of Q is the number of rows in the original matrix, square,
and that the size of R is the size of the original matrix.
2018-02-15 15:00:31 -08:00
Christoph Hertzberg
d655900953
bug #1544 : Generate correct Q matrix in complex case. Original patch was by Jeff Trull in PR-386.
2018-05-17 19:17:01 +02:00
Benoit Steiner
0371380d5b
Merged in rmlarsen/eigen2 (pull request PR-393)
...
Rename scalar_clip_op to scalar_clamp_op to prevent collision with existing functor in TensorFlow.
2018-05-16 21:45:42 +00:00
Rasmus Munk Larsen
b8d36774fa
Rename clip2 to clamp.
2018-05-16 14:04:48 -07:00
Rasmus Munk Larsen
812480baa3
Rename scalar_clip_op to scalar_clip2_op to prevent collision with existing functor in TensorFlow.
2018-05-16 09:49:24 -07:00
Benoit Steiner
1403c2c15b
Merged in didierjansen/eigen (pull request PR-360)
...
Fix bugs and typos in the contraction example of the tensor README
2018-05-16 01:16:36 +00:00
Benoit Steiner
ad355b3f05
Merged in rmlarsen/eigen2 (pull request PR-392)
...
Add vectorized clip functor for Eigen Tensors
2018-05-16 01:15:56 +00:00
Christoph Hertzberg
0272f2451a
Fix "suggest parentheses around comparison" warning
2018-05-15 19:35:53 +02:00
Rasmus Munk Larsen
afec3021f7
Use numext::maxi & numext::mini.
2018-05-14 16:35:39 -07:00
Rasmus Munk Larsen
b8c8e5f436
Add vectorized clip functor for Eigen Tensors.
2018-05-14 16:07:13 -07:00
Benoit Steiner
6118c6ff4f
Enable RawAccess to tensor slices whenever possinle.
...
Avoid 32-bit integer overflow in TensorSlicingOp
2018-04-30 11:28:12 -07:00
Gael Guennebaud
6e7118265d
Fix compilation with NEON+MSVC
2018-04-26 10:50:41 +02:00
Gael Guennebaud
097dd4616d
Fix unit test for SIMD engine not supporting sqrt
2018-04-26 10:47:39 +02:00
Gael Guennebaud
8810baaed4
Add multi-threading for sparse-row-major * dense-row-major
2018-04-25 10:14:48 +02:00
Gael Guennebaud
2f3287da7d
Fix "used uninitialized" warnings
2018-04-24 17:17:25 +02:00
Gael Guennebaud
3ffd449ef5
Workaround warning
2018-04-24 17:11:51 +02:00
Gael Guennebaud
e8ca5166a9
bug #1428 : atempt to make NEON vectorization compilable by MSVC.
...
The workaround is to wrap NEON packet types to make them different c++ types.
2018-04-24 11:19:49 +02:00
Benoit Steiner
6f5935421a
fix AVX512 plog
2018-04-23 15:49:26 +00:00
Gael Guennebaud
e9da464e20
Add specializations of is_arithmetic for long long in c++11
2018-04-23 16:26:29 +02:00
Gael Guennebaud
a57e6e5f0f
workaround MSVC 2013 compilation issue (ambiguous call)
2018-04-23 15:31:51 +02:00
Gael Guennebaud
11123175db
typo in doc
2018-04-23 15:30:35 +02:00
Gael Guennebaud
5679e439e0
bug #1543 : fix linear indexing in generic block evaluation (this completes the fix in commit 12efc7d41b
...
)
2018-04-23 14:40:16 +02:00
Gael Guennebaud
35b31353ab
Fix unit test
2018-04-22 22:49:08 +02:00
Christoph Hertzberg
34e499ad36
Disable -Wshadow when compiling with g++
2018-04-21 22:08:26 +02:00
Jayaram Bobba
b7b868d1c4
fix AVX512 plog
2018-04-20 13:39:18 -07:00
Gael Guennebaud
686fb57233
fix const cast in NEON
2018-04-18 18:46:34 +02:00
Dmitriy Korchemkin
02d2f1cb4a
Cast zeros to Scalar in RealSchur
2018-04-18 13:52:46 +03:00
Christoph Hertzberg
50633d1a83
Renamed .trans() et al. to .reverseFlag() et at. Adapted documentation of .setReverseFlag()
2018-04-17 11:30:27 +02:00
nicolov
39c2cba810
Add a specialization of Eigen::numext::conj for std::complex<T> to be used when compiling a cuda kernel. This fixes the compilation of TensorFlow 1.4 with clang 6.0 used as CUDA compiler with libc++.
...
This follows the previous change in 2a69290ddb
, which mentions OSX (I guess because it uses libc++ too).
2018-04-13 22:29:10 +00:00
Christoph Hertzberg
775766d175
Add parenthesis to fix compiler warnings
2018-04-15 18:43:56 +02:00
Christoph Hertzberg
42715533f1
bug #1493 : Make representation of HouseholderSequence consistent and working for complex numbers. Made corresponding unit test actually test that. Also simplify implementation of QR decompositions
2018-04-15 10:15:28 +02:00
Christoph Hertzberg
c9ecfff2e6
Add links where to make PRs and report bugs into README.md
2018-04-13 21:05:28 +00:00
Christoph Hertzberg
c8b19702bc
Limit test size for sparse Cholesky solvers to EIGEN_TEST_MAX_SIZE
2018-04-13 20:36:58 +02:00
Christoph Hertzberg
2cbb00b18e
No need to make noise, if KLU is found
2018-04-13 19:14:25 +02:00
Christoph Hertzberg
84dcd998a9
Recent Adolc versions require C++11
2018-04-13 19:10:23 +02:00
Christoph Hertzberg
4d392d93aa
Make hypot_impl compile again for types with expression-templates (e.g., boost::multiprecision)
2018-04-13 19:01:37 +02:00
Christoph Hertzberg
072e111ec0
SelfAdjointView<...,Mode> causes a static assert since commit d820ab9edc
2018-04-13 19:00:34 +02:00
Gael Guennebaud
7a9089c33c
fix linking issue
2018-04-13 08:51:47 +02:00
Gael Guennebaud
e43ca0320d
bug #1520 : workaround some -Wfloat-equal warnings by calling std::equal_to
2018-04-11 15:24:13 +02:00
Weiming Zhao
b0eda3cb9f
Avoid using memcpy for non-POD elements
2018-04-11 11:37:06 +02:00
Gael Guennebaud
79266fec75
extend doxygen splitter for huge screens
2018-04-11 11:31:17 +02:00
Gael Guennebaud
426052ef6e
Update header/footer for doxygen 1.8.13
2018-04-11 11:30:34 +02:00
Gael Guennebaud
9c8decffbf
Fix javascript hacks for oxygen 1.8.13
2018-04-11 11:30:14 +02:00
Gael Guennebaud
e798466871
bug #1538 : update manual pages regarding BDCSVD.
2018-04-11 10:46:11 +02:00
Gael Guennebaud
c91906b065
Umfpack: UF_long has been removed in recent versions of suitesparse, and fix a few long-to-int conversions issues.
2018-04-11 09:59:59 +02:00
Gael Guennebaud
0050709ea7
Merged in v_huber/eigen (pull request PR-378)
...
Add interface to umfpack_*l_* functions
2018-04-11 07:43:04 +00:00
Guillaume Jacob
8c1652055a
Fix code sample output in block(int, int, int, int) doxygen
2018-04-09 17:23:59 +02:00
vhuber
08008f67e1
Add unitTest
2018-04-09 17:07:46 +02:00
Gael Guennebaud
add15924ac
Fix MKL backend for symmetric eigenvalues on row-major matrices.
2018-04-09 13:29:26 +02:00
Gael Guennebaud
04b1628e55
Add missing empty line.
2018-04-09 13:28:31 +02:00
Gael Guennebaud
c2624c0318
Fix cmake scripts with no fortran compiler
2018-04-07 08:45:19 +02:00
Gael Guennebaud
2f833b1c64
bug #1509 : fix computeInverseWithCheck for complexes
2018-04-04 15:47:46 +02:00
Gael Guennebaud
b903fa74fd
Extend list of MSVC versions
2018-04-04 15:14:09 +02:00
Gael Guennebaud
403f09ccef
Make stableNorm and blueNorm compatible with 2D matrices.
2018-04-04 15:13:31 +02:00
Gael Guennebaud
4213b63f5c
Factories code between numext::hypot and scalar_hyot_op functor.
2018-04-04 15:12:43 +02:00
Gael Guennebaud
368dd4cd9d
Make innerVector() and innerVectors() methods available to all expressions supported by Block.
...
Before, only SparseBase exposed such methods.
2018-04-04 15:09:21 +02:00
Gael Guennebaud
e116f6847e
bug #1521 : avoid signalling NaN in hypot and make it std::complex<> friendly.
2018-04-04 13:47:23 +02:00
Gael Guennebaud
73729025a4
bug #1521 : add unit test dedicated to numbest::hypos
2018-04-04 13:45:34 +02:00
Gael Guennebaud
13f5df9f67
Add a note on vec_min vs asm
2018-04-04 13:10:38 +02:00
Gael Guennebaud
e91e314347
bug #1494 : makes pmin/pmax behave on Altivec/VSX as on x86 regading NaNs
2018-04-04 11:39:19 +02:00
Gael Guennebaud
112c899304
comment unreachable code
2018-04-03 23:16:43 +02:00
Gael Guennebaud
a1292395d6
Fix compilation of product with inverse transpositions (e.g., mat * Transpositions().inverse())
2018-04-03 23:06:44 +02:00
Gael Guennebaud
8c7b5158a1
commit 45e9c9996da790b55ed9c4b0dfeae49492ac5c46 (HEAD -> memory_fix)
...
Author: George Burgess IV <gbiv@google.com >
Date: Thu Mar 1 11:20:24 2018 -0800
Prefer `::operator new` to `new`
The C++ standard allows compilers much flexibility with `new`
expressions, including eliding them entirely
(https://godbolt.org/g/yS6i91 ). However, calls to `operator new` are
required to be treated like opaque function calls.
Since we're calling `new` for side-effects other than allocating heap
memory, we should prefer the less flexible version.
Signed-off-by: George Burgess IV <gbiv@google.com >
2018-04-03 17:15:38 +02:00
Gael Guennebaud
dd4cc6bd9e
bug #1527 : fix support for MKL's VML (destination was not properly resized)
2018-04-03 17:11:15 +02:00
Gael Guennebaud
c5b56f1fb2
bug #1528 : better use numeric_limits::min() instead of 1/highest() that with underflow.
2018-04-03 16:49:35 +02:00
Gael Guennebaud
8d0ffe3655
bug #1516 : add assertion for out-of-range diagonal index in MatrixBase::diagonal(i)
2018-04-03 16:15:43 +02:00
Gael Guennebaud
407e3e2621
bug #1532 : disable stl::*_negate in C++17 (they are deprecated)
2018-04-03 15:59:30 +02:00
Gael Guennebaud
40b4bf3d32
AVX512: _mm512_rsqrt28_ps is available for AVX512ER only
2018-04-03 14:36:27 +02:00
Gael Guennebaud
584951ca4d
Rename predux_downto4 to be more accurate on its semantic.
2018-04-03 14:28:38 +02:00
Gael Guennebaud
67bac6368c
protect calls to isnan
2018-04-03 14:19:04 +02:00
Gael Guennebaud
d43b2f01f4
Fix unit testing of predux_downto4 (bad name), and add unit testing of prsqrt
2018-04-03 14:14:00 +02:00
Gael Guennebaud
7b0630315f
AVX512: fix psqrt and prsqrt
2018-04-03 14:12:50 +02:00
Gael Guennebaud
6719409cd9
AVX512: add missing pinsertfirst and pinsertlast, implement pblend for Packet8d, fix compilation without AVX512DQ
2018-04-03 14:11:56 +02:00
Gael Guennebaud
524119d32a
Fix uninitialized output argument.
2018-04-03 10:56:10 +02:00
vhuber
267a144da5
Remove unnecessary define
2018-03-30 23:04:53 +02:00
vhuber
baf9a5a776
Add interface to umfpack_*l_* functions
2018-03-30 18:53:34 +02:00
luz.paz
e3912f5e63
MIsc. source and comment typos
...
Found using `codespell` and `grep` from downstream FreeCAD
2018-03-11 10:01:44 -04:00
Gael Guennebaud
5deeb19e7b
bug #1517 : fix triangular product with unit diagonal and nested scaling factor: (s*A).triangularView<UpperUnit>()*B
2018-02-09 16:52:35 +01:00
Gael Guennebaud
12efc7d41b
Fix linear indexing in generic block evaluation.
2018-02-09 16:45:49 +01:00
Gael Guennebaud
f4a6863c75
Fix typo
2018-02-09 16:43:49 +01:00
Viktor Csomor
000840cae0
Added a move constructor and move assignment operator to Tensor and wrote some tests.
2018-02-07 19:10:54 +01:00
Gael Guennebaud
3a2dc3869e
Fix weird issue with MSVC 2013
2018-07-18 02:26:43 -07:00
Eugene Zhulenev
c95aacab90
Fix TensorContractionOp evaluators for GPU and SYCL
2018-07-17 14:09:37 -07:00
Gael Guennebaud
038b55464b
Merged in deven-amd/eigen (pull request PR-425)
...
applying EIGEN_DECLARE_TEST to *gpu unit tests
2018-07-17 21:14:40 +00:00
Deven Desai
f124f07965
applying EIGEN_DECLARE_TEST to *gpu* tests
...
Also, a few minor fixes for GPU tests running in HIP mode.
1. Adding an include for hip/hip_runtime.h in the Macros.h file
For HIP __host__ and __device__ are macros which are defined in hip headers.
Their definitions need to be included before their use in the file.
2. Fixing the compile failure in TensorContractionGpu introduced by the commit to
"Fuse computations into the Tensor contractions using output kernel"
3. Fixing a HIP/clang specific compile error by making the struct-member assignment explicit
2018-07-17 14:16:48 -04:00
Gael Guennebaud
dff3a92d52
Remove usage of #if EIGEN_TEST_PART_XX in unit tests that does not require them (splitting can thus be avoided for them)
2018-07-17 15:52:58 +02:00
Gael Guennebaud
82f0ce2726
Get rid of EIGEN_TEST_FUNC, unit tests must now be declared with EIGEN_DECLARE_TEST(mytest) { /* code */ }.
...
This provide several advantages:
- more flexibility in designing unit tests
- unit tests can be glued to speed up compilation
- unit tests are compiled with same predefined macros, which is a requirement for zapcc
2018-07-17 14:46:15 +02:00
Gael Guennebaud
37f4bdd97d
Fix VERIFY_EVALUATION_COUNT(EXPR,N) with a complex expression as N
2018-07-17 13:20:49 +02:00
Gael Guennebaud
2b2cd85694
bug #1573 : add noexcept move constructor and move assignment operator to Quaternion
2018-07-17 11:11:33 +02:00
Eugene Zhulenev
43206ac4de
Call OutputKernel in evalGemv
2018-07-12 14:52:23 -07:00
Eugene Zhulenev
e204ecdaaf
Remove SimpleThreadPool and always use {NonBlocking}ThreadPool
2018-07-16 15:06:57 -07:00
Eugene Zhulenev
b324ed55d9
Call OutputKernel in evalGemv
2018-07-12 14:52:23 -07:00
Eugene Zhulenev
01fd4096d3
Fuse computations into the Tensor contractions using output kernel
2018-07-10 13:16:38 -07:00
Gael Guennebaud
5539587b1f
Some warning fixes
2018-07-17 10:29:12 +02:00
Benoit Steiner
8f55956a57
Update the padding computation for PADDING_SAME to be consistent with TensorFlow.
2018-01-30 20:22:12 +00:00
Gael Guennebaud
09a16ba42f
bug #1412 : fix compilation with nvcc+MSVC
2018-01-17 23:13:16 +01:00
Lee.Deokjae
5b3c367926
Fix typos in the contraction example of tensor README
2018-01-06 14:36:19 +09:00
Eugene Chereshnev
f558ad2955
Fix incorrect ldvt in LAPACKE call from JacobiSVD
2018-01-03 12:55:52 -08:00
Benoit Steiner
22de74aa76
Disable use of recurrence for computing twiddle factors.
2018-01-09 18:32:52 +00:00
Gael Guennebaud
73629f8b68
Fix gcc7 warning
2018-01-09 08:59:27 +01:00
RJ Ryan
59985cfd26
Disable use of recurrence for computing twiddle factors. Fixes FFT precision issues for large FFTs. https://github.com/tensorflow/tensorflow/issues/10749#issuecomment-354557689
2017-12-31 10:44:56 -05:00
nluehr
f9bdcea022
For cuda 9.1 replace math_functions.hpp with cuda_runtime.h
2017-12-18 16:51:15 -08:00
Gael Guennebaud
06bf1047f9
Fix compilation of stableNorm with some expressions as input
2017-12-15 15:15:37 +01:00
Gael Guennebaud
73214c4bd0
Workaround nvcc 9.0 issue. See PR 351.
...
https://bitbucket.org/eigen/eigen/pull-requests/351
2017-12-15 14:10:59 +01:00
Gael Guennebaud
31e0bda2e3
Fix cmake warning
2017-12-14 15:48:27 +01:00
Gael Guennebaud
26a2c6fc16
fix unit test
2017-12-14 15:11:04 +01:00
Gael Guennebaud
546ab97d76
Add possibility to overwrite EIGEN_STRONG_INLINE.
2017-12-14 14:47:38 +01:00
Gael Guennebaud
9c3aed9d48
Fix packet and alignment propagation logic of Block<Xpr> expressions. In particular, (A+B).col(j) lost vectorisation.
2017-12-14 14:24:33 +01:00
Gael Guennebaud
76c7dae600
ignore all *build* sub directories
2017-12-14 14:22:14 +01:00
Gael Guennebaud
b2cacd189e
fix header inclusion
2017-12-14 10:01:02 +01:00
Yangzihao Wang
3122477c86
Update the padding computation for PADDING_SAME to be consistent with TensorFlow.
2017-12-12 11:15:24 -08:00
Benoit Steiner
393b7c4959
Merged in ncluehr/eigen/float2half-fix (pull request PR-349)
...
Replace __float2half_rn with __float2half
2017-12-01 00:29:51 +00:00
nluehr
aefd5fd5c4
Replace __float2half_rn with __float2half
...
The latter provides a consistent definition for CUDA 8.0 and 9.0.
2017-11-28 10:15:46 -08:00
Gael Guennebaud
d0b028e173
clarify Pastix requirements
2017-11-27 22:11:57 +01:00
Gael Guennebaud
3587e481fb
silent MSVC warning
2017-11-27 21:53:02 +01:00
Benoit Steiner
3a327cd3c7
Merged in ncluehr/eigen/predux_fp16_fix (pull request PR-348)
...
Fix incorrect integer cast in half2 predux.
2017-11-21 21:11:45 +00:00
nluehr
dd6de618c3
Fix incorrect integer cast in predux<half2>().
...
Bug corrupts results on Maxwell and earlier GPU architectures.
2017-11-21 10:47:00 -08:00
Gael Guennebaud
3dc6ff73ca
Handle PGI compiler
2017-11-17 22:54:39 +01:00
Zvi Rackover
599a88da27
Disable gcc-specific workaround for Clang to allow build with AVX512
...
There is currently a workaround for an issue in gcc that requires invoking gcc with the -fabi-version flag. This workaround is not needed for Clang and moreover is not supported.
2017-11-16 19:53:38 +00:00
Gael Guennebaud
672bdc126b
bug #1479 : fix failure detection in LDLT
2017-11-16 17:55:24 +01:00
Basil Fierz
624df50945
Adds missing EIGEN_STRONG_INLINE to support MSVC properly inlining small vector calculations
...
When working with MSVC often small vector operations are not properly inlined. This behaviour is observed even on the most recent compiler versions.
2017-10-26 22:44:28 +02:00
Benoit Steiner
746a6b7b81
Merged in zzp11/eigen/zzp11/a-small-mistake-quickreferencedox-edited-1510217281963 (pull request PR-346)
...
a small mistake QuickReference.dox edited online with Bitbucket
2018-03-23 01:02:34 +00:00
Benoit Steiner
d2631ef61d
Merged in facaiy/eigen/ENH/exp_support_complex_for_gpu (pull request PR-359)
...
ENH: exp supports complex type for cuda
2018-03-23 00:59:15 +00:00
Benoit Steiner
8fcbd6d4c9
Merged in dtrebbien/eigen (pull request PR-369)
...
Move up the specialization of std::numeric_limits
2018-03-23 00:54:58 +00:00
Rasmus Munk Larsen
e900b010c8
Improve robustness of igamma and igammac to bad inputs.
...
Check for nan inputs and propagate them immediately. Limit the number of internal iterations to 2000 (same number as used by scipy.special.gammainc). This prevents an infinite loop when the function is called with nan or very large arguments.
Original change by mfirgunov@google.com
2018-03-19 09:04:54 -07:00
Gael Guennebaud
f7d17689a5
Add static assertion for fixed sizes Ref<>
2018-03-09 10:11:13 +01:00
Gael Guennebaud
f6be7289d7
Implement better static assertion checking to make sure that the first assertion is a static one and not a runtime one.
2018-03-09 10:00:51 +01:00
Gael Guennebaud
d820ab9edc
Add static assertion on selfadjoint-view's UpLo parameter.
2018-03-09 09:33:43 +01:00
Daniel Trebbien
0c57be407d
Move up the specialization of std::numeric_limits
...
This fixes a compilation error seen when building TensorFlow on macOS:
https://github.com/tensorflow/tensorflow/issues/17067
2018-02-18 15:35:45 -08:00
Yan Facai (颜发才)
42a8334668
ENH: exp supports complex type for cuda
2018-01-04 16:01:01 +08:00
zhouzhaoping
912e9965ef
a small mistake QuickReference.dox edited online with Bitbucket
2017-11-09 08:49:01 +00:00
Gael Guennebaud
4c03b3511e
Fix issue with boost::multiprec in previous commit
2017-11-08 23:28:01 +01:00
Gael Guennebaud
e9d2888e74
Improve debugging tests and output in BDCSVD
2017-11-08 10:26:03 +01:00
Gael Guennebaud
e8468ea91b
Fix overflow issues in BDCSVD
2017-11-08 10:24:28 +01:00
Benoit Steiner
3949615176
Merged in JonasMu/eigen (pull request PR-329)
...
Added an example for a contraction to a scalar value to README.md
Approved-by: Jonas Harsch <jonas.harsch@gmail.com >
2017-10-27 07:27:46 +00:00
Christoph Hertzberg
11ddac57e5
Merged in guillaume_michel/eigen (pull request PR-334)
...
- Add support for NEON plog PacketMath function
2017-10-23 13:22:22 +00:00
Benoit Steiner
a6d875bac8
Removed unecesasry #include
2017-10-22 08:12:45 -07:00
Benoit Steiner
f16ba2a630
Merged in LaFeuille/eigen-1/LaFeuille/typo-fix-alignmeent-alignment-1505889397887 (pull request PR-335)
...
Typo fix alignmeent ->alignment
2017-10-21 01:59:55 +00:00
Benoit Steiner
ee6ad21b25
Merged in henryiii/eigen/henryiii/device (pull request PR-343)
...
Fixing missing inlines on device functions for newer CUDA cards
2017-10-21 01:58:22 +00:00
Henry Schreiner
9bb26eb8f1
Restore __device__
2017-10-21 00:50:38 +00:00
Henry Schreiner
4245475d22
Fixing missing inlines on device functions for newer CUDA cards
2017-10-20 03:20:13 +00:00
Benoit Steiner
8eb4b9d254
Merged in benoitsteiner/opencl (pull request PR-341)
2017-10-17 16:39:28 +00:00
Rasmus Munk Larsen
2dd63ed395
Merge
2017-10-13 15:58:52 -07:00
Rasmus Munk Larsen
f349507e02
Specialize ThreadPoolDevice::enqueueNotification for the case with no args. As an example this reduces binary size of an TensorFlow demo app for Android by about 2.5%.
2017-10-13 15:58:12 -07:00
Benoit Steiner
688451409d
Merged in mehdi_goli/upstr_benoit/ComputeCppNewReleaseFix (pull request PR-16)
...
Changes required for new ComputeCpp CE version.
2017-10-13 20:56:01 +00:00
Konstantinos Margaritis
0e6e027e91
check both z13 and z14 arches
2017-10-12 15:38:34 -04:00
Konstantinos Margaritis
6c3475f110
remove debugging
2017-10-12 15:34:55 -04:00
Konstantinos Margaritis
df7644aec3
Merged eigen/eigen into default
2017-10-12 22:23:13 +03:00
Konstantinos Margaritis
98e52cc770
rollback 374f750ad4
2017-10-12 15:22:10 -04:00
Konstantinos Margaritis
c4ad358565
explicitly set conjugate mask
2017-10-11 11:05:29 -04:00
Konstantinos Margaritis
380d41fd76
added some extra debugging
2017-10-11 10:40:12 -04:00
Konstantinos Margaritis
d0b7b9d0d3
some Packet2cf pmul fixes
2017-10-11 10:17:22 -04:00
Konstantinos Margaritis
df173f5620
initial pexp() for 32-bit floats, commented out due to vec_cts()
2017-10-11 09:40:49 -04:00
Konstantinos Margaritis
3dcae2a27f
initial pexp() for 32-bit floats, commented out due to vec_cts()
2017-10-11 09:40:45 -04:00
Konstantinos Margaritis
c2a2246489
fix predux_mul for z14/float
2017-10-10 13:38:32 -04:00
Konstantinos Margaritis
374f750ad4
eliminate 'enumeral and non-enumeral type in conditional expression' warning
2017-10-09 16:56:30 -04:00
Konstantinos Margaritis
bc30305d29
complete z14 port
2017-10-09 16:55:10 -04:00
Gael Guennebaud
0e85a677e3
bug #1472 : fix warning
2017-09-26 10:53:33 +02:00
Gael Guennebaud
8579195169
bug #1468 (1/2) : add missing std:: to memcpy
2017-09-22 09:23:24 +02:00
Gael Guennebaud
f92567fecc
Add link to a useful example.
2017-09-20 10:22:23 +02:00
Gael Guennebaud
7ad07fc6f2
Update documentation for aligned_allocator
2017-09-20 10:22:00 +02:00
LaFeuille
7c9b07dc5c
Typo fix alignmeent ->alignment
2017-09-20 06:38:39 +00:00
Mehdi Goli
2062ac9958
Changes required for new ComputeCpp CE version.
2017-09-18 18:17:39 +01:00
Christoph Hertzberg
23f8b00bc8
clang provides __has_feature(is_enum) (but not <type_traits>) in C++03 mode
2017-09-14 19:26:03 +02:00
Christoph Hertzberg
0c9ad2f525
std::integral_constant is not C++03 compatible
2017-09-14 19:23:38 +02:00
Rasmus Munk Larsen
1b7294f6fc
Fix cut-and-paste error.
2017-09-08 16:35:58 -07:00
Rasmus Munk Larsen
94e2213b38
Avoid undefined behavior in Eigen::TensorCostModel::numThreads.
...
If the cost is large enough then the thread count can be larger than the maximum
representable int, so just casting it to an int is undefined behavior.
Contributed by phurst@google.com .
2017-09-08 15:49:55 -07:00
Gael Guennebaud
6d42309f13
Fix compilation of Vector::operator()(enum) by treating enums as Index
2017-09-07 14:34:30 +02:00
Benoit Steiner
ea4e65bf41
Fixed compilation with cuda_clang.
2017-09-07 09:13:52 +00:00
Gael Guennebaud
a91918a105
Merged in infinitei/eigen (pull request PR-328)
...
bug #1464 : Fixes construction of EulerAngles from 3D vector expression.
Approved-by: Tal Hadad <tal_hd@hotmail.com >
Approved-by: Abhijit Kundu <abhijit.kundu@gatech.edu >
2017-09-06 08:42:14 +00:00
Gael Guennebaud
9c353dd145
Add C++11 max_digits10 for half.
2017-09-06 10:22:47 +02:00
Gael Guennebaud
b35d1ce4a5
Implement true compile-time "if" for apply_rotation_in_the_plane. This fixes a compilation issue for vectorized real type with missing vectorization for complexes, e.g. AVX512.
2017-09-06 10:02:49 +02:00
Gael Guennebaud
80142362ac
Fix mixing types in sparse matrix products.
2017-09-02 22:50:20 +02:00
Jonas Harsch
810b70ad09
Merged in JonasMu/added-an-example-for-a-contraction-to-a--1504265366851 (pull request PR-1)
...
Added an example for a contraction to a scalar value
2017-09-01 12:01:39 +00:00
Jonas Harsch
a34fb212cd
Close branch JonasMu/added-an-example-for-a-contraction-to-a--1504265366851
2017-09-01 12:01:39 +00:00
Jonas Harsch
a991c80365
Added an example for a contraction to a scalar value, e.g. a double contraction of two second order tensors and how you can get the value of the result. I lost one day to get this doen so I think it will help some guys. I also added Eigen:: to the IndexPair and and array in the same example.
2017-09-01 11:30:26 +00:00
Benoit Steiner
a4089991eb
Added support for CUDA 9.0.
2017-08-31 02:49:39 +00:00
Abhijit Kundu
6d991a9595
bug #1464 : Fixes construction of EulerAngles from 3D vector expression.
2017-08-30 13:26:30 -04:00
Gael Guennebaud
304ef29571
Handle min/max/inf/etc issue in cuda_fp16.h directly in test/main.h
2017-08-24 11:26:41 +02:00
Konstantinos Margaritis
1affe3d8df
Merged eigen/eigen into default
2017-08-24 12:24:01 +03:00
Gael Guennebaud
21633e585b
bug #1462 : remove all occurences of the deprecated __CUDACC_VER__ macro by introducing EIGEN_CUDACC_VER
2017-08-24 11:06:47 +02:00
Gael Guennebaud
12249849b5
Make the threshold from gemm to coeff-based-product configurable, and add some explanations.
2017-08-24 10:43:21 +02:00
Gael Guennebaud
39864ebe1e
bug #336 : improve doc for PlainObjectBase::Map
2017-08-22 17:18:43 +02:00
Gael Guennebaud
600e52fc7f
Add missing scalar conversion
2017-08-22 17:06:57 +02:00
Gael Guennebaud
9deee79922
bug #1457 : add setUnit() methods for consistency.
2017-08-22 16:48:07 +02:00
Gael Guennebaud
bc4dae9aeb
bug #1449 : fix redux_3 unit test
2017-08-22 15:59:08 +02:00
Gael Guennebaud
bc91a2df8b
bug #1461 : fix compilation of Map<const Quaternion>::x()
2017-08-22 15:10:42 +02:00
Gael Guennebaud
fc39d5954b
Merged in dtrebbien/eigen/patch-1 (pull request PR-312)
...
Work around a compilation error seen with nvcc V8.0.61
2017-08-22 12:17:37 +00:00
Gael Guennebaud
b223918ea9
Doc: warn about constness in LLT::solveInPlace
2017-08-22 14:12:47 +02:00
Konstantinos Margaritis
4ce5ec5197
initial support for z14
2017-08-07 05:54:29 -04:00
Konstantinos Margaritis
e1e71ca4e4
initial support for z14
2017-08-06 19:53:18 -04:00
Benoit Steiner
84d7be103a
Fixing Argmax that was breaking upstream TensorFlow.
2017-07-22 03:19:34 +00:00
Benoit Steiner
f0b154a4b0
Code cleanup
2017-07-10 09:54:09 -07:00
Benoit Steiner
575cda76b3
Fixed syntax errors generated by xcode
2017-07-09 11:39:01 -07:00
Benoit Steiner
5ac27d5b51
Avoid relying on cxx11 features when possible.
2017-07-08 21:58:44 -07:00
Benoit Steiner
c5a241ab9b
Merged in benoitsteiner/opencl (pull request PR-323)
...
Improved support for OpenCL
2017-07-07 16:27:33 +00:00
Benoit Steiner
b7ae4dd9ef
Merged in hughperkins/eigen/add-endif-labels-TensorReductionCuda.h (pull request PR-315)
...
Add labels to #ifdef, in TensorReductionCuda.h
2017-07-07 04:23:52 +00:00
Benoit Steiner
9daed67952
Merged in tntnatbry/eigen (pull request PR-319)
...
Tensor Trace op
2017-07-07 04:18:03 +00:00
Benoit Steiner
6795512e59
Improved the randomness of the tensor random generator
2017-07-06 21:12:45 -07:00
Benoit Steiner
dc524ac716
Fixed compilation warning
2017-07-06 21:11:15 -07:00
Benoit Steiner
62b4634ebe
Merged in mehdi_goli/upstr_benoit/TensorSYCLImageVolumePatchFixed (pull request PR-14)
...
Applying Benoit's comment for Fixing ImageVolumePatch.
* Applying Benoit's comment for Fixing ImageVolumePatch. Fixing conflict on cmake file.
* Fixing dealocation of the memory in ImagePatch test for SYCL.
* Fixing the automerge issue.
2017-07-06 05:08:13 +00:00
Benoit Steiner
c92faf9d84
Merged in mehdi_goli/upstr_benoit/HiperbolicOP (pull request PR-13)
...
Adding hyperbolic operations for sycl.
* Adding hyperbolic operations.
* Adding the hyperbolic operations for CPU as well.
2017-07-06 05:05:57 +00:00
Benoit Steiner
53725c10b8
Merged in mehdi_goli/opencl/DataDependancy (pull request PR-10)
...
DataDependancy
* Wrapping data type to the pointer class for sycl in non-terminal nodes; not having that breaks Tensorflow Conv2d code.
* Applying Ronnan's Comments.
* Applying benoit's comments
2017-06-28 17:55:23 +00:00
Gael Guennebaud
c010b17360
Fix warning
2017-06-27 14:29:57 +02:00
Gael Guennebaud
561f777075
Fix a gcc7 warning about bool * bool in abs2 default implementation.
2017-06-27 12:05:17 +02:00
Gael Guennebaud
b651ce0ffa
Fix a gcc7 warning: Wint-in-bool-context
2017-06-26 09:58:28 +02:00
Christoph Hertzberg
157040d44f
Make sure CMAKE_Fortran_COMPILER is set before checking for Fortran functions
2017-06-20 16:58:03 +02:00
Gael Guennebaud
24fe1de9b4
merge
2017-06-15 10:17:39 +02:00
Gael Guennebaud
b240080e64
bug #1436 : fix compilation of Jacobi rotations with ARM NEON, some specializations of internal::conj_helper were missing.
2017-06-15 10:16:30 +02:00
Benoit Steiner
3baef62b9a
Added missing __device__ qualifier
2017-06-13 12:56:55 -07:00
Benoit Steiner
449936828c
Added missing __device__ qualifier
2017-06-13 12:54:57 -07:00
Benoit Steiner
b8e805497e
Merged in benoitsteiner/opencl (pull request PR-318)
...
Improved support for OpenCL
2017-06-13 05:01:10 +00:00
Gael Guennebaud
9fbdf02059
Enable Array(EigenBase<>) ctor for compatible scalar types only. This prevents nested arrays to look as being convertible from/to simple arrays.
2017-06-12 22:30:32 +02:00
Gael Guennebaud
e43d8fe9d7
Fix compilation of streaming nested Array, i.e., cout << Array<Array<>>
2017-06-12 22:26:26 +02:00
Gael Guennebaud
d9d7bd6d62
Fix 1x1 case in Solve expression with EIGEN_DEFAULT_MATRIX_STORAGE_ORDER_OPTION==RowMajor
2017-06-12 22:25:02 +02:00
Androbin42
95ecb2b5d6
Make buildtests.in more robust
2017-06-12 17:11:06 +00:00
Androbin42
3f7fb5a6d6
Make eigen_monitor_perf.sh more robust
2017-06-12 17:07:56 +00:00
Gael Guennebaud
7f42a93349
Merged in alainvaucher/eigen/find-module-imported-target (pull request PR-324)
...
In the CMake find module, define the Eigen imported target as when installing with CMake
* In the CMake find module, define the Eigen imported target
* Add quotes to the imported location, in case there are spaces in the path.
Approved-by: Alain Vaucher <acvaucher@gmail.com >
2017-11-15 20:45:09 +00:00
Gael Guennebaud
7cc503f9f5
bug #1485 : fix linking issue of non template functions
2017-11-15 21:33:37 +01:00
Gael Guennebaud
103c0aa6ad
Add KLU in the list of third-party sparse solvers
2017-11-10 14:13:29 +01:00
Gael Guennebaud
00bc67c374
Move KLU support to official
2017-11-10 14:11:22 +01:00
Gael Guennebaud
b82cd93c01
KLU: truely disable unimplemented code, add proper static assertions in solve
2017-11-10 14:09:01 +01:00
Gael Guennebaud
6365f937d6
KLU depends on BTF but not on libSuiteSparse nor Cholmod
2017-11-10 13:58:52 +01:00
Gael Guennebaud
8cf63ccb99
Merged in kylemacfarlan/eigen (pull request PR-337)
...
Add support for SuiteSparse's KLU routines
2017-11-10 10:43:17 +00:00
Gael Guennebaud
1495b98a8e
Merged in spraetor/eigen (pull request PR-305)
...
Issue with mpreal and std::numeric_limits::digits
2017-11-10 10:28:54 +00:00
Gael Guennebaud
fc45324380
Merged in jkflying/eigen-fix-scaling (pull request PR-302)
...
Make scaling work with non-square matrices
2017-11-10 10:11:36 +00:00
Gael Guennebaud
d306b96fb7
Merged in carpent/eigen (pull request PR-342)
...
Use col method for column-major matrix
2017-11-10 10:09:53 +00:00
Gael Guennebaud
1b2dcf9a47
Check that Schur decomposition succeed.
2017-11-10 10:26:09 +01:00
Gael Guennebaud
0a1cc73942
bug #1484 : restore deleted line for 128 bits long doubles, and improve dispatching logic.
2017-11-10 10:25:41 +01:00
Gael Guennebaud
f86bb89d39
Add EIGEN_MKL_NO_DIRECT_CALL option
2017-11-09 11:07:45 +01:00
Gael Guennebaud
5fa79f96b8
Patch from Konstantin Arturov to enable MKL's direct call by default
2017-11-09 10:58:38 +01:00
Justin Carpentier
a020d9b134
Use col method for column-major matrix
2017-10-17 21:51:27 +02:00
Kyle Vedder
c0e1d510fd
Add support for SuiteSparse's KLU routines
2017-10-04 21:01:23 -05:00
Gael Guennebaud
6dcf966558
Avoid implicit scalar conversion with accuracy loss in pow(scalar,array)
2017-06-12 16:47:22 +02:00
Gael Guennebaud
50e09cca0f
fix tipo
2017-06-11 15:30:36 +02:00
Gael Guennebaud
a4fd4233ad
Fix compilation with some compilers
2017-06-09 23:02:02 +02:00
Gael Guennebaud
c3e2afce0d
Enable MSVC 2010 workaround from MSVC only
2017-06-09 16:25:18 +02:00
Gael Guennebaud
731c8c704d
bug #1403 : more scalar conversions fixes in BDCSVD
2017-06-09 15:45:49 +02:00
Gael Guennebaud
1bbcf19029
bug #1403 : fix implicit scalar type conversion.
2017-06-09 14:44:02 +02:00
Gael Guennebaud
ba5cab576a
bug #1405 : enable StrictlyLower/StrictlyUpper triangularView as the destination of matrix*matrix products.
2017-06-09 14:38:04 +02:00
Gael Guennebaud
90168c003d
bug #1414 : doxygen, add EigenBase to CoreModule
2017-06-09 14:01:44 +02:00
Gael Guennebaud
26f552c18d
fix compilation of Half in C++98 (issue introduced in previous commit)
2017-06-09 13:36:58 +02:00
Gael Guennebaud
1d59ca2458
Fix compilation with gcc 4.3 and ARM NEON
2017-06-09 13:20:52 +02:00
Gael Guennebaud
fb1ee04087
bug #1410 : fix lvalue propagation of Array/Matrix-Wrapper with a const nested expression.
2017-06-09 13:13:03 +02:00
Gael Guennebaud
723a59ac26
add regression test for aliasing in product rewritting
2017-06-09 12:54:40 +02:00
Gael Guennebaud
8640093af1
fix compilation in C++98
2017-06-09 12:45:01 +02:00
Gael Guennebaud
a7be4cd1b1
Fix LeastSquareDiagonalPreconditioner for complexes (issue introduced in previous commit)
2017-06-09 11:57:53 +02:00
Gael Guennebaud
498aa95a8b
bug #1424 : add numext::abs specialization for unsigned integer types.
2017-06-09 11:53:49 +02:00
Gael Guennebaud
d588822779
Add missing std::numeric_limits specialization for half, and complete NumTraits<half>
2017-06-09 11:51:53 +02:00
Gael Guennebaud
682b2ef17e
bug #1423 : fix LSCG\'s Jacobi preconditioner for row-major matrices.
2017-06-08 15:06:27 +02:00
Gael Guennebaud
4bbc320468
bug #1435 : fix aliasing issue in exressions like: A = C - B*A;
2017-06-08 12:55:25 +02:00
Hugh Perkins
9341f258d4
Add labels to #ifdef, in TensorReductionCuda.h
2017-06-06 15:51:06 +01:00
Benoit Steiner
1e736b9ead
Merged in mehdi_goli/opencl/SYCLAlignAllocator (pull request PR-7)
...
Fixing SYCL alignment issue required by TensorFlow.
2017-05-26 17:23:00 +00:00
Benoit Steiner
9dee55ec33
Merged eigen/eigen into default
2017-05-26 09:01:04 -07:00
Mehdi Goli
0370d3576e
Applying Ronnan's comments.
2017-05-26 16:01:48 +01:00
Benoit Steiner
615aff4d6e
Merged in a-doumoulakis/opencl (pull request PR-12)
...
Enable triSYCL with Eigen
2017-05-25 18:18:23 +00:00
a-doumoulakis
c3bd860de8
Modification upon request
...
- Remove warning suppression
2017-05-25 18:46:18 +01:00
Mehdi Goli
e3f964ed55
Applying Benoit's comment;removing dead code.
2017-05-25 11:17:26 +01:00
Benoit Steiner
df90010cdd
Merged in mehdi_goli/opencl/CmakeFixForUbuntu16.04 (pull request PR-11)
...
CmakeFixForUbuntu16.04
2017-05-24 19:03:57 +00:00
a-doumoulakis
fb853a857a
Restore misplaced comment
2017-05-24 17:50:15 +01:00
a-doumoulakis
7a8ba565f8
Merge changed from upstream
2017-05-24 17:45:29 +01:00
Mehdi Goli
daf99daadd
Merged in DuncanMcBain/opencl/default (pull request PR-2)
...
Update FindComputeCpp.cmake with new changes from SDK
2017-05-24 13:59:53 +00:00
Mehdi Goli
9ef5c948ba
Fixing Cmake for gcc>=5.
2017-05-24 13:11:16 +01:00
Duncan McBain
0cb3c7c7dd
Update FindComputeCpp.cmake with new changes from SDK
2017-05-24 12:24:21 +01:00
Mmanu Chaturvedi
2971503fed
Specializing numeric_limits For AutoDiffScalar
2017-05-23 17:12:36 -04:00
Gael Guennebaud
26e8f9171e
Fix compilation of matrix log with Map as input
2017-06-07 10:51:23 +02:00
Gael Guennebaud
f2a553fb7b
bug #1411 : fix usage of alignment information in vectorization of quaternion product and conjugate.
2017-06-07 10:10:30 +02:00
Christoph Hertzberg
e018142604
Make sure CholmodSupport works when included in multiple compilation units (issue was reported on stackoverflow.com)
2017-06-06 19:23:14 +02:00
Gael Guennebaud
8508db52ab
bug #1417 : make LinSpace compatible with std::complex
2017-06-06 17:25:56 +02:00
Mehdi Goli
9aa7c30163
Merge with Benoit.
2017-05-23 10:51:34 +01:00
Mehdi Goli
b42d775f13
Temporarry branch for synch with upstream
2017-05-23 10:51:14 +01:00
Benoit Steiner
615733381e
Merged in mehdi_goli/opencl/FixingCmakeDependency (pull request PR-2)
...
Fixing Cmake Dependency for SYCL
2017-05-22 17:43:06 +00:00
Benoit Steiner
1500a67c41
Merged in mehdi_goli/opencl/TensorSupportedDevice (pull request PR-6)
...
Fixing suported device list.
2017-05-22 16:22:21 +00:00
Mehdi Goli
76c0fc1f95
Fixing SYCL alignment issue required by TensorFlow.
2017-05-22 16:49:32 +01:00
Mehdi Goli
2d17128d6f
Fixing suported device list.
2017-05-22 16:40:33 +01:00
Mehdi Goli
61d7f3664a
Fixing Cmake Dependency for SYCL
2017-05-22 14:58:28 +01:00
a-doumoulakis
a5226ce4f7
Add cmake file FindTriSYCL.cmake
2017-05-17 17:59:30 +01:00
a-doumoulakis
052426b824
Add support for triSYCL
...
Eigen is now able to use triSYCL with EIGEN_SYCL_TRISYCL and TRISYCL_INCLUDE_DIR options
Fix contraction kernel with correct nd_item dimension
2017-05-05 19:26:27 +01:00
Abhijit Kundu
4343db84d8
updated warning number for nvcc relase 8 (V8.0.61) for the stupid warning message 'calling a __host__ function from a __host__ __device__ function is not allowed'.
2017-05-01 10:36:27 -04:00
Abhijit Kundu
9bc0a35731
Fixed nested angle barckets >> issue when compiling with cuda 8
2017-04-27 03:09:03 -04:00
Gael Guennebaud
891ac03483
Fix dense * sparse-selfadjoint-view product.
2017-04-25 13:58:10 +02:00
RJ Ryan
949a2da38c
Use scalar_sum_op and scalar_quotient_op instead of operator+ and operator/ in MeanReducer.
...
Improves support for std::complex types when compiling for CUDA.
Expands on e2e9cdd169
and 2bda1b0d93
.
2017-04-14 13:23:35 -07:00
Gael Guennebaud
d9084ac8e1
Improve mixing of complex and real in the vectorized path of apply_rotation_in_the_plane
2017-04-14 11:05:13 +02:00
Gael Guennebaud
f75dfdda7e
Fix unwanted Real to Scalar to Real conversions in column-pivoting QR.
2017-04-14 10:34:30 +02:00
Gael Guennebaud
0f83aeb6b2
Improve cmake scripts for Pastix and BLAS detection.
2017-04-14 10:22:12 +02:00
Benoit Steiner
0d08165a7f
Merged in benoitsteiner/opencl (pull request PR-309)
...
OpenCL improvements
2017-04-05 14:28:08 +00:00
Benoit Steiner
068cc09708
Preserve file naming conventions
2017-04-04 10:09:10 -07:00
Benoit Steiner
c302ea7bc4
Deleted empty line of code
2017-04-04 10:05:16 -07:00
Benoit Steiner
a5a0c8fac1
Guard sycl specific code under a EIGEN_USE_SYCL ifdef
2017-04-04 10:03:21 -07:00
Benoit Steiner
a1304b95b7
Code cleanup
2017-04-04 10:00:46 -07:00
Benoit Steiner
66c63826bd
Guard the sycl specific code with EIGEN_USE_SYCL
2017-04-04 09:59:09 -07:00
Benoit Steiner
e3e343390a
Guard the sycl specific code with a #ifdef EIGEN_USE_SYCL
2017-04-04 09:56:33 -07:00
Benoit Steiner
63840d4666
iGate the sycl specific code under a EIGEN_USE_SYCL define
2017-04-04 09:54:31 -07:00
Benoit Steiner
bc050ea9f0
Fixed compilation error when sycl is enabled.
2017-04-04 09:47:04 -07:00
Gagan Goel
4910630c96
fix typos in the Tensor readme
2017-03-31 20:32:16 -04:00
Benoit Steiner
c1b3d5ecb6
Restored code compatibility with compilers that dont support c++11
...
Gated more sycl code under #ifdef sycl
2017-03-31 08:31:28 -07:00
Benoit Steiner
e2d5d4e7b3
Restore the old constructors to retain compatibility with non c++11 compilers.
2017-03-31 08:26:13 -07:00
Benoit Steiner
73fcaa319f
Gate the sycl specific code under #ifdef sycl
2017-03-31 08:22:25 -07:00
Mehdi Goli
bd64ee8555
Fixing TensorArgMaxSycl.h; Removing warning related to the hardcoded type of dims to be int in Argmax.
2017-03-28 16:50:34 +01:00
Simon Praetorius
511810797e
Issue with mpreal and std::numeric_limits, i.e. digits is not a constant. Added a digits() traits in NumTraits with fallback to static constant. Specialization for mpreal added in MPRealSupport.
2017-03-24 17:45:56 +01:00
Luke Iwanski
a91417a7a5
Introduces align allocator for SYCL buffer
2017-03-20 14:48:54 +00:00
Gael Guennebaud
aae19c70ac
update has_ReturnType to be more consistent with other has_ helpers
2017-03-17 17:33:15 +01:00
Benoit Steiner
f8a622ef3c
Merged eigen/eigen into default
2017-03-15 20:06:19 -07:00
Benoit Steiner
fd7db52f9b
Silenced compilation warning
2017-03-15 20:02:39 -07:00
Luke Iwanski
9597d6f6ab
Temporary: Disables cxx11_tensor_argmax_sycl test since it is causing zombie thread
2017-03-15 19:28:09 +00:00
Luke Iwanski
c06861d15e
Fixes bug in get_sycl_supported_devices() that was reporting unsupported Intel CPU on AMD platform - causing timeouts in that configuration
2017-03-15 19:26:08 +00:00
Benoit Steiner
7f31bb6822
Merged in ilya-biryukov/eigen/fix_clang_cuda_compilation (pull request PR-304)
...
Fixed compilation with cuda-clang
2017-03-15 16:48:52 +00:00
Gael Guennebaud
89fd0c3881
better check array index before using it
2017-03-15 15:18:03 +01:00
Benoit Jacob
61160a21d2
ARM prefetch fixes: Implement prefetch on ARM64. Do not clobber cc on ARM32.
2017-03-15 06:57:25 -04:00
Benoit Steiner
f0f3591118
Made the reduction code compile with cuda-clang
2017-03-14 14:16:53 -07:00
Mehdi Goli
f499fe9496
Adding synchronisation to convolution kernel for sycl backend.
2017-03-13 09:18:37 +00:00
Rasmus Munk Larsen
bfd7bf9c5b
Get rid of Init().
2017-03-10 08:48:20 -08:00
Rasmus Munk Larsen
d56ab01094
Use C++11 ctor forwarding to simplify code a bit.
2017-03-10 08:30:22 -08:00
Rasmus Munk Larsen
344c2694a6
Make the non-blocking threadpool more flexible and less wasteful of CPU cycles for high-latency use-cases.
...
* Adds a hint to ThreadPool allowing us to turn off spin waiting. Currently each reader and record yielder op in a graph creates a threadpool with a thread that spins for 1000 iterations through the work stealing loop before yielding. This is wasteful for such ops that process I/O.
* This also changes the number of iterations through the steal loop to be inversely proportional to the number of threads. Since the time of each iteration is proportional to the number of threads, this yields roughly a constant spin time.
* Implement a separate worker loop for the num_threads == 1 case since there is no point in going through the expensive steal loop. Moreover, since Steal() calls PopBack() on the victim queues it might reverse the order in which ops are executed, compared to the order in which they are scheduled, which is usually counter-productive for the types of I/O workloads the single thread pools tend to be used for.
* Store num_threads in a member variable for simplicity and to avoid a data race between the thread creation loop and worker threads calling threads_.size().
2017-03-09 15:41:03 -08:00
Luke Iwanski
1b32a10053
Use name to distinguish name instead of the vendor
2017-03-08 18:26:34 +00:00
Mehdi Goli
aadb7405a7
Fixing typo in sycl Benchmark.
2017-03-08 18:20:06 +00:00
Gael Guennebaud
970ff78294
bug #1401 : fix compilation of "cond ? x : -x" with x an AutoDiffScalar
2017-03-08 16:16:53 +01:00
Mehdi Goli
5e9a1e7a7a
Adding sycl Benchmarks.
2017-03-08 14:17:48 +00:00
Mehdi Goli
e2e3f78533
Fixing potential race condition on sycl device.
2017-03-07 17:48:15 +00:00
Mehdi Goli
f84963ed95
Adding TensorIndexTuple and TensorTupleReduceOP backend (ArgMax/Min) for sycl; fixing the address space issue for const TensorMap; converting all discard_write to write due to data missmatch.
2017-03-07 14:27:10 +00:00
Gael Guennebaud
e5156e4d25
fix typo
2017-03-07 11:25:58 +01:00
Gael Guennebaud
5694315fbb
remove UTF8 symbol
2017-03-07 10:53:47 +01:00
Gael Guennebaud
e958c2baac
remove UTF8 symbols
2017-03-07 10:47:40 +01:00
Gael Guennebaud
d967718525
do not include std header within extern C
2017-03-07 10:16:39 +01:00
Gael Guennebaud
659087b622
bug #1400 : fix stableNorm with EIGEN_DONT_ALIGN_STATICALLY
2017-03-07 10:02:34 +01:00
Ilya Biryukov
1c03d43a5c
Fixed compilation with cuda-clang
2017-03-06 12:01:12 +01:00
Julian Kent
bbe717fa2f
Make scaling work with non-square matrices
2017-03-03 12:58:51 +01:00
Benoit Steiner
a71943b9a4
Made the Tensor code compile with clang 3.9
2017-03-02 10:47:29 -08:00
Benoit Steiner
09ae0e6586
Adjusted the EIGEN_DEVICE_FUNC qualifiers to make sure that:
...
* they're used consistently between the declaration and the definition of a function
* we avoid calling host only methods from host device methods.
2017-03-01 11:47:47 -08:00
Benoit Steiner
1e2d046651
Silenced a couple of compilation warnings
2017-03-01 10:13:42 -08:00
Benoit Steiner
c1d87ec110
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-03-01 10:08:50 -08:00
Benoit Steiner
3a3f040baa
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-02-28 17:06:15 -08:00
Benoit Steiner
7b61944669
Made most of the packet math primitives usable within CUDA kernel when compiling with clang
2017-02-28 17:05:28 -08:00
Benoit Steiner
c92406d613
Silenced clang compilation warning.
2017-02-28 17:03:11 -08:00
Benoit Steiner
857adbbd52
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-02-28 16:42:00 -08:00
Benoit Steiner
c36bc2d445
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-02-28 14:58:45 -08:00
Benoit Steiner
4a7df114c8
Added missing EIGEN_DEVICE_FUNC
2017-02-28 14:00:15 -08:00
Benoit Steiner
de7b0fdea9
Made the TensorStorage class compile with clang 3.9
2017-02-28 13:52:22 -08:00
Benoit Steiner
765f4cc4b4
Deleted extra: EIGEN_DEVICE_FUNC: the QR and Cholesky code isn't ready to run on GPU yet.
2017-02-28 11:57:00 -08:00
Benoit Steiner
e993c94f07
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-02-28 09:56:45 -08:00
Benoit Steiner
33443ec2b0
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-02-28 09:50:10 -08:00
Benoit Steiner
f3e9c42876
Added missing EIGEN_DEVICE_FUNC qualifiers
2017-02-28 09:46:30 -08:00
Mehdi Goli
8296b87d7b
Adding sycl backend for TensorCustomOp; fixing the partial lhs modification issue on sycl when the rhs is TensorContraction, reduction or convolution; Fixing the partial modification for memset when sycl backend is used.
2017-02-28 17:16:14 +00:00
Gael Guennebaud
4e98a7b2f0
bug #1396 : add some missing EIGEN_DEVICE_FUNC
2017-02-28 09:47:38 +01:00
Gael Guennebaud
478a9f53be
Fix typo.
2017-02-28 09:32:45 +01:00
Benoit Steiner
889c606f8f
Added missing EIGEN_DEVICE_FUNC to the SelfCwise binary ops
2017-02-27 17:17:47 -08:00
Benoit Steiner
193939d6aa
Added missing EIGEN_DEVICE_FUNC qualifiers to several nullary op methods.
2017-02-27 17:11:47 -08:00
Benoit Steiner
ed4dc9d01a
Declared the plset, ploadt_ro, and ploaddup packet primitives as usable within a gpu kernel
2017-02-27 16:57:01 -08:00
Benoit Steiner
b1fc7c9a09
Added missing EIGEN_DEVICE_FUNC qualifiers.
2017-02-27 16:48:30 -08:00
Benoit Steiner
554116bec1
Added EIGEN_DEVICE_FUNC to make the prototype of the EigenBase override match that of DenseBase
2017-02-27 16:45:31 -08:00
Benoit Steiner
34d9fce93b
Avoid unecessary float to double conversions.
2017-02-27 16:33:33 -08:00
Benoit Steiner
e0bd6f5738
Merged eigen/eigen into default
2017-02-26 10:02:14 -08:00
Mehdi Goli
2fa2b617a9
Adding TensorVolumePatchOP.h for sycl
2017-02-24 19:16:24 +00:00
Mehdi Goli
0b7875f137
Converting fixed float type into template type for TensorContraction.
2017-02-24 18:13:30 +00:00
Mehdi Goli
89dfd51fae
Adding Sycl Backend for TensorGenerator.h.
2017-02-22 16:36:24 +00:00
Gael Guennebaud
5c68ba41a8
typos
2017-02-21 17:10:55 +01:00
Gael Guennebaud
b0f55ef85a
merge
2017-02-21 17:04:10 +01:00
Gael Guennebaud
d29e9d7119
Improve documentation of reshaped
2017-02-21 17:03:10 +01:00
Gael Guennebaud
9b6e365018
Fix linking issue.
2017-02-21 16:52:22 +01:00
Gael Guennebaud
3d200257d7
Add support for automatic-size deduction in reshaped, e.g.:
...
mat.reshaped(4,AutoSize); <-> mat.reshaped(4,mat.size()/4);
2017-02-21 15:57:25 +01:00
Gael Guennebaud
f8179385bd
Add missing const version of mat(all).
2017-02-21 13:56:26 +01:00
Gael Guennebaud
1e3aa470fa
Fix long to int conversion
2017-02-21 13:56:01 +01:00
Gael Guennebaud
b3fc0007ae
Add support for mat(all) as an alias to mat.reshaped(mat.size(),fix<1>);
2017-02-21 13:49:09 +01:00
Mehdi Goli
4f07ac16b0
Reducing the number of warnings.
2017-02-21 10:09:47 +00:00
Gael Guennebaud
76687f385c
bug #1394 : fix compilation of SelfAdjointEigenSolver<Matrix>(sparse*sparse);
2017-02-20 14:27:26 +01:00
Gael Guennebaud
d8b1f6cebd
bug #1380 : for Map<> as input of matrix exponential
2017-02-20 14:06:06 +01:00
Gael Guennebaud
6572825703
bug #1395 : fix the use of compile-time vectors as inputs of JacobiSVD.
2017-02-20 13:44:37 +01:00
Mehdi Goli
79ebc8f761
Adding Sycl backend for TensorImagePatchOP.h; adding Sycl backend for TensorInflation.h.
2017-02-20 12:11:05 +00:00
Gael Guennebaud
9081c8f6ea
Add support for RowOrder reshaped
2017-02-20 11:46:21 +01:00
Gael Guennebaud
a811a04696
Silent warning.
2017-02-20 10:14:21 +01:00
Gael Guennebaud
63798df038
Fix usage of CUDACC_VER
2017-02-20 08:16:36 +01:00
Gael Guennebaud
deefa54a54
Fix tracking of temporaries in unit tests
2017-02-19 10:32:54 +01:00
Gael Guennebaud
f8a55cc062
Fix compilation.
2017-02-18 10:08:13 +01:00
Gael Guennebaud
cbbf88c4d7
Use int32_t instead of int in NEON code. Some platforms with 16 bytes int supports ARM NEON.
2017-02-17 14:39:02 +01:00
Gael Guennebaud
582b5e39bf
bug #1393 : enable Matrix/Array explicit ctor from types with conversion operators (was ok with 3.2)
2017-02-17 14:10:57 +01:00
Benoit Steiner
cfa0568ef7
Size indices are signed.
2017-02-16 10:13:34 -08:00
Mehdi Goli
91982b91c0
Adding TensorLayoutSwapOp for sycl.
2017-02-15 16:28:12 +00:00
Mehdi Goli
b1e312edd6
Adding TensorPatch.h for sycl backend.
2017-02-15 10:13:01 +00:00
Benoit Steiner
31a25ab226
Merged eigen/eigen into default
2017-02-14 15:36:21 -08:00
Mehdi Goli
0d153ded29
Adding TensorChippingOP for sycl backend; fixing the index value in the verification operation for cxx11_tensorChipping.cpp test
2017-02-13 17:25:12 +00:00
Gael Guennebaud
5937c4ae32
Fall back is_integral to std::is_integral in c++11
2017-02-13 17:14:26 +01:00
Gael Guennebaud
7073430946
Fix overflow and make use of long long in c++11 only.
2017-02-13 17:14:04 +01:00
Jonathan Hseu
3453b00a1e
Fix vector indexing with uint64_t
2017-02-11 21:45:32 -08:00
Gael Guennebaud
e7ebe52bfb
bug #1391 : include IO.h before DenseBase to enable its usage in DenseBase plugins.
2017-02-13 09:46:20 +01:00
Gael Guennebaud
b3750990d5
Workaround some gcc 4.7 warnings
2017-02-11 23:24:06 +01:00
Gael Guennebaud
4b22048cea
Fallback Reshaped to MapBase when possible (same storage order and linear access to the nested expression)
2017-02-11 15:32:53 +01:00
Gael Guennebaud
83d6a529c3
Use Eigen::fix<N> to pass compile-time sizes.
2017-02-11 15:31:28 +01:00
Gael Guennebaud
c16ee72b20
bug #1392 : fix #include <Eigen/Sparse> with mpl2-only
2017-02-11 10:35:01 +01:00
Gael Guennebaud
e43016367a
Forgot to include a file in previous commit
2017-02-11 10:34:18 +01:00
Gael Guennebaud
6486d4fc95
Worakound gcc 4.7 issue in c++11.
2017-02-11 10:29:10 +01:00
Gael Guennebaud
4a4a72951f
Fix previous commits: disbale only problematic indexed view methods for old compilers instead of disabling everything.
...
Tested with gcc 4.7 (c++03) and gcc 4.8 (c++03 & c++11)
2017-02-11 10:28:44 +01:00
Benoit Steiner
fad776492f
Merged eigen/eigen into default
2017-02-10 14:27:43 -08:00
Benoit Steiner
1ef30b8090
Fixed bug introduced in previous commit
2017-02-10 13:35:10 -08:00
Benoit Steiner
769208a17f
Pulled latest updates from upstream
2017-02-10 13:11:40 -08:00
Benoit Steiner
8b3cc54c42
Added a new EIGEN_HAS_INDEXED_VIEW define that set to 0 for older compilers that are known to fail to compile the indexed views (I used the define from the indexed_views.cpp test).
...
Only include the indexed view methods when the compiler supports the code.
This makes it possible to use Eigen again in complex code bases such as TensorFlow and older compilers such as gcc 4.8
2017-02-10 13:08:49 -08:00
Gael Guennebaud
a1ff24f96a
Fix prunning in (sparse*sparse).pruned() when the result is nearly dense.
2017-02-10 13:59:32 +01:00
Gael Guennebaud
0256c52359
Include clang in the list of non strict MSVC (just to be sure)
2017-02-10 13:41:52 +01:00
Alexander Neumann
dd58462e63
fixed inlining issue with clang-cl on visual studio
...
(grafted from 7962ac1a58
)
2017-02-08 23:50:38 +01:00
Gael Guennebaud
fc8fd5fd24
Improve multi-threading heuristic for matrix products with a small number of columns.
2017-02-07 17:19:59 +01:00
Mehdi Goli
0ee97b60c2
Adding mean to TensorReductionSycl.h
2017-02-07 15:43:17 +00:00
Mehdi Goli
42bd5c4e7b
Fixing TensorReductionSycl for min and max.
2017-02-06 18:05:23 +00:00
Gael Guennebaud
4254b3eda3
bug #1389 : MSVC's std containers do not properly align in 64 bits mode if the requested alignment is larger than 16 bytes (e.g., with AVX)
2017-02-03 15:22:35 +01:00
Mehdi Goli
bc128f9f3b
Reducing the warnings in Sycl backend.
2017-02-02 10:43:47 +00:00
Benoit Steiner
442e9cbb30
Silenced several compilation warnings
2017-02-01 15:50:58 -08:00
Benoit Steiner
2db75c07a6
fixed the ordering of the template and EIGEN_DEVICE_FUNC keywords in a few more places to get more of the Eigen codebase to compile with nvcc again.
2017-02-01 15:41:29 -08:00
Benoit Steiner
fcd257039b
Replaced EIGEN_DEVICE_FUNC template<foo> with template<foo> EIGEN_DEVICE_FUNC to make the code compile with nvcc8.
2017-02-01 15:30:49 -08:00
Gael Guennebaud
84090027c4
Disable a part of the unit test for gcc 4.8
2017-02-01 23:37:44 +01:00
Gael Guennebaud
0eceea4efd
Define EIGEN_COMP_GNUC to reflect version number: 47, 48, 49, 50, 60, ...
2017-02-01 23:36:40 +01:00
Mehdi Goli
ff53050034
Converting ptrdiff_t type to int64_t type in cxx11_tensor_contract_sycl.cpp in order to be the same as other tests.
2017-02-01 15:36:03 +00:00
Mehdi Goli
bab29936a1
Reducing warnings in Sycl backend.
2017-02-01 15:29:53 +00:00
Gael Guennebaud
645a8e32a5
Fix compilation of JacobiSVD for vectors type
2017-01-31 16:22:54 +01:00
Mehdi Goli
48a20b7d95
Fixing compiler error on TensorContractionSycl.h; Silencing the compiler unused parameter warning for eval_op_indices in TensorContraction.h
2017-01-31 14:06:36 +00:00
Gael Guennebaud
53026d29d4
bug #478 : fix regression in the eigen decomposition of zero matrices.
2017-01-31 14:22:42 +01:00
Benoit Steiner
fbc39fd02c
Merge latest changes from upstream
2017-01-30 15:25:57 -08:00
Gael Guennebaud
63de19c000
bug #1380 : fix matrix exponential with Map<>
2017-01-30 13:55:27 +01:00
Gael Guennebaud
c86911ac73
bug #1384 : fix evaluation of "sparse/scalar" that used the wrong evaluation path.
2017-01-30 13:38:24 +01:00
Mehdi Goli
82ce92419e
Fixing the buffer type in memcpy.
2017-01-30 11:38:20 +00:00
Gael Guennebaud
24409f3acd
Use fix<> API to specify compile-time reshaped sizes.
2017-01-29 15:20:35 +01:00
Gael Guennebaud
9036cda364
Cleanup intitial reshape implementation:
...
- reshape -> reshaped
- make it compatible with evaluators.
2017-01-29 14:57:45 +01:00
Gael Guennebaud
0e89baa5d8
import yoco xiao's work on reshape
2017-01-29 14:29:31 +01:00
Gael Guennebaud
d024e9942d
MSVC 1900 release is not c++14 compatible enough for us. The 1910 update seems to be fine though.
2017-01-27 22:17:59 +01:00
Gael Guennebaud
83592659ba
merge
2017-01-27 21:59:59 +01:00
Gael Guennebaud
4a351be163
Fix warning
2017-01-27 11:59:35 +01:00
Gael Guennebaud
251ad3e04f
Fix unamed type as template parametre issue.
2017-01-27 11:57:52 +01:00
Rasmus Munk Larsen
edaa0fc5d1
Revert PR-292. After further investigation, the memcpy->memmove change was only good for Haswell on older versions of glibc. Adding a switch for small sizes is perhaps useful for string copies, but also has an overhead for larger sizes, making it a poor trade-off for general memcpy.
...
This PR also removes a couple of unnecessary semi-colons in Eigen/src/Core/AssignEvaluator.h that caused compiler warning everywhere.
2017-01-26 12:46:06 -08:00
Gael Guennebaud
25a1703579
Merged in ggael/eigen-flexidexing (pull request PR-294)
...
generalized operator() for indexed access and slicing
2017-01-26 08:04:23 +00:00
Gael Guennebaud
98dfe0c13f
Fix useless ';' warning
2017-01-25 22:55:04 +01:00
Gael Guennebaud
28351073d8
Fix unamed type as template argument (ok in c++11 only)
2017-01-25 22:54:51 +01:00
Gael Guennebaud
607be65a03
Fix duplicates of array_size bewteen unsupported and Core
2017-01-25 22:53:58 +01:00
Rasmus Munk Larsen
7d39c6d50a
Merged eigen/eigen into default
2017-01-25 09:22:26 -08:00
Rasmus Munk Larsen
5c9ed4ba0d
Reverse arguments for pmin in AVX.
2017-01-25 09:21:57 -08:00
Gael Guennebaud
850ca961d2
bug #1383 : fix regression in LinSpaced for integers and high<low
2017-01-25 18:13:53 +01:00
Gael Guennebaud
296d24be4d
bug #1381 : fix sparse.diagonal() used as a rvalue.
...
The problem was that is "sparse" is not const, then sparse.diagonal() must have the
LValueBit flag meaning that sparse.diagonal().coeff(i) must returns a const reference,
const Scalar&. However, sparse::coeff() cannot returns a reference for a non-existing
zero coefficient. The trick is to return a reference to a local member of
evaluator<SparseMatrix>.
2017-01-25 17:39:01 +01:00
Gael Guennebaud
d06a48959a
bug #1383 : Fix regression from 3.2 with LinSpaced(n,0,n-1) with n==0.
2017-01-25 15:27:13 +01:00
Rasmus Munk Larsen
ae3e43a125
Remove extra space.
2017-01-24 16:16:39 -08:00
Benoit Steiner
e96c77668d
Merged in rmlarsen/eigen2 (pull request PR-292)
...
Adds a fast memcpy function to Eigen.
2017-01-25 00:14:04 +00:00
Rasmus Munk Larsen
3be5ee2352
Update copy helper to use fast_memcpy.
2017-01-24 14:22:49 -08:00
Rasmus Munk Larsen
e6b1020221
Adds a fast memcpy function to Eigen. This takes advantage of the following:
...
1. For small fixed sizes, the compiler generates inline code for memcpy, which is much faster.
2. My colleague eriche at googl dot com discovered that for large sizes, memmove is significantly faster than memcpy (at least on Linux with GCC or Clang). See benchmark numbers measured on a Haswell (HP Z440) workstation here: https://docs.google.com/a/google.com/spreadsheets/d/1jLs5bKzXwhpTySw65MhG1pZpsIwkszZqQTjwrd_n0ic/pubhtml This is of course surprising since memcpy is a less constrained version of memmove. This stackoverflow thread contains some speculation as to the causes: http://stackoverflow.com/questions/22793669/poor-memcpy-performance-on-linux
Below are numbers for copying and slicing tensors using the multithreaded TensorDevice. The numbers show significant improvements for memcpy of very small blocks and for memcpy of large blocks single threaded (we were already able to saturate memory bandwidth for >1 threads before on large blocks). The "slicingSmallPieces" benchmark also shows small consistent improvements, since memcpy cost is a fair portion of that particular computation.
The benchmarks operate on NxN matrices, and the names are of the form BM_$OP_${NUMTHREADS}T/${N}.
Measured improvements in wall clock time:
Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2017-01-20T11:26:31.493023454-08:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------
BM_memcpy_1T/2 3.48 2.39 +31.3%
BM_memcpy_1T/8 12.3 6.51 +47.0%
BM_memcpy_1T/64 371 383 -3.2%
BM_memcpy_1T/512 66922 66720 +0.3%
BM_memcpy_1T/4k 9892867 6849682 +30.8%
BM_memcpy_1T/5k 14951099 10332856 +30.9%
BM_memcpy_2T/2 3.50 2.46 +29.7%
BM_memcpy_2T/8 12.3 7.66 +37.7%
BM_memcpy_2T/64 371 376 -1.3%
BM_memcpy_2T/512 66652 66788 -0.2%
BM_memcpy_2T/4k 6145012 6117776 +0.4%
BM_memcpy_2T/5k 9181478 9010942 +1.9%
BM_memcpy_4T/2 3.47 2.47 +31.0%
BM_memcpy_4T/8 12.3 6.67 +45.8
BM_memcpy_4T/64 374 376 -0.5%
BM_memcpy_4T/512 67833 68019 -0.3%
BM_memcpy_4T/4k 5057425 5188253 -2.6%
BM_memcpy_4T/5k 7555638 7779468 -3.0%
BM_memcpy_6T/2 3.51 2.50 +28.8%
BM_memcpy_6T/8 12.3 7.61 +38.1%
BM_memcpy_6T/64 373 378 -1.3%
BM_memcpy_6T/512 66871 66774 +0.1%
BM_memcpy_6T/4k 5112975 5233502 -2.4%
BM_memcpy_6T/5k 7614180 7772246 -2.1%
BM_memcpy_8T/2 3.47 2.41 +30.5%
BM_memcpy_8T/8 12.4 10.5 +15.3%
BM_memcpy_8T/64 372 388 -4.3%
BM_memcpy_8T/512 67373 66588 +1.2%
BM_memcpy_8T/4k 5148462 5254897 -2.1%
BM_memcpy_8T/5k 7660989 7799058 -1.8%
BM_memcpy_12T/2 3.50 2.40 +31.4%
BM_memcpy_12T/8 12.4 7.55 +39.1
BM_memcpy_12T/64 374 378 -1.1%
BM_memcpy_12T/512 67132 66683 +0.7%
BM_memcpy_12T/4k 5185125 5292920 -2.1%
BM_memcpy_12T/5k 7717284 7942684 -2.9%
BM_slicingSmallPieces_1T/2 47.3 47.5 +0.4%
BM_slicingSmallPieces_1T/8 53.6 52.3 +2.4%
BM_slicingSmallPieces_1T/64 491 476 +3.1%
BM_slicingSmallPieces_1T/512 21734 18814 +13.4%
BM_slicingSmallPieces_1T/4k 394660 396760 -0.5%
BM_slicingSmallPieces_1T/5k 218722 209244 +4.3%
BM_slicingSmallPieces_2T/2 80.7 79.9 +1.0%
BM_slicingSmallPieces_2T/8 54.2 53.1 +2.0
BM_slicingSmallPieces_2T/64 497 477 +4.0%
BM_slicingSmallPieces_2T/512 21732 18822 +13.4%
BM_slicingSmallPieces_2T/4k 392885 390490 +0.6%
BM_slicingSmallPieces_2T/5k 221988 208678 +6.0%
BM_slicingSmallPieces_4T/2 80.8 80.1 +0.9%
BM_slicingSmallPieces_4T/8 54.1 53.2 +1.7%
BM_slicingSmallPieces_4T/64 493 476 +3.4%
BM_slicingSmallPieces_4T/512 21702 18758 +13.6%
BM_slicingSmallPieces_4T/4k 393962 404023 -2.6%
BM_slicingSmallPieces_4T/5k 249667 211732 +15.2%
BM_slicingSmallPieces_6T/2 80.5 80.1 +0.5%
BM_slicingSmallPieces_6T/8 54.4 53.4 +1.8%
BM_slicingSmallPieces_6T/64 488 478 +2.0%
BM_slicingSmallPieces_6T/512 21719 18841 +13.3%
BM_slicingSmallPieces_6T/4k 394950 397583 -0.7%
BM_slicingSmallPieces_6T/5k 223080 210148 +5.8%
BM_slicingSmallPieces_8T/2 81.2 80.4 +1.0%
BM_slicingSmallPieces_8T/8 58.1 53.5 +7.9%
BM_slicingSmallPieces_8T/64 489 480 +1.8%
BM_slicingSmallPieces_8T/512 21586 18798 +12.9%
BM_slicingSmallPieces_8T/4k 394592 400165 -1.4%
BM_slicingSmallPieces_8T/5k 219688 208301 +5.2%
BM_slicingSmallPieces_12T/2 80.2 79.8 +0.7%
BM_slicingSmallPieces_12T/8 54.4 53.4 +1.8
BM_slicingSmallPieces_12T/64 488 476 +2.5%
BM_slicingSmallPieces_12T/512 21931 18831 +14.1%
BM_slicingSmallPieces_12T/4k 393962 396541 -0.7%
BM_slicingSmallPieces_12T/5k 218803 207965 +5.0%
2017-01-24 13:55:18 -08:00
Rasmus Munk Larsen
7b6aaa3440
Fix NaN propagation for AVX512.
2017-01-24 13:37:08 -08:00
Rasmus Munk Larsen
5e144bbaa4
Make NaN propagatation consistent between the pmax/pmin and std::max/std::min. This makes the NaN propagation consistent between the scalar and vectorized code paths of Eigen's scalar_max_op and scalar_min_op.
...
See #1373 for details.
2017-01-24 13:32:50 -08:00
Gael Guennebaud
d83db761a2
Add support for std::integral_constant
2017-01-24 16:28:12 +01:00
Gael Guennebaud
bc10201854
Add test for multiple symbols
2017-01-24 16:27:51 +01:00
Gael Guennebaud
c43d254d13
Fix seq().reverse() in c++98
2017-01-24 11:36:43 +01:00
Gael Guennebaud
5783158e8f
Add unit test for FixedInt and Symbolic
2017-01-24 10:55:12 +01:00
Gael Guennebaud
ddd83f82d8
Add support for "SymbolicExpr op fix<N>" in C++98/11 mode.
2017-01-24 10:54:42 +01:00
Gael Guennebaud
228fef1b3a
Extended the set of arithmetic operators supported by FixedInt (-,+,*,/,%,&,|)
2017-01-24 10:53:51 +01:00
Gael Guennebaud
bb52f74e62
Add internal doc
2017-01-24 10:13:35 +01:00
Gael Guennebaud
41c523a0ab
Rename fix_t to FixedInt
2017-01-24 09:39:49 +01:00
Gael Guennebaud
156e6234f1
bug #1375 : fix cmake installation with cmake 2.8
2017-01-24 09:16:40 +01:00
Gael Guennebaud
ba3f977946
bug #1376 : add missing assertion on size mismatch with compound assignment operators (e.g., mat += mat.col(j))
2017-01-23 22:06:08 +01:00
Gael Guennebaud
b0db4eff36
bug #1382 : move using std::size_t/ptrdiff_t to Eigen's namespace (still better than the global namespace!)
2017-01-23 22:03:57 +01:00
Gael Guennebaud
ca79c1545a
Add std:: namespace prefix to all (hopefully) instances if size_t/ptrdfiff_t
2017-01-23 22:02:53 +01:00
Gael Guennebaud
4b607b5692
Use Index instead of size_t
2017-01-23 22:00:33 +01:00
Luke Iwanski
bf44fed9b7
Allows AMD APU
2017-01-23 15:56:45 +00:00
Gael Guennebaud
0fe278f7be
bug #1379 : fix compilation in sparse*diagonal*dense with openmp
2017-01-21 23:27:01 +01:00
Gael Guennebaud
22a172751e
bug #1378 : fix doc (DiagonalIndex vs Diagonal)
2017-01-21 22:09:59 +01:00
Mehdi Goli
602f8c27f5
Reverting back to the previous TensorDeviceSycl.h as the total number of buffer is not enough for tensorflow.
2017-01-20 18:23:20 +00:00
Gael Guennebaud
4d302a080c
Recover compile-time size from seq(A,B) when A and B are fixed values. (c++11 only)
2017-01-19 20:34:18 +01:00
Gael Guennebaud
54f3fbee24
Exploit fixed values in seq and reverse with C++98 compatibility
2017-01-19 19:57:32 +01:00
Gael Guennebaud
7691723e34
Add support for fixed-value in symbolic expression, c++11 only for now.
2017-01-19 19:25:29 +01:00
Benoit Steiner
924600a0e8
Made sure that enabling avx2 instructions enables avx and sse instructions as well.
2017-01-19 09:54:48 -08:00
Mehdi Goli
77cc4d06c7
Removing unused variables
2017-01-19 17:06:21 +00:00
Mehdi Goli
837fdbdcb2
Merging with Benoit's upstream.
2017-01-19 11:34:34 +00:00
Mehdi Goli
6bdd15f572
Adding non-deferrenciable pointer track for ComputeCpp backend; Adding TensorConvolutionOp for ComputeCpp; fixing typos. modifying TensorDeviceSycl to use the LegacyPointer class.
2017-01-19 11:30:59 +00:00
Benoit Steiner
aa7fb88dfa
Merged in LaFeuille/eigen (pull request PR-289)
...
Fix a typo
2017-01-18 16:44:39 -08:00
Gael Guennebaud
e84ed7b6ef
Remove dead code
2017-01-18 23:18:28 +01:00
Gael Guennebaud
f3ccbe0419
Add a Symbolic::FixedExpr helper expression to make sure the compiler fully optimize the usage of last and end.
2017-01-18 23:16:32 +01:00
Mehdi Goli
c6f7b33834
Applying Benoit's comment. Embedding synchronisation inside device memcpy so there is no need to externally call synchronise() for device memcopy.
2017-01-18 10:45:28 +00:00
Gael Guennebaud
15471432fe
Add a .reverse() member to ArithmeticSequence.
2017-01-18 11:35:27 +01:00
Gael Guennebaud
e4f8dd860a
Add missing operator*
2017-01-18 10:49:01 +01:00
Gael Guennebaud
198507141b
Update all block expressions to accept compile-time sizes passed by fix<N> or fix<N>(n)
2017-01-18 09:43:58 +01:00
Gael Guennebaud
5484ddd353
Merge the generic and dynamic overloads of block()
2017-01-17 22:11:46 +01:00
Gael Guennebaud
655ba783f8
Defer set-to-zero in triangular = product so that no aliasing issue occur in the common:
...
A.triangularView() = B*A.sefladjointView()*B.adjoint()
case that used to work in 3.2.
2017-01-17 18:03:35 +01:00
Gael Guennebaud
5e36ec3b6f
Fix regression when passing enums to operator()
2017-01-17 17:10:16 +01:00
Gael Guennebaud
f7852c3d16
Fix -Wunnamed-type-template-args
2017-01-17 16:05:58 +01:00
Gael Guennebaud
4f36dcfda8
Add a generic block() method compatible with Eigen::fix
2017-01-17 11:34:28 +01:00
Gael Guennebaud
71e5b71356
Add a get_runtime_value helper to deal with pointer-to-function hack,
...
plus some refactoring to make the internals more consistent.
2017-01-17 11:33:57 +01:00
Gael Guennebaud
59801a3250
Add \newin{3.x} doxygen command
2017-01-17 10:31:28 +01:00
Gael Guennebaud
23bfcfc15f
Add missing overload of get_compile_time for c++98/11
2017-01-17 10:30:21 +01:00
Gael Guennebaud
edff32c2c2
Disambiguate the two versions of fix for doxygen
2017-01-17 10:29:33 +01:00
Gael Guennebaud
4989922be2
Add support for symbolic expressions as arguments of operator()
2017-01-16 22:21:23 +01:00
Gael Guennebaud
12e22a2844
typos in doc
2017-01-16 16:31:19 +01:00
Gael Guennebaud
e70c4c97fa
Typo
2017-01-16 16:20:16 +01:00
Gael Guennebaud
a9232af845
Introduce a variable_or_fixed<N> proxy returned by fix<N>(val) to pass both a compile-time and runtime fallback value in case N means "runtime".
...
This mechanism is used by the seq/seqN functions. The proxy object is immediately converted to pure compile-time (as fix<N>) or pure runtime (i.e., an Index) to avoid redundant template instantiations.
2017-01-16 16:17:01 +01:00
Gael Guennebaud
6e97698161
Introduce a EIGEN_HAS_CXX14 macro
2017-01-16 16:13:37 +01:00
Mehdi Goli
e46e722381
Adding Tensor ReverseOp; TensorStriding; TensorConversionOp; Modifying Tensor Contractsycl to be located in any place in the expression tree.
2017-01-16 13:58:49 +00:00
Luke Iwanski
23778a15d8
Reverting unintentional change to Eigen/Geometry
2017-01-16 11:05:56 +00:00
LaFeuille
1b19b80c06
Fix a typo
2017-01-13 07:24:55 +00:00
Fraser Cormack
8245d3c7ad
Fix case-sensitivity of file include
2017-01-12 12:13:18 +00:00
Gael Guennebaud
752bd92ba5
Large code refactoring:
...
- generalize some utilities and move them to Meta (size(), array_size())
- move handling of all and single indices to IndexedViewHelper.h
- several cleanup changes
2017-01-11 17:24:02 +01:00
Gael Guennebaud
f93d1c58e0
Make get_compile_time compatible with variable_if_dynamic
2017-01-11 17:08:59 +01:00
Gael Guennebaud
c020d307a6
Make variable_if_dynamic<T> implicitely convertible to T
2017-01-11 17:08:05 +01:00
Gael Guennebaud
43c617e2ee
merge
2017-01-11 14:33:37 +01:00
Gael Guennebaud
152cd57bb7
Enable generation of doc for static variables in Eigen's namespace.
2017-01-11 14:29:20 +01:00
Gael Guennebaud
b1dc0fa813
Move fix and symbolic to their own file, and improve doxygen compatibility
2017-01-11 14:28:28 +01:00
Gael Guennebaud
04397f17e2
Add 1D overloads of operator()
2017-01-11 13:17:09 +01:00
Gael Guennebaud
45199b9773
Fix typo
2017-01-11 09:34:08 +01:00
Gael Guennebaud
1b5570988b
Add doc to seq, seqN, ArithmeticSequence, operator(), etc.
2017-01-10 22:58:58 +01:00
Gael Guennebaud
17eac60446
Factorize const and non-const version of the generic operator() method.
2017-01-10 21:45:55 +01:00
Gael Guennebaud
d072fc4b14
add writeable IndexedView
2017-01-10 17:10:35 +01:00
Gael Guennebaud
c9d5e5c6da
Simplify Symbolic API: std::tuple is now used internally and automatically built.
2017-01-10 16:55:07 +01:00
Gael Guennebaud
407e7b7a93
Simplify symbolic API by using "symbol=value" to associate a runtime value to a symbol.
2017-01-10 16:45:32 +01:00
Gael Guennebaud
96e6cf9aa2
Fix linking issue.
2017-01-10 16:35:46 +01:00
Gael Guennebaud
e63678bc89
Fix ambiguous call
2017-01-10 16:33:40 +01:00
Gael Guennebaud
8e247744a4
Fix linking issue
2017-01-10 16:32:06 +01:00
Gael Guennebaud
b47a7e5c3a
Add doc for IndexedView
2017-01-10 16:28:57 +01:00
Gael Guennebaud
87963f441c
Fallback to Block<> when possible (Index, all, seq with > increment).
...
This is important to take advantage of the optimized implementations (evaluator, products, etc.),
and to support sparse matrices.
2017-01-10 14:25:30 +01:00
Gael Guennebaud
a98c7efb16
Add a more generic evaluation mechanism and minimalistic doc.
2017-01-10 11:46:29 +01:00
Gael Guennebaud
13d954f270
Cleanup Eigen's namespace
2017-01-10 11:06:02 +01:00
Gael Guennebaud
9eaab4f9e0
Refactoring: move all symbolic stuff into its own namespace
2017-01-10 10:57:08 +01:00
Gael Guennebaud
acd08900c9
Move 'last' and 'end' to their own namespace
2017-01-10 10:31:07 +01:00
Gael Guennebaud
1df2377d78
Implement c++98 version of seq()
2017-01-10 10:28:45 +01:00
Gael Guennebaud
ecd9cc5412
Isolate legacy code (we keep it for performance comparison purpose)
2017-01-10 09:34:25 +01:00
Gael Guennebaud
b50c3e967e
Add a minimalistic symbolic scalar type with expression template and make use of it to define the last placeholder and to unify the return type of seq and seqN.
2017-01-09 23:42:16 +01:00
Gael Guennebaud
68064e14fa
Rename span/range to seqN/seq
2017-01-09 17:35:21 +01:00
Gael Guennebaud
ad3eef7608
Add link to SO
2017-01-09 13:01:39 +01:00
Gael Guennebaud
75aef5b37f
Fix extraction of compile-time size of std::array with gcc
2017-01-06 22:04:49 +01:00
Gael Guennebaud
233dff1b35
Add support for plain arrays for columns and both rows/columns
2017-01-06 22:01:53 +01:00
Gael Guennebaud
76e183bd52
Propagate compile-time size for plain arrays
2017-01-06 22:01:23 +01:00
Gael Guennebaud
3264d3c761
Add support for plain-array as indices, e.g., mat({1,2,3,4})
2017-01-06 21:53:32 +01:00
Gael Guennebaud
831fffe874
Add missing doc of SparseView
2017-01-06 18:01:29 +01:00
Gael Guennebaud
a875167d99
Propagate compile-time increment and strides.
...
Had to introduce a UndefinedIncr constant for non structured list of indices.
2017-01-06 15:54:55 +01:00
Gael Guennebaud
e383d6159a
MSVC 2015 has all we want about c++11 and MSVC 2017 fails on binder1st/binder2nd
2017-01-06 15:44:13 +01:00
Gael Guennebaud
fad1fa75b3
Propagate compile-time size with "all" and add c++11 array unit test
2017-01-06 13:29:33 +01:00
Gael Guennebaud
3730e3ca9e
Use "fix" for compile-time values, propagate compile-time sizes for span, clean some cleanup.
2017-01-06 13:10:10 +01:00
Gael Guennebaud
60e99ad8d7
Add unit test for indexed views
2017-01-06 11:59:08 +01:00
Gael Guennebaud
ac7e4ac9c0
Initial commit to add a generic indexed-based view of matrices.
...
This version already works as a read-only expression.
Numerous refactoring, renaming, extension, tuning passes are expected...
2017-01-06 00:01:44 +01:00
Gael Guennebaud
f3f026c9aa
Convert integers to real numbers when computing relative L2 error
2017-01-05 13:36:08 +01:00
Jim Radford
0c226644d8
LLT: const the arg to solveInPlace() to allow passing .transpose(), .block(), etc.
2017-01-04 14:42:57 -08:00
Jim Radford
be281e5289
LLT: avoid making a copy when decomposing in place
2017-01-04 14:43:56 -08:00
Gael Guennebaud
e27f17bf5c
Gub 1453: fix Map with non-default inner-stride but no outer-stride.
2017-08-22 13:27:37 +02:00
Gael Guennebaud
21d0a0bcf5
bug #1456 : add perf recommendation for LLT and storage format
2017-08-22 12:46:35 +02:00
Gael Guennebaud
2c3d70d915
Re-enable hidden doc in LLT
2017-08-22 12:04:09 +02:00
Gael Guennebaud
a6e7a41a55
bug #1455 : Cholesky module depends on Jacobi for rank-updates.
2017-08-22 11:37:32 +02:00
Gael Guennebaud
e6021cc8cc
bug #1458 : fix documentation of LLT and LDLT info() method.
2017-08-22 11:32:55 +02:00
Gael Guennebaud
2810ba194b
Clarify MKL_DIRECT_CALL doc.
2017-08-17 22:12:26 +02:00
Gael Guennebaud
f727844658
use MKL's lapacke.h header when using MKL
2017-08-17 21:58:39 +02:00
Gael Guennebaud
8c858bd891
Clarify doc regarding the usage of MKL_DIRECT_CALL
2017-08-17 12:17:45 +02:00
Gael Guennebaud
b95f92843c
Fix support for MKL's BLAS when using MKL_DIRECT_CALL.
2017-08-17 12:07:10 +02:00
Gael Guennebaud
89c01a494a
Add unit test for has_ReturnType
2017-08-17 11:55:00 +02:00
Gael Guennebaud
687bedfcad
Make NoAlias and JacobiRotation compatible with CUDA.
2017-08-17 11:51:22 +02:00
Gael Guennebaud
1f4b24d2df
Do not preallocate more space than the matrix size (when the sparse matrix boils down to a vector
2017-07-20 10:13:48 +02:00
Gael Guennebaud
d580a90c9a
Disable BDCSVD preallocation check.
2017-07-20 10:03:54 +02:00
Gael Guennebaud
55d7181557
Fix lazyness of operator* with CUDA
2017-07-20 09:47:28 +02:00
Gael Guennebaud
cda47c42c2
Fix compilation in c++98 mode.
2017-07-17 21:08:20 +02:00
Gael Guennebaud
a74b9ba7cd
Update documentation for CUDA
2017-07-17 11:05:26 +02:00
Gael Guennebaud
3182bdbae6
Disable vectorization when compiled by nvcc, even is EIGEN_NO_CUDA is defined
2017-07-17 11:01:28 +02:00
Gael Guennebaud
9f8136ff74
disable nvcc boolean-expr-is-constant warning
2017-07-17 10:43:18 +02:00
Gael Guennebaud
bbd97b4095
Add a EIGEN_NO_CUDA option, and introduce EIGEN_CUDACC and EIGEN_CUDA_ARCH aliases
2017-07-17 01:02:51 +02:00
Gael Guennebaud
2299717fd5
Fix and workaround several doxygen issues/warnings
2017-01-04 23:27:33 +01:00
Luke Iwanski
90c5bc8d64
Fixes auto appearance in functor template argument for reduction.
2017-01-04 22:18:44 +00:00
Gael Guennebaud
ee6f7f6c0c
Add doc for sparse triangular solve functions
2017-01-04 23:10:36 +01:00
Gael Guennebaud
5165de97a4
Add missing snippet files.
2017-01-04 23:08:27 +01:00
Gael Guennebaud
a0a36ad0ef
bug #1336 : workaround doxygen failing to include numerous members of MatriBase in Matrix
2017-01-04 22:02:39 +01:00
Gael Guennebaud
29a1a58113
Document selfadjointView
2017-01-04 22:01:50 +01:00
Gael Guennebaud
a5ebc92f8d
bug #1336 : fix doxygen issue regarding EIGEN_CWISE_BINARY_RETURN_TYPE
2017-01-04 18:21:44 +01:00
Gael Guennebaud
45b289505c
Add debug output
2017-01-03 11:31:02 +01:00
Gael Guennebaud
5838f078a7
Fix inclusion
2017-01-03 11:30:27 +01:00
Gael Guennebaud
8702562177
bug #1370 : add doc for StorageIndex
2017-01-03 11:25:41 +01:00
Gael Guennebaud
575c078759
bug #1370 : rename _Index to _StorageIndex in SparseMatrix, and add a warning in the doc regarding the 3.2 to 3.3 change of SparseMatrix::Index
2017-01-03 11:19:14 +01:00
NeroBurner
c4fc2611ba
add cmake-option to enable/disable creation of tests
...
* * *
disable unsupportet/test when test are disabled
* * *
rename EIGEN_ENABLE_TESTS to BUILD_TESTING
* * *
consider BUILD_TESTING in blas
2017-01-02 09:09:21 +01:00
Valentin Roussellet
d3c5525c23
Added += and + operators to inner iterators
...
Fix #1340
#1340
2016-12-28 18:29:30 +01:00
Gael Guennebaud
5c27962453
Move common cwise-unary method from MatrixBase/ArrayBase to the common DenseBase class.
2017-01-02 22:27:07 +01:00
Marco Falke
4ebf69394d
doc: Fix trivial typo in AsciiQuickReference.txt
...
* * *
fixup!
2017-01-01 13:25:48 +00:00
Gael Guennebaud
8d7810a476
bug #1365 : fix another type mismatch warning
...
(sync is set from and compared to an Index)
2016-12-28 23:35:43 +01:00
Gael Guennebaud
97812ff0d3
bug #1369 : fix type mismatch warning.
...
Returned values of omp thread id and numbers are int,
o let's use int instead of Index here.
2016-12-28 23:29:35 +01:00
Gael Guennebaud
7713e20fd2
Fix compilation
2016-12-27 22:04:58 +01:00
Gael Guennebaud
ab69a7f6d1
Cleanup because trait<CwiseBinaryOp>::Flags now expose the correct storage order
2016-12-27 16:55:47 +01:00
Gael Guennebaud
d32a43e33a
Make sure that traits<CwiseBinaryOp>::Flags reports the correct storage order so that methods like .outerSize()/.innerSize() work properly.
2016-12-27 16:35:45 +01:00
Gael Guennebaud
7136267461
Add missing .outer() member to iterators of evaluators of cwise sparse binary expression
2016-12-27 16:34:30 +01:00
Gael Guennebaud
fe0ee72390
Fix check of storage order mismatch for "sparse cwiseop sparse".
2016-12-27 16:33:19 +01:00
Gael Guennebaud
6b8f637ab1
Harmless typo
2016-12-27 16:31:17 +01:00
Benoit Steiner
3eda02d78d
Fixed the sycl benchmarking code
2016-12-22 10:37:05 -08:00
Mehdi Goli
8b1c2108ba
Reverting asynchronous exec to Synchronous exec regarding random race condition.
2016-12-22 16:45:38 +00:00
Benoit Steiner
354baa0fb1
Avoid using horizontal adds since they're not very efficient.
2016-12-21 20:55:07 -08:00
Benoit Steiner
d7825b6707
Use native AVX512 types instead of Eigen Packets whenever possible.
2016-12-21 20:06:18 -08:00
Benoit Steiner
660da83e18
Pulled latest update from trunk
2016-12-21 16:43:27 -08:00
Benoit Steiner
4236aebe10
Simplified the contraction code`
2016-12-21 16:42:56 -08:00
Benoit Steiner
3cfa16f41d
Merged in benoitsteiner/opencl (pull request PR-279)
...
Fix for auto appearing in functor template argument.
2016-12-21 15:08:54 -08:00
Benoit Steiner
519d63d350
Added support for libxsmm kernel in multithreaded contractions
2016-12-21 15:06:06 -08:00
Benoit Steiner
0657228569
Simplified the way we link libxsmm
2016-12-21 14:40:08 -08:00
Benoit Steiner
bbca405f04
Pulled latest updates from trunk
2016-12-21 13:45:28 -08:00
Benoit Steiner
b91be60220
Automatically include and link libxsmm when present.
2016-12-21 13:44:59 -08:00
Gael Guennebaud
c6882a72ed
Merged in joaoruileal/eigen (pull request PR-276)
...
Minor improvements to Umfpack support
2016-12-21 21:39:48 +01:00
Benoit Steiner
f9eff17e91
Leverage libxsmm kernels within signle threaded contractions
2016-12-21 12:32:06 -08:00
Benoit Steiner
c19fe5e9ed
Added support for libxsmm in the eigen makefiles
2016-12-21 10:43:40 -08:00
Benoit Steiner
a34d4ebd74
Merged in benoitsteiner/opencl (pull request PR-278)
2016-12-21 08:24:17 -08:00
Luke Iwanski
c55ecfd820
Fix for auto appearing in functor template argument.
2016-12-21 15:42:51 +00:00
Joao Rui Leal
c8c89b5e19
renamed methods umfpackReportControl(), umfpackReportInfo(), and umfpackReportStatus() from UmfPackLU to printUmfpackControl(), printUmfpackInfo(), and printUmfpackStatus()
2016-12-21 09:16:28 +00:00
Benoit Steiner
0f577d4744
Merged eigen/eigen into default
2016-12-20 17:02:06 -08:00
Gael Guennebaud
f2f9df8aa5
Remove MSVC warning 4127 - conditional expression is constant from the disabled list as we now have a local workaround.
2016-12-20 22:53:19 +01:00
Gael Guennebaud
2b3fc981b8
bug #1362 : workaround constant conditional warning produced by MSVC
2016-12-20 22:52:27 +01:00
Luke Iwanski
29186f766f
Fixed order of initialisation in ExecExprFunctorKernel functor.
2016-12-20 21:32:42 +00:00
Gael Guennebaud
94e8d8902f
Fix bug #1367 : compilation fix for gcc 4.1!
2016-12-20 22:17:01 +01:00
Gael Guennebaud
e8d6862f14
Properly adjust precision when saving to Market format.
2016-12-20 22:10:33 +01:00
Gael Guennebaud
e2f4ee1c2b
Speed up parsing of sparse Market file.
2016-12-20 21:56:21 +01:00
Luke Iwanski
8245851d1b
Matching parameters order between lambda and the functor.
2016-12-20 16:18:15 +00:00
Gael Guennebaud
684cfc762d
Add transpose, adjoint, conjugate methods to SelfAdjointView (useful to write generic code)
2016-12-20 16:33:53 +01:00
Gael Guennebaud
8bd0d3aa34
merge
2016-12-20 15:56:00 +01:00
Gael Guennebaud
11f55b2979
Optimize storage layout of Cwise* and PlainObjectBase evaluator to remove the functor or outer-stride if they are empty.
...
For instance, sizeof("(A-B).cwiseAbs2()") with A,B Vector4f is now 16 bytes, instead of 48 before this optimization.
In theory, evaluators should be completely optimized away by the compiler, but this might help in some cases.
2016-12-20 15:55:40 +01:00
Gael Guennebaud
5271474b15
Remove common "noncopyable" base class from evaluator_base to get a chance to get EBO (Empty Base Optimization)
...
Note: we should probbaly get rid of this class and define a macro instead.
2016-12-20 15:51:30 +01:00
Christoph Hertzberg
1c024e5585
Added some possible temporaries to .hgignore
2016-12-20 14:45:44 +01:00
Gael Guennebaud
316673bbde
Clean-up usage of ExpressionTraits in all/any implementation.
2016-12-20 14:38:05 +01:00
Benoit Steiner
548ed30a1c
Added an OpenCL regression test
2016-12-19 18:56:26 -08:00
Christoph Hertzberg
10c6bcdc2e
Add support for long indexes and for (real-valued) row-major matrices to CholmodSupport module
2016-12-19 14:07:42 +01:00
Gael Guennebaud
f5d644b415
Make sure that HyperPlane::transform manitains a unit normal vector in the Affine case.
2016-12-20 09:35:00 +01:00
Benoit Steiner
27ceb43bf6
Fixed race condition in the tensor_shuffling_sycl test
2016-12-19 15:34:42 -08:00
Benoit Steiner
923acadfac
Fixed compilation errors with gcc6 when compiling the AVX512 intrinsics
2016-12-19 13:02:27 -08:00
Benoit Jacob
751e097c57
Use 32 registers on ARM64
2016-12-19 13:44:46 -05:00
Benoit Steiner
fb1d0138ec
Include SSE packet instructions when compiling with avx512 enabled.
2016-12-19 07:32:48 -08:00
Joao Rui Leal
95b804c0fe
it is now possible to change Umfpack control settings before factorizations; added access to the report functions of Umfpack
2016-12-19 10:45:59 +00:00
Gael Guennebaud
8c0e701504
bug #1360 : fix sign issue with pmull on altivec
2016-12-18 22:13:19 +00:00
Gael Guennebaud
fc94258e77
Fix unused warning
2016-12-18 22:11:48 +00:00
Benoit Steiner
0e0d92d34b
Merged in benoitsteiner/opencl (pull request PR-275)
...
Improved support for OpenCL
2016-12-17 10:14:17 -08:00
Benoit Steiner
9e03dfb452
Made sure EIGEN_HAS_C99_MATH is defined when compiling OpenCL code
2016-12-17 09:23:37 -08:00
Benoit Steiner
70d0172f0c
Merged eigen/eigen into default
2016-12-16 17:37:04 -08:00
Benoit Steiner
8910442e19
Fixed memcpy, memcpyHostToDevice and memcpyDeviceToHost for Sycl.
2016-12-16 15:45:04 -08:00
Luke Iwanski
54db66c5df
struct -> class in order to silence compilation warning.
2016-12-16 20:25:20 +00:00
Mehdi Goli
35bae513a0
Converting all parallel for lambda to functor in order to prevent kernel duplication name error; adding tensorConcatinationOp backend for sycl.
2016-12-16 19:46:45 +00:00
ermak
d60cca32e5
Transformation methods added to ParametrizedLine class.
2016-12-17 00:45:13 +07:00
Jeff Trull
7949849ebc
refactor common row/column iteration code into its own class
2016-12-08 19:40:15 -08:00
Jeff Trull
d7bc64328b
add display of entries to gdb sparse matrix prettyprinter
2016-12-08 18:50:17 -08:00
Jeff Trull
ff424927bc
Introduce a simple pretty printer for sparse matrices (no contents)
2016-12-08 09:45:27 -08:00
Jeff Trull
5ce5418631
Correct prettyprinter comment - Quaternions are in fact supported
2016-12-08 07:31:16 -08:00
Rafael Guglielmetti
8f11df2667
NumTraits.h:
...
For the values 'ReadCost, AddCost and MulCost', information about value Eigen::HugeCost
2016-12-16 09:07:12 +00:00
Gael Guennebaud
7d5303a083
Partly revert changeset 642dddcce2
...
, just in case the x87 issue popup again
2016-12-16 09:25:14 +01:00
Benoit Steiner
2f7c2459b7
Merged in benoitsteiner/opencl (pull request PR-272)
...
Adding asynchandler to sycl queue as lack of it can cause undefined behaviour.
2016-12-15 17:46:40 -08:00
Mehdi Goli
c5e8546306
Adding asynchandler to sycl queue as lack of it can cause undefined behaviour.
2016-12-15 16:59:57 +00:00
Christoph Hertzberg
4247d35d4b
Fixed bug which (extremely rarely) could end in an infinite loop
2016-12-15 17:22:12 +01:00
Christoph Hertzberg
642dddcce2
Fix nonnull-compare warning
2016-12-15 17:16:56 +01:00
Benoit Steiner
1324ffef2f
Reenabled the use of constexpr on OpenCL devices
2016-12-15 06:49:38 -08:00
Gael Guennebaud
5d00fdf0e8
bug #1363 : fix mingw's ABI issue
2016-12-15 11:58:31 +01:00
Benoit Steiner
2c2e218471
Avoid using #define since they can conflict with user code
2016-12-14 19:49:15 -08:00
Benoit Steiner
3beb180ee5
Don't call EnvThread::OnCancel by default since it doesn't do anything.
2016-12-14 18:33:39 -08:00
Benoit Steiner
9ff5d0f821
Merged eigen/eigen into default
2016-12-14 17:32:16 -08:00
Mehdi Goli
730eb9fe1c
Adding asynchronous execution as it improves the performance.
2016-12-14 17:38:53 +00:00
Gael Guennebaud
11b492e993
bug #1358 : fix compilation for sparse += sparse.selfadjointView();
2016-12-14 17:53:47 +01:00
Gael Guennebaud
e67397bfa7
bug #1359 : fix compilation of col_major_sparse.row() *= scalar
...
(used to work in 3.2.9 though the expression is not really writable)
2016-12-14 17:05:26 +01:00
Gael Guennebaud
98d7458275
bug #1359 : fix sparse /=scalar and *=scalar implementation.
...
InnerIterators must be obtained from an evaluator.
2016-12-14 17:03:13 +01:00
Mehdi Goli
2d4a091beb
Adding tensor contraction operation backend for Sycl; adding test for contractionOp sycl backend; adding temporary solution to prevent memory leak in buffer; cleaning up cxx11_tensor_buildins_sycl.h
2016-12-14 15:30:37 +00:00
Gael Guennebaud
c817ce3ba3
bug #1361 : fix compilation issue in mat=perm.inverse()
2016-12-13 23:10:27 +01:00
Benoit Steiner
a432fc102d
Moved the choice of ThreadPool to unsupported/Eigen/CXX11/ThreadPool
2016-12-12 15:24:16 -08:00
Benoit Steiner
8ae68924ed
Made ThreadPoolInterface::Cancel() an optional functionality
2016-12-12 11:58:38 -08:00
Gael Guennebaud
57acb05eef
Update and extend doc on alignment issues.
2016-12-11 22:45:32 +01:00
Benoit Steiner
76fca22134
Use a more accurate timer to sleep on Linux systems.
2016-12-09 15:12:24 -08:00
Benoit Steiner
4deafd35b7
Introduce a portable EIGEN_SLEEP macro.
2016-12-09 14:52:15 -08:00
Benoit Steiner
aafa97f4d2
Fixed build error with MSVC
2016-12-09 14:42:32 -08:00
Benoit Steiner
2f5b7a199b
Reworked the threadpool cancellation mechanism to not depend on pthread_cancel since it turns out that pthread_cancel doesn't work properly on numerous platforms.
2016-12-09 13:05:14 -08:00
Benoit Steiner
3d59a47720
Added a message to ease the detection of platforms on which thread cancellation isn't supported.
2016-12-08 14:51:46 -08:00
Benoit Steiner
28ee8f42b2
Added a Flush method to the RunQueue
2016-12-08 14:07:56 -08:00
Benoit Steiner
69ef267a77
Added the new threadpool cancel method to the threadpool interface based class.
2016-12-08 14:03:25 -08:00
Benoit Steiner
7bfff85355
Added support for thread cancellation on Linux
2016-12-08 08:12:49 -08:00
Benoit Steiner
6811e6cf49
Merged in srvasude/eigen/fix_cuda_exp (pull request PR-268)
...
Fix expm1 CUDA implementation (do not shadow exp CUDA implementation).
2016-12-08 05:14:11 -08:00
Gael Guennebaud
747202d338
typo
2016-12-08 12:48:15 +01:00
Gael Guennebaud
bb297abb9e
make sure we use the right eigen version
2016-12-08 12:00:11 +01:00
Gael Guennebaud
8b4b00d277
fix usage of custom compiler
2016-12-08 11:59:39 +01:00
Gael Guennebaud
7105596899
Add missing include and use -O3
2016-12-07 16:56:08 +01:00
Gael Guennebaud
780f3c1adf
Fix call to convert on linux
2016-12-07 16:30:11 +01:00
Gael Guennebaud
3855ab472f
Cleanup file structure
2016-12-07 14:23:49 +01:00
Gael Guennebaud
59a59fa8e7
Update perf monitoring scripts to generate html/svg outputs
2016-12-07 13:36:56 +01:00
Angelos Mantzaflaris
7694684992
Remove superfluous const's (can cause warnings on some Intel compilers)
...
(grafted from e236d3443c
)
2016-12-07 00:37:48 +01:00
Gael Guennebaud
f2c506b03d
Add a script example to run and upload performance tests
2016-12-06 16:46:52 +01:00
Gael Guennebaud
1b4e085a7f
generate png file for web upload
2016-12-06 16:46:22 +01:00
Gael Guennebaud
f725f1cebc
Mention the CMAKE_PREFIX_PATH variable.
2016-12-06 15:23:45 +01:00
Gael Guennebaud
f90c4aebc5
Update monitored changeset lists
2016-12-06 15:07:46 +01:00
Gael Guennebaud
eb621413c1
Revert vec/y to vec*(1/y) in row-major TRSM:
...
- div is extremely costly
- this is consistent with the column-major case
- this is consistent with all other BLAS implementations
2016-12-06 15:04:50 +01:00
Gael Guennebaud
8365c2c941
Fix BLAS backend for symmetric rank K updates.
2016-12-06 14:47:09 +01:00
Gael Guennebaud
0c4d05b009
Explain how to choose your favorite Eigen version
2016-12-06 11:34:06 +01:00
Silvio Traversaro
e049a2a72a
Added relocatable cmake support also for CMake before 3.0 and after 2.8.8
2016-12-06 10:37:34 +01:00
Srinivas Vasudevan
e6c8b5500c
Change comparisons to use Scalar instead of RealScalar.
2016-12-05 14:01:45 -08:00
Srinivas Vasudevan
f7d7c33a28
Fix expm1 CUDA implementation (do not shadow exp CUDA implementation).
2016-12-05 12:19:01 -08:00
Silvio Traversaro
18481b518f
Make CMake config file relocatable
2016-12-05 10:39:52 +01:00
Gael Guennebaud
c68c8631e7
fix compilation of BTL's blaze interface
2016-12-05 23:02:16 +01:00
Gael Guennebaud
1ff1d4a124
Add performance monitoring for LLT
2016-12-05 23:01:52 +01:00
Srinivas Vasudevan
09ee7f0c80
Fix small nit where I changed name of plog1p to pexpm1.
2016-12-02 15:30:12 -08:00
Srinivas Vasudevan
a0d3ac760f
Sync from Head.
2016-12-02 14:14:45 -08:00
Srinivas Vasudevan
218764ee1f
Added support for expm1 in Eigen.
2016-12-02 14:13:01 -08:00
Gael Guennebaud
66f65ccc36
Ease compiler job to generate clean and efficient code in mat*vec.
2016-12-02 22:41:26 +01:00
Gael Guennebaud
fe696022ec
Operators += and -= do not resize!
2016-12-02 22:40:25 +01:00
Angelos Mantzaflaris
18de92329e
use numext::abs
...
(grafted from 0a08d4c60b
)
2016-12-02 11:48:06 +01:00
Angelos Mantzaflaris
e8a6aa518e
1. Add explicit template to abs2 (resolves deduction for some arithmetic types)
...
2. Avoid signed-unsigned conversion in comparison (warning in case Scalar is unsigned)
(grafted from 4086187e49
)
2016-12-02 11:39:18 +01:00
Gael Guennebaud
a6b971e291
Fix memory leak in Ref<Sparse>
2016-12-05 16:59:30 +01:00
Gael Guennebaud
8640ffac65
Optimize SparseLU::solve for rhs vectors
2016-12-05 15:41:14 +01:00
Gael Guennebaud
62acd67903
remove temporary in SparseLU::solve
2016-12-05 15:11:57 +01:00
Gael Guennebaud
0db6d5b3f4
bug #1356 : fix calls to evaluator::coeffRef(0,0) to get the address of the destination
...
by adding a dstDataPtr() member to the kernel. This fixes undefined behavior if dst is empty (nullptr).
2016-12-05 15:08:09 +01:00
Gael Guennebaud
91003f3b86
typo
2016-12-05 13:51:07 +01:00
Gael Guennebaud
445c015751
extend monitoring benchmarks with transpose matrix-vector and triangular matrix-vectors.
2016-12-05 13:36:26 +01:00
Gael Guennebaud
e3f613cbd4
Improve performance of row-major-dense-matrix * vector products for recent CPUs.
...
This revised version does not bother about aligned loads/stores,
and rather processes 8 rows at ones for better instruction pipelining.
2016-12-05 13:02:01 +01:00
Gael Guennebaud
3abc827354
Clean debugging code
2016-12-05 12:59:32 +01:00
Benoit Steiner
462c28e77a
Merged in srvasude/eigen (pull request PR-265)
...
Add Expm1 support to Eigen.
2016-12-05 02:31:11 +00:00
Gael Guennebaud
4465d20403
Add missing generic load methods.
2016-12-03 21:25:04 +01:00
Gael Guennebaud
6a5fe86098
Complete rewrite of column-major-matrix * vector product to deliver higher performance of modern CPU.
...
The previous code has been optimized for Intel core2 for which unaligned loads/stores were prohibitively expensive.
This new version exhibits much higher instruction independence (better pipelining) and explicitly leverage FMA.
According to my benchmark, on Haswell this new kernel is always faster than the previous one, and sometimes even twice as fast.
Even higher performance could be achieved with a better blocking size heuristic and, perhaps, with explicit prefetching.
We should also check triangular product/solve to optimally exploit this new kernel (working on vertical panel of 4 columns is probably not optimal anymore).
2016-12-03 21:14:14 +01:00
Benoit Steiner
2bfece5cd1
Merged eigen/eigen into default
2016-12-02 16:30:14 -08:00
Mehdi Goli
592acc5bfa
Makingt default numeric_list works with sycl.
2016-12-02 17:58:30 +00:00
Gael Guennebaud
8dfb3e00b8
merge
2016-12-02 11:34:21 +01:00
Gael Guennebaud
4c0d5f3c01
Add perf monitoring for gemv
2016-12-02 11:34:12 +01:00
Gael Guennebaud
d2718d662c
Re-enable A^T*A action in BTL
2016-12-02 11:32:03 +01:00
Christoph Hertzberg
22f7d398e2
bug #1355 : Fixed wrong line-endings on two files
2016-12-02 11:22:05 +01:00
Gael Guennebaud
27873008d4
Clean up SparseCore module regarding ReverseInnerIterator
2016-12-01 21:55:10 +01:00
Angelos Mantzaflaris
8c24723a09
typo UIntPtr
...
(grafted from b6f04a2dd4
)
2016-12-01 21:25:58 +01:00
Angelos Mantzaflaris
aeba0d8655
fix two warnings(unused typedef, unused variable) and a typo
...
(grafted from a9aa3bcf50
)
2016-12-01 21:23:43 +01:00
Gael Guennebaud
181138a1cb
fix member order
2016-12-01 17:06:20 +01:00
Gael Guennebaud
9f297d57ae
Merged in rmlarsen/eigen (pull request PR-256)
...
Add a default constructor for the "fake" __half class when not using the __half class provided by CUDA.
2016-12-01 15:27:33 +00:00
Gael Guennebaud
f95e3b84a5
merge
2016-12-01 16:18:57 +01:00
Benoit Steiner
7ff26ddcbb
Merged eigen/eigen into default
2016-12-01 07:13:17 -08:00
Gael Guennebaud
037b46762d
Fix misleading-indentation warnings.
2016-12-01 16:05:42 +01:00
Mehdi Goli
79aa2b784e
Adding sycl backend for TensorPadding.h; disbaling __unit128 for sycl in TensorIntDiv.h; disabling cashsize for sycl in tensorDeviceDefault.h; adding sycl backend for StrideSliceOP ; removing sycl compiler warning for creating an array of size 0 in CXX11Meta.h; cleaning up the sycl backend code.
2016-12-01 13:02:27 +00:00
Benoit Steiner
a70393fd02
Cleaned up forward declarations
2016-11-30 21:59:07 -08:00
Benoit Steiner
e073de96dc
Moved the MemCopyFunctor back to TensorSyclDevice since it's the only caller and it makes TensorFlow compile again
2016-11-30 21:36:52 -08:00
Benoit Steiner
fca27350eb
Added the deallocate_all() method back
2016-11-30 20:45:20 -08:00
Benoit Steiner
e633a8371f
Simplified includes
2016-11-30 20:21:18 -08:00
Benoit Steiner
7cd33df4ce
Improved formatting
2016-11-30 20:20:44 -08:00
Benoit Steiner
fd1dc3363e
Merged eigen/eigen into default
2016-11-30 20:16:17 -08:00
Benoit Steiner
f5107010ee
Udated the Sizes class to work on AMD gpus without requiring a separate implementation
2016-11-30 19:57:28 -08:00
Benoit Steiner
e37c2c52d3
Added an implementation of numeric_list that works with sycl
2016-11-30 19:55:15 -08:00
Gael Guennebaud
8df272af88
Fix slection of product implementation for dynamic size matrices with fixed max size.
2016-11-30 22:21:33 +01:00
Benoit Steiner
faa2ff99c6
Pulled latest update from trunk
2016-11-30 09:31:24 -08:00
Benoit Steiner
df3da0780d
Updated customIndices2Array to handle various index sizes.
2016-11-30 09:30:12 -08:00
Gael Guennebaud
c927af60ed
Fix a performance regression in (mat*mat)*vec for which mat*mat was evaluated multiple times.
2016-11-30 17:59:13 +01:00
Luke Iwanski
26fff1c5b1
Added EIGEN_STRONG_INLINE to get_sycl_supported_device().
2016-11-30 16:55:22 +00:00
Gael Guennebaud
ab4ef5e66e
bug #1351 : fix compilation of random with old compilers
2016-11-30 17:37:53 +01:00
Sergiu Deitsch
5e3c5c42f6
cmake: remove architecture dependency from Eigen3ConfigVersion.cmake
...
Also, install Eigen3*.cmake under $prefix/share/eigen3/cmake by default.
(grafted from 86ab00cdcf
)
2016-11-30 15:46:46 +01:00
Sergiu Deitsch
3440b46e2f
doc: mention the NO_MODULE option and target availability
...
(grafted from 65f09be8d2
)
2016-11-30 15:41:38 +01:00
Rasmus Munk Larsen
a0329f64fb
Add a default constructor for the "fake" __half class when not using the
...
__half class provided by CUDA.
2016-11-29 13:18:09 -08:00
Mehdi Goli
577ce78085
Adding TensorShuffling backend for sycl; adding TensorReshaping backend for sycl; cleaning up the sycl backend.
2016-11-29 15:30:42 +00:00
Benoit Steiner
3011dc94ef
Call internal::array_prod to compute the total size of the tensor.
2016-11-28 09:00:31 -08:00
Benoit Steiner
02080e2b67
Merged eigen/eigen into default
2016-11-27 07:27:30 -08:00
Benoit Steiner
9fd081cddc
Fixed compilation warnings
2016-11-26 20:22:25 -08:00
Benoit Steiner
9f8fbd9434
Merged eigen/eigen into default
2016-11-26 11:28:25 -08:00
Benoit Steiner
67b2c41f30
Avoided unnecessary type conversion
2016-11-26 11:27:29 -08:00
Benoit Steiner
7fe704596a
Added missing array_get method for numeric_list
2016-11-26 11:26:07 -08:00
Mehdi Goli
7318daf887
Fixing LLVM error on TensorMorphingSycl.h on GPU; fixing int64_t crash for tensor_broadcast_sycl on GPU; adding get_sycl_supported_devices() on syclDevice.h.
2016-11-25 16:19:07 +00:00
Benoit Steiner
7ad37606dd
Fixed the documentation of Scalar Tensors
2016-11-24 12:31:43 -08:00
Benoit Steiner
3be1afca11
Disabled the "remove the call to 'std::abs' since unsigned values cannot be negative" warning introduced in clang 3.5
2016-11-23 18:49:51 -08:00
Gael Guennebaud
308961c05e
Fix compilation.
2016-11-23 22:17:52 +01:00
Gael Guennebaud
21d0286d81
bug #1348 : Document EIGEN_MAX_ALIGN_BYTES and EIGEN_MAX_STATIC_ALIGN_BYTES,
...
and reflect in the doc that EIGEN_DONT_ALIGN* are deprecated.
2016-11-23 22:15:03 +01:00
Mehdi Goli
b8cc5635d5
Removing unsupported device from test case; cleaning the tensor device sycl.
2016-11-23 16:30:41 +00:00
Gael Guennebaud
7f6333c32b
Merged in tal500/eigen-eulerangles (pull request PR-237)
...
Euler angles
2016-11-23 15:17:38 +00:00
Gael Guennebaud
f12b368417
Extend polynomial solver unit tests to complexes
2016-11-23 16:05:45 +01:00
Gael Guennebaud
56e5ec07c6
Automatically switch between EigenSolver and ComplexEigenSolver, and fix a few Real versus Scalar issues.
2016-11-23 16:05:10 +01:00
Gael Guennebaud
9246587122
Patch from Oleg Shirokobrod to extend polynomial solver to complexes
2016-11-23 15:42:26 +01:00
Gael Guennebaud
e340866c81
Fix compilation with gcc and old ABI version
2016-11-23 14:04:57 +01:00
Gael Guennebaud
a91de27e98
Fix compilation issue with MSVC:
...
MSVC always messes up with shadowed template arguments, for instance in:
struct B { typedef float T; }
template<typename T> struct A : B {
T g;
};
The type of A<double>::g will be float and not double.
2016-11-23 12:24:48 +01:00
Gael Guennebaud
74637fa4e3
Optimize predux<Packet8f> (AVX)
2016-11-22 21:57:52 +01:00
Gael Guennebaud
178c084856
Disable usage of SSE3 _mm_hadd_ps that is extremely slow.
2016-11-22 21:53:14 +01:00
Gael Guennebaud
7dd894e40e
Optimize predux<Packet4d> (AVX)
2016-11-22 21:41:30 +01:00
Gael Guennebaud
f3fb0a1940
Disable usage of SSE3 haddpd that is extremely slow.
2016-11-22 16:58:31 +01:00
Sergiu Deitsch
5c516e4e0a
cmake: added Eigen3::Eigen imported target
...
(grafted from a287140f72
)
2016-11-22 12:25:06 +01:00
Gael Guennebaud
6a84246a6a
Fix regression in assigment of sparse block to spasre block.
2016-11-21 21:46:42 +01:00
Benoit Steiner
f11da1d83b
Made the QueueInterface thread safe
2016-11-20 13:17:08 -08:00
Benoit Steiner
ed839c5851
Enable the use of constant expressions with clang >= 3.6
2016-11-20 10:34:49 -08:00
Benoit Steiner
6d781e3e52
Merged eigen/eigen into default
2016-11-20 10:12:54 -08:00
Benoit Steiner
79a07b891b
Fixed a typo
2016-11-20 07:07:41 -08:00
Gael Guennebaud
465ede0f20
Fix compilation issue in mat = permutation (regression introduced in 8193ffb3d3
...
)
2016-11-20 09:41:37 +01:00
Benoit Steiner
81151bd474
Fixed merge conflicts
2016-11-19 19:12:59 -08:00
Benoit Steiner
9265ca707e
Made it possible to check the state of a sycl device without synchronization
2016-11-19 10:56:24 -08:00
Benoit Steiner
2d1aec15a7
Added missing include
2016-11-19 08:09:54 -08:00
Luke Iwanski
af67335e0e
Added test for cwiseMin, cwiseMax and operator%.
2016-11-19 13:37:27 +00:00
Benoit Steiner
1bdf1b9ce0
Merged in benoitsteiner/opencl (pull request PR-253)
...
OpenCL improvements
2016-11-19 04:44:43 +00:00
Benoit Steiner
a357fe1fb9
Code cleanup
2016-11-18 16:58:09 -08:00
Benoit Steiner
1c6eafb46b
Updated cxx11_tensor_device_sycl to run only on the OpenCL devices available on the host
2016-11-18 16:43:27 -08:00
Benoit Steiner
ca754caa23
Only runs the cxx11_tensor_reduction_sycl on devices that are available.
2016-11-18 16:31:14 -08:00
Benoit Steiner
dc601d79d1
Added the ability to run test exclusively OpenCL devices that are listed by sycl::device::get_devices().
2016-11-18 16:26:50 -08:00
Benoit Steiner
8649e16c2a
Enable EIGEN_HAS_C99_MATH when building with the latest version of Visual Studio
2016-11-18 14:18:34 -08:00
Benoit Steiner
110b7f8d9f
Deleted unnecessary semicolons
2016-11-18 14:06:17 -08:00
Benoit Steiner
b5e3285e16
Test broadcasting on OpenCL devices with 64 bit indexing
2016-11-18 13:44:20 -08:00
Gael Guennebaud
164414c563
Merged in ChunW/eigen (pull request PR-252)
...
Workaround for error in VS2012 with /clr
2016-11-18 21:07:29 +00:00
Benoit Steiner
37c2c516a6
Cleaned up the sycl device code
2016-11-18 12:38:06 -08:00
Benoit Steiner
7335c49204
Fixed the cxx11_tensor_device_sycl test
2016-11-18 12:37:13 -08:00
Mehdi Goli
15e226d7d3
adding Benoit changes on the TensorDeviceSycl.h
2016-11-18 16:34:54 +00:00
Mehdi Goli
622805a0c5
Modifying TensorDeviceSycl.h to always create buffer of type uint8_t and convert them to the actual type at the execution on the device; adding the queue interface class to separate the lifespan of sycl queue and buffers,created for that queue, from Eigen::SyclDevice; modifying sycl tests to support the evaluation of the results for both row major and column major data layout on all different devices that are supported by Sycl{CPU; GPU; and Host}.
2016-11-18 16:20:42 +00:00
Luke Iwanski
5159675c33
Added isnan, isfinite and isinf for SYCL device. Plus test for that.
2016-11-18 16:01:48 +00:00
Tal Hadad
76b2a3e6e7
Allow to construct EulerAngles from 3D vector directly.
...
Using assignment template struct to distinguish between 3D vector and 3D rotation matrix.
2016-11-18 15:01:06 +02:00
Luke Iwanski
927bd62d2a
Now testing out (+=, =) in.FUNC() and out (+=, =) out.FUNC()
2016-11-18 11:16:42 +00:00
Gael Guennebaud
8193ffb3d3
bug #1343 : fix compilation regression in mat+=selfadjoint_view.
...
Generic EigenBase2EigenBase assignment was incomplete.
2016-11-18 10:17:34 +01:00
Gael Guennebaud
cebff7e3a2
bug #1343 : fix compilation regression in array = matrix_product
2016-11-18 10:09:33 +01:00
Benoit Steiner
7c30078b9f
Merged eigen/eigen into default
2016-11-17 22:53:37 -08:00
Benoit Steiner
553f50b246
Added a way to detect errors generated by the opencl device from the host
2016-11-17 21:51:48 -08:00
Benoit Steiner
72a45d32e9
Cleanup
2016-11-17 21:29:15 -08:00
Benoit Steiner
4349fc640e
Created a test to check that the sycl runtime can successfully report errors (like ivision by 0).
...
Small cleanup
2016-11-17 20:27:54 -08:00
Benoit Steiner
a6a3fd0703
Made TensorDeviceCuda.h compile on windows
2016-11-17 16:15:27 -08:00
Chun Wang
0d0948c3b9
Workaround for error in VS2012 with /clr
2016-11-17 17:54:27 -05:00
Benoit Steiner
004344cf54
Avoid calling log(0) or 1/0
2016-11-17 11:56:44 -08:00
Konstantinos Margaritis
a1d5c503fa
replace sizeof(Packet) with PacketSize else it breaks for ZVector.Packet4f
2016-11-17 13:27:45 -05:00
Konstantinos Margaritis
672aa97d4d
implement float/std::complex<float> for ZVector as well, minor fixes to ZVector
2016-11-17 13:27:33 -05:00
Konstantinos Margaritis
8290e21fb5
replace sizeof(Packet) with PacketSize else it breaks for ZVector.Packet4f
2016-11-17 13:21:15 -05:00
Luke Iwanski
7878756dea
Fixed existing test.
2016-11-17 17:46:55 +00:00
Luke Iwanski
c5130dedbe
Specialised basic math functions for SYCL device.
2016-11-17 11:47:13 +00:00
Benoit Steiner
f2e8b73256
Enable the use of AVX512 instruction by default
2016-11-16 21:28:04 -08:00
Gael Guennebaud
7b09e4dd8c
bump default branch to 3.3.90
2016-11-16 22:20:58 +01:00
Benoit Steiner
dff9a049c4
Optimized the computation of exp, sqrt, ceil anf floor for fp16 on Pascal GPUs
2016-11-16 09:01:51 -08:00
Benoit Steiner
b5c75351e3
Merged eigen/eigen into default
2016-11-14 15:54:44 -08:00
Rasmus Munk Larsen
32df1b1046
Reduce dispatch overhead in parallelFor by only calling thread_pool.Schedule() for one of the two recursive calls in handleRange. This avoids going through the scedule path to push both recursive calls onto another thread-queue in the binary tree, but instead executes one of them on the main thread. At the leaf level this will still activate a full complement of threads, but will save up to 50% of the overhead in Schedule (random number generation, insertion in queue which includes signaling via atomics).
2016-11-14 14:18:16 -08:00
Mehdi Goli
05e8c2a1d9
Adding extra test for non-fixed size to broadcast; Replacing stcl with sycl.
2016-11-14 18:13:53 +00:00
Mehdi Goli
f8ca893976
Adding TensorFixsize; adding sycl device memcpy; adding insial stage of slicing.
2016-11-14 17:51:57 +00:00
Gael Guennebaud
0ee92aa38e
Optimize sparse<bool> && sparse<bool> to use the same path as for coeff-wise products.
2016-11-14 18:47:41 +01:00
Gael Guennebaud
2e334f5da0
bug #426 : move operator && and || to MatrixBase and SparseMatrixBase.
2016-11-14 18:47:02 +01:00
Gael Guennebaud
a048aba14c
Merged in olesalscheider/eigen (pull request PR-248)
...
Make sure not to call numext::maxi on expression templates
2016-11-14 13:25:53 +00:00
Gael Guennebaud
eedb87f4ba
Fix regression in SparseMatrix::ReverseInnerIterator
2016-11-14 14:05:53 +01:00
Niels Ole Salscheider
51fef87408
Make sure not to call numext::maxi on expression templates
2016-11-12 12:20:57 +01:00
Mehdi Goli
a5c3f15682
Adding comment to TensorDeviceSycl.h and cleaning the code.
2016-11-11 19:06:34 +00:00
Benoit Steiner
f4722aa479
Merged in benoitsteiner/opencl (pull request PR-247)
2016-11-11 00:01:28 +00:00
Mehdi Goli
3be3963021
Adding EIGEN_STRONG_INLINE back; using size() instead of dimensions.TotalSize() on Tensor.
2016-11-10 19:16:31 +00:00
Mehdi Goli
12387abad5
adding the missing in eigen_assert!
2016-11-10 18:58:08 +00:00
Mehdi Goli
2e704d4257
Adding Memset; optimising MecopyDeviceToHost by removing double copying;
2016-11-10 18:45:12 +00:00
Gael Guennebaud
eeac81b8c0
bump to 3.3.0
2016-11-10 13:55:14 +01:00
Gael Guennebaud
e80bc2ddb0
Fix printing of sparse expressions
2016-11-10 10:35:32 +01:00
Benoit Steiner
75c080b176
Added a test to validate memory transfers between host and sycl device
2016-11-09 06:23:42 -08:00
Benoit Steiner
db3903498d
Merged in benoitsteiner/opencl (pull request PR-246)
...
Improved support for OpenCL
2016-11-08 22:28:44 +00:00
Benoit Steiner
dcc14bee64
Fixed the formatting of the code
2016-11-08 14:24:46 -08:00
Benoit Steiner
b88c1117d4
Fixed the indentation of the cmake file
2016-11-08 14:22:36 -08:00
Luke Iwanski
912cb3d660
#if EIGEN_EXCEPTION -> #ifdef EIGEN_EXCEPTIONS.
2016-11-08 22:01:14 +00:00
Luke Iwanski
1b345b0895
Fix for SYCL queue initialisation.
2016-11-08 21:56:31 +00:00
Luke Iwanski
1b95717358
Use try/catch only when exceptions are enabled.
2016-11-08 21:08:53 +00:00
Mehdi Goli
d57430dd73
Converting all sycl buffers to uninitialised device only buffers; adding memcpyHostToDevice and memcpyDeviceToHost on syclDevice; modifying all examples to obey the new rules; moving sycl queue creating to the device based on Benoit suggestion; removing the sycl specefic condition for returning m_result in TensorReduction.h according to Benoit suggestion.
2016-11-08 17:08:02 +00:00
Gael Guennebaud
73985ead27
Extend unit test to check sparse solvers with a SparseVector as the rhs and result.
2016-11-06 20:29:57 +01:00
Gael Guennebaud
436a111792
Generalize Cholmod support to hanlde any sparse type as the rhs and result of the solve method
2016-11-06 20:29:23 +01:00
Gael Guennebaud
afc55b1885
Generalize IterativeSolverBase::solve to hanlde any sparse type as the results (instead of SparseMatrix only)
2016-11-06 20:28:18 +01:00
Gael Guennebaud
a5c2d8a3cc
Generalize solve_sparse_through_dense_panels to handle SparseVector.
2016-11-06 15:20:58 +01:00
Gael Guennebaud
f8bfe10613
Add missing friend declaration
2016-11-06 15:20:30 +01:00
Gael Guennebaud
fc7180cda8
Add a default ctor to evaluator<SparseVector>.
...
Needed for evaluator<Solve>.
2016-11-06 15:20:00 +01:00
Gael Guennebaud
4d226ab5b5
Enable swapping between SparseMatrix and SparseVector
2016-11-06 15:15:03 +01:00
Benoit Steiner
ad086b03e4
Removed unnecessary statement
2016-11-05 12:43:27 -07:00
Benoit Steiner
dad177be01
Added missing includes
2016-11-05 10:04:42 -07:00
Gael Guennebaud
55b4fd1d40
Extend mpreal unit test to check LLT with complexes.
2016-11-05 11:28:53 +01:00
Gael Guennebaud
a354c3ca59
Fix compilation of LLT with complex<mpreal>.
2016-11-05 11:28:29 +01:00
Benoit Steiner
d46a36cc84
Merged eigen/eigen into default
2016-11-04 18:22:55 -07:00
Mehdi Goli
0ebe3808ca
Removed the sycl include from Eigen/Core and moved it to Unsupported/Eigen/CXX11/Tensor; added TensorReduction for sycl (full reduction and partial reduction); added TensorReduction test case for sycl (full reduction and partial reduction); fixed the tile size on TensorSyclRun.h based on the device max work group size;
2016-11-04 18:18:19 +00:00
Gael Guennebaud
47d1b4a609
Added tag 3.3-rc2 for changeset ba05572dcb
2016-11-04 09:09:18 +01:00
Gael Guennebaud
ba05572dcb
bump to 3.3-rc2
2016-11-04 09:09:06 +01:00
Benoit Steiner
5c3995769c
Improved AVX512 configuration
2016-11-03 04:50:28 -07:00
Benoit Steiner
fbe672d599
Reenable the generation of dynamic blas libraries.
2016-11-03 04:08:43 -07:00
Benoit Steiner
ca0ba0d9a4
Improved AVX512 support
2016-11-03 04:00:49 -07:00
Benoit Steiner
c80587c92b
Merged eigen/eigen into default
2016-11-03 03:55:11 -07:00
Gael Guennebaud
3f1d0cdc22
bug #1337 : improve doc of homogeneous() and hnormalized()
2016-11-03 11:03:08 +01:00
Gael Guennebaud
78e93ac1ad
bug #1330 : Cholmod supports double precision only, so let's trigger a static assertion if the scalar type does not match this requirement.
2016-11-03 10:21:59 +01:00
Benoit Steiner
3e37166d0b
Merged in benoitsteiner/opencl (pull request PR-244)
...
Disable vectorization on device only when compiling for sycl
2016-11-02 22:01:03 +00:00
Benoit Steiner
0585b2965d
Disable vectorization on device only when compiling for sycl
2016-11-02 11:44:27 -07:00
Benoit Steiner
e6e77ed08b
Don't call lgamma_r when compiling for an Apple device, since the function isn't available on MacOS
2016-11-02 09:55:39 -07:00
Benoit Steiner
b238f387b4
Pulled latest updates from trunk
2016-11-02 08:53:13 -07:00
Benoit Steiner
c8db17301e
Special functions require math.h: make sure it is included.
2016-11-02 08:51:52 -07:00
Gael Guennebaud
a07bb428df
bug #1004 : improve accuracy of LinSpaced for abs(low) >> abs(high).
2016-11-02 11:34:38 +01:00
Gael Guennebaud
598de8b193
Add pinsertfirst function and implement pinsertlast for complex on SSE/AVX.
2016-11-02 10:38:13 +01:00
Benoit Steiner
e44519744e
Merged in benoitsteiner/opencl (pull request PR-243)
...
Fixed the ambiguity in callig make_tuple for sycl backend.
2016-11-02 02:56:58 +00:00
Rasmus Munk Larsen
0a6ae41555
Merged eigen/eigen into default
2016-11-01 15:37:00 -07:00
Rasmus Munk Larsen
b730952414
Don't attempts to use lgamma_r for CUDA devices.
...
Fix type in lgamma_impl<double>.
2016-11-01 15:34:19 -07:00
Benoit Steiner
7a0e96b80d
Gate the code that refers to cuda fp16 primitives more thoroughly
2016-11-01 12:08:09 -07:00
Mehdi Goli
51af6ae971
Fixed the ambiguity in callig make_tuple for sycl backend.
2016-10-31 16:35:51 +00:00
Benoit Steiner
0a9ad6fc72
Worked around Visual Studio compilation errors
2016-10-28 07:54:27 -07:00
Benoit Steiner
d5f88e2357
Sharded the tensor_image_patch test to help it run on low power devices
2016-10-27 21:48:21 -07:00
Benoit Steiner
0b4b0f11e8
Fixed a few more compilation warnings
2016-10-28 04:01:01 +00:00
Benoit Steiner
306daa24a3
Fixed a compilation warning
2016-10-28 03:50:31 +00:00
Benoit Steiner
8471cf1996
Fixed compilation warning
2016-10-28 03:46:08 +00:00
Benoit Steiner
b0c5bfdf78
Added missing template parameters
2016-10-28 03:43:41 +00:00
Rasmus Munk Larsen
2ebb314fa7
Use threadsafe versions of lgamma and lgammaf if possible.
2016-10-27 16:17:12 -07:00
Gael Guennebaud
530f20c21a
Workaround MSVC issue.
2016-10-27 21:51:37 +02:00
Gael Guennebaud
c3ce4f9ac0
Merged in enricodetoma/eigen (pull request PR-241)
...
Always enable /bigobj for tests to avoid a compile error in MSVC 2015
2016-10-27 19:21:28 +00:00
Benoit Steiner
7d64e6752c
Pulled latest updates from trunk
2016-10-26 18:48:06 -07:00
Benoit Steiner
0a4c4d40b4
Removed a template parameter for fixed sized tensors
2016-10-26 18:47:37 -07:00
Gael Guennebaud
3ecb343dc3
Fix regression in X = (X*X.transpose())/s with X rectangular by deferring resizing of the destination after the creation of the evaluator of the source expression.
2016-10-26 22:50:41 +02:00
enrico.detoma
6ed571744b
Always enable /bigobj for tests to avoid a compile error in MSVC 2015
2016-10-26 22:48:46 +02:00
Gael Guennebaud
97feea9d39
add a generic EIGEN_HAS_CXX11
2016-10-26 15:53:13 +02:00
Gael Guennebaud
ca6a2a5248
Fix warning with ICC
2016-10-26 14:13:05 +02:00
Benoit Steiner
5f2dd503ff
Replaced tabs with spaces
2016-10-25 20:40:58 -07:00
Benoit Steiner
1644bafe29
Code cleanup
2016-10-25 20:36:14 -07:00
Gael Guennebaud
b15a5dc3f4
Fix ICC warnings
2016-10-25 22:20:24 +02:00
Gael Guennebaud
aad72f3c6d
Add missing inline keywords
2016-10-25 20:20:09 +02:00
Benoit Steiner
3e194a6a73
Fixed a typo
2016-10-25 08:42:15 -07:00
Gael Guennebaud
58146be99b
bug #1004 : one more rewrite of LinSpaced for floating point numbers to guarantee both interpolation and monotonicity.
...
This version simply does low+i*step plus a branch to return high if i==size-1.
Vectorization is accomplished with a branch and the help of pinsertlast.
Some quick benchmark revealed that the overhead is really marginal, even when filling small vectors.
2016-10-25 16:53:09 +02:00
Gael Guennebaud
13fc18d3a2
Add a pinsertlast function replacing the last entry of a packet by a scalar.
...
(useful to vectorize LinSpaced)
2016-10-25 16:48:49 +02:00
Gael Guennebaud
2634f9386c
bug #1333 : fix bad usage of const_cast_derived. Better use .data() for that purpose.
2016-10-24 22:22:35 +02:00
Gael Guennebaud
9e8f07d7b5
Cleanup ArrayWrapper and MatrixWrapper by removing redundant accessors.
2016-10-24 22:16:48 +02:00
Gael Guennebaud
b027d7a8cf
bug #1004 : remove the inaccurate "sequential" path for LinSpaced, mark respective function as deprecated, and enforce strict interpolation of the higher range using a correction term.
...
Now, even with floating point precision, both the 'low' and 'high' bounds are exactly reproduced at i=0 and i=size-1 respectively.
2016-10-24 20:27:21 +02:00
Benoit Steiner
b11aab5fcc
Merged in benoitsteiner/opencl (pull request PR-238)
...
Added support for OpenCL to the Tensor Module
2016-10-24 15:30:45 +00:00
Gael Guennebaud
53c77061f0
bug #698 : rewrite LinSpaced for integer scalar types to avoid overflow and guarantee an even spacing when possible.
...
Otherwise, the "high" bound is implicitly lowered to the largest value allowing for an even distribution.
This changeset also disable vectorization for this integer path.
2016-10-24 15:50:27 +02:00
Gael Guennebaud
e8e56c7642
Add unit test for overflow in LinSpaced
2016-10-24 15:43:51 +02:00
Gael Guennebaud
40f62974b7
bug #1328 : workaround a compilation issue with gcc 4.2
2016-10-20 19:19:37 +02:00
Benoit Steiner
cf20b30d65
Merge latest updates from trunk
2016-10-20 09:42:05 -07:00
Luke Iwanski
03b63e182c
Added SYCL include in Tensor.
2016-10-20 15:32:44 +01:00
Benoit Steiner
d3943cd50c
Fixed a few typos in the ternary tensor expressions types
2016-10-19 12:56:12 -07:00
Tal Hadad
15eca2432a
Euler tests: Tighter precision when no roll exists and clean code.
2016-10-18 23:24:57 +03:00
Tal Hadad
6f4f12d1ed
Add isApprox() and cast() functions.
...
test cases included
2016-10-17 22:23:47 +03:00
Tal Hadad
7402cfd4cc
Add safty for near pole cases and test them better.
2016-10-17 20:42:08 +03:00
Mehdi Goli
8fb162fc85
Fixing the typo regarding missing #if needed for proper handling of exceptions in Eigen/Core.
2016-10-16 12:52:34 +01:00
Tal Hadad
58f5d7d058
Fix calc bug, docs and better testing.
...
Test code changes:
* better coded
* rand and manual numbers
* singularity checking
2016-10-16 14:39:26 +03:00
Mehdi Goli
e36cb91c99
Fixing the code indentation in the TensorReduction.h file.
2016-10-14 18:03:00 +01:00
Luke Iwanski
2e188dd4d4
Merged ComputeCpp to default.
2016-10-14 16:47:40 +01:00
Mehdi Goli
15380f9a87
Applyiing Benoit's comment to return the missing line back in Eigen/Core
2016-10-14 16:39:41 +01:00
Gael Guennebaud
692b30ca95
Fix previous merge.
2016-10-14 17:16:28 +02:00
Gael Guennebaud
050c681bdd
Merged in rmlarsen/eigen2 (pull request PR-232)
...
Improve performance of parallelized matrix multiply for rectangular matrices
2016-10-14 14:51:09 +00:00
Tal Hadad
078a202621
Merge Hongkai Dai correct range calculation, and remove ranges from API.
...
Docs updated.
2016-10-14 16:03:28 +03:00
Luke Iwanski
e742da8b28
Merged ComputeCpp into default.
2016-10-14 13:36:51 +01:00
Mehdi Goli
524fa4c46f
Reducing the code by generalising sycl backend functions/structs.
2016-10-14 12:09:55 +01:00
Hongkai Dai
014d9f1d9b
implement euler angles with the right ranges
2016-10-13 14:45:51 -07:00
Benoit Steiner
737e4152c3
Merged in lukier/eigen (pull request PR-234)
...
Enabling CUDA in Geometry
2016-10-13 18:09:28 +00:00
Benoit Steiner
d0ee2267d6
Relaxed the resizing checks so that they don't fail with gcc >= 5.3
2016-10-13 10:59:46 -07:00
Robert Lukierski
a94791b69a
Fixes for min and abs after Benoit's comments, switched to numext.
2016-10-13 15:00:22 +01:00
Avi Ginsburg
ac63d6891c
Patch to allow VS2015 & CUDA 8.0 to compile with Eigen included. I'm not sure
...
whether to limit the check to this compiler combination
(` || (EIGEN_COMP_MSVC == 1900 && __CUDACC_VER__) `)
or to leave it as it is. I also don't know if this will have any affect on
including Eigen in device code (I'm not in my current project).
2016-10-13 08:47:32 +00:00
Benoit Steiner
7e4a6754b2
Merged eigen/eigen into default
2016-10-12 22:42:33 -07:00
Benoit Steiner
38b6048e14
Deleted redundant implementation of predux
2016-10-12 14:37:56 -07:00
Gael Guennebaud
e74612b9a0
Remove double ;;
2016-10-12 22:49:47 +02:00
Benoit Steiner
78d2926508
Merged eigen/eigen into default
2016-10-12 13:46:29 -07:00
Benoit Steiner
2e2f48e30e
Take advantage of AVX512 instructions whenever possible to speedup the processing of 16 bit floats.
2016-10-12 13:45:39 -07:00
Gael Guennebaud
f939c351cb
Fix SPQR for rectangular matrices
2016-10-12 22:39:33 +02:00
Gael Guennebaud
091d373ee9
Fix outer-stride.
2016-10-12 21:47:52 +02:00
Robert Lukierski
471075f7ad
Fixes min() warnings.
2016-10-12 18:59:05 +01:00
Gael Guennebaud
5c366fe1d7
Merged in rmlarsen/eigen (pull request PR-230)
...
Fix a bug in psqrt for SSE and AVX when EIGEN_FAST_MATH=1
2016-10-12 16:30:51 +00:00
Robert Lukierski
86711497c4
Adding EIGEN_DEVICE_FUNC in the Geometry module.
...
Additional CUDA necessary fixes in the Core (mostly usage of
EIGEN_USING_STD_MATH).
2016-10-12 16:35:17 +01:00
Rasmus Munk Larsen
47150af1c8
Fix copy-paste error: Must use _mm256_cmp_ps for AVX.
2016-10-12 08:34:39 -07:00
Gael Guennebaud
89e315152c
bug #1325 : fix compilation on NEON with clang
2016-10-12 16:55:47 +02:00
Benoit Steiner
7f0599b6eb
Manually define int16_t and uint16_t when compiling with Visual Studio
2016-10-08 22:56:32 -07:00
Benoit Steiner
5727e4d89c
Reenabled the use of variadic templates on tegra x1 provides that the latest version (i.e. JetPack 2.3) is used.
2016-10-08 22:19:03 +00:00
Benoit Steiner
5266ff8966
Cleaned up a regression test
2016-10-08 19:12:44 +00:00
Benoit Steiner
5c68051cd7
Merge the content of the ComputeCpp branch into the default branch
2016-10-07 11:04:16 -07:00
Gael Guennebaud
4860727ac2
Remove static qualifier of free-functions (inline is enough and this helps ICC to find the right overload)
2016-10-07 09:21:12 +02:00
Benoit Steiner
507b661106
Renamed predux_half into predux_downto4
2016-10-06 17:57:04 -07:00
Benoit Steiner
a498ff7df6
Fixed incorrect comment
2016-10-06 15:27:27 -07:00
Benoit Steiner
8ba3c41fcf
Revergted unecessary change
2016-10-06 15:12:15 -07:00
Benoit Steiner
a7473d6d5a
Fixed compilation error with gcc >= 5.3
2016-10-06 14:33:22 -07:00
Benoit Steiner
5e64cea896
Silenced a compilation warning
2016-10-06 14:24:17 -07:00
Benoit Steiner
33fba3f08d
Merged in rryan/eigen/tensorfunctors (pull request PR-233)
...
Fully support complex types in SumReducer and MeanReducer when building for CUDA by using scalar_sum_op and scalar_product_op instead of operator+ and operator*.
2016-10-06 12:29:19 -07:00
RJ Ryan
bfc264abe8
Add a test that GPU complex product reductions match CPU reductions.
2016-10-06 11:10:14 -07:00
RJ Ryan
e2e9cdd169
Fully support complex types in SumReducer and MeanReducer when building for CUDA by using scalar_sum_op and scalar_product_op instead of operator+ and operator*.
2016-10-06 10:49:48 -07:00
Benoit Steiner
d485d12c51
Added missing AVX intrinsics for fp16: in particular, implemented predux which is required by the matrix-vector code.
2016-10-06 10:41:03 -07:00
Rasmus Munk Larsen
48c635e223
Add a simple cost model to prevent Eigen's parallel GEMM from using too many threads when the inner dimension is small.
...
Timing for square matrices is unchanged, but both CPU and Wall time are significantly improved for skinny matrices. The benchmarks below are for multiplying NxK * KxN matrices with test names of the form BM_OuterishProd/N/K.
Improvements in Wall time:
Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------
BM_OuterishProd/64/1 3088 1610 +47.9%
BM_OuterishProd/64/4 3562 2414 +32.2%
BM_OuterishProd/64/32 8861 7815 +11.8%
BM_OuterishProd/128/1 11363 6504 +42.8%
BM_OuterishProd/128/4 11128 9794 +12.0%
BM_OuterishProd/128/64 27691 27396 +1.1%
BM_OuterishProd/256/1 33214 28123 +15.3%
BM_OuterishProd/256/4 34312 36818 -7.3%
BM_OuterishProd/256/128 174866 176398 -0.9%
BM_OuterishProd/512/1 7963684 104224 +98.7%
BM_OuterishProd/512/4 7987913 112867 +98.6%
BM_OuterishProd/512/256 8198378 1306500 +84.1%
BM_OuterishProd/1k/1 7356256 324432 +95.6%
BM_OuterishProd/1k/4 8129616 331621 +95.9%
BM_OuterishProd/1k/512 27265418 7517538 +72.4%
Improvements in CPU time:
Run on [redacted] (12 X 3501 MHz CPUs); 2016-10-05T17:40:02.462497196-07:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------
BM_OuterishProd/64/1 6169 1608 +73.9%
BM_OuterishProd/64/4 7117 2412 +66.1%
BM_OuterishProd/64/32 17702 15616 +11.8%
BM_OuterishProd/128/1 45415 6498 +85.7%
BM_OuterishProd/128/4 44459 9786 +78.0%
BM_OuterishProd/128/64 110657 109489 +1.1%
BM_OuterishProd/256/1 265158 28101 +89.4%
BM_OuterishProd/256/4 274234 183885 +32.9%
BM_OuterishProd/256/128 1397160 1408776 -0.8%
BM_OuterishProd/512/1 78947048 520703 +99.3%
BM_OuterishProd/512/4 86955578 1349742 +98.4%
BM_OuterishProd/512/256 74701613 15584661 +79.1%
BM_OuterishProd/1k/1 78352601 3877911 +95.1%
BM_OuterishProd/1k/4 78521643 3966221 +94.9%
BM_OuterishProd/1k/512 258104736 89480530 +65.3%
2016-10-06 10:33:10 -07:00
Benoit Steiner
9f3276981c
Enabling AVX512 should also enable AVX2.
2016-10-06 10:29:48 -07:00
Gael Guennebaud
80b5133789
Fix compilation of qr.inverse() for column and full pivoting variants.
2016-10-06 09:55:50 +02:00
Benoit Steiner
4131074818
Deleted unecessary CMakeLists.txt file
2016-10-05 18:54:35 -07:00
Benoit Steiner
cb5cd69872
Silenced a compilation warning.
2016-10-05 18:50:53 -07:00
Benoit Steiner
78b569f685
Merged latest updates from trunk
2016-10-05 18:48:55 -07:00
Benoit Steiner
9c2b6c049b
Silenced a few compilation warnings
2016-10-05 18:37:31 -07:00
Benoit Steiner
6f3cd529af
Pulled latest updates from trunk
2016-10-05 18:31:43 -07:00
Benoit Steiner
d7f9679a34
Fixed a couple of compilation warnings
2016-10-05 15:00:32 -07:00
Benoit Steiner
ae1385c7e4
Pull the latest updates from trunk
2016-10-05 14:54:36 -07:00
Benoit Steiner
73b0012945
Fixed compilation warnings
2016-10-05 14:24:24 -07:00
Benoit Steiner
c84084c0c0
Fixed compilation warning
2016-10-05 14:15:41 -07:00
Benoit Steiner
4387433acf
Increased the robustness of the reduction tests on fp16
2016-10-05 10:42:41 -07:00
Benoit Steiner
aad20d700d
Increase the tolerance to numerical noise.
2016-10-05 10:39:24 -07:00
Benoit Steiner
8b69d5d730
::rand() returns a signed integer on win32
2016-10-05 08:55:02 -07:00
Benoit Steiner
ed7a220b04
Fixed a typo that impacts windows builds
2016-10-05 08:51:31 -07:00
Benoit Steiner
ceee1c008b
Silenced compilation warning
2016-10-04 18:47:53 -07:00
Benoit Steiner
698ff69450
Properly characterize the CUDA packet primitives for fp16 as device only
2016-10-04 16:53:30 -07:00
Rasmus Munk Larsen
7f67e6dfdb
Update comment for fast sqrt.
2016-10-04 15:09:11 -07:00
Rasmus Munk Larsen
765615609d
Update comment for fast sqrt.
2016-10-04 15:08:41 -07:00
Rasmus Munk Larsen
3ed67cb0bb
Fix a bug in the implementation of Carmack's fast sqrt algorithm in Eigen (enabled by EIGEN_FAST_MATH), which causes the vectorized parts of the computation to return -0.0 instead of NaN for negative arguments.
...
Benchmark speed in Giga-sqrts/s
Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
-----------------------------------------
SSE AVX
Fast=1 2.529G 4.380G
Fast=0 1.944G 1.898G
Fast=1 fixed 2.214G 3.739G
This table illustrates the worst case in terms speed impact: It was measured by repeatedly computing the sqrt of an n=4096 float vector that fits in L1 cache. For large vectors the operation becomes memory bound and the differences between the different versions almost negligible.
2016-10-04 14:22:56 -07:00
Benoit Steiner
6af5ac7e27
Cleanup the cuda executor code.
2016-10-04 08:52:13 -07:00
Benoit Steiner
2f6d1607c8
Cleaned up the random number generation code.
2016-10-04 08:38:23 -07:00
Benoit Steiner
881b90e984
Use explicit type casting to generate packets of zeros.
2016-10-04 08:23:38 -07:00
Benoit Steiner
616a7a1912
Improved support for compiling CUDA code with clang as the host compiler
2016-10-03 17:09:33 -07:00
Benoit Steiner
409e887d78
Added support for constand std::complex numbers on GPU
2016-10-03 11:06:24 -07:00
Gael Guennebaud
9d6d0dff8f
bug #1317 : fix performance regression with some Block expressions and clang by helping it to remove dead code.
...
The trick is to get rid of the nested expression in the evaluator by copying only the required information (here, the strides).
2016-10-01 15:37:00 +02:00
Gael Guennebaud
8b84801f7f
bug #1310 : workaround a compilation regression from 3.2 regarding triangular * homogeneous
2016-09-30 22:49:59 +02:00
Benoit Steiner
422530946f
Renamed the SYCL tests to follow the standard naming convention.
2016-09-30 08:22:10 -07:00
Gael Guennebaud
67b4f45836
Fix angle range
2016-09-30 12:46:33 +02:00
Gael Guennebaud
27f3970453
Remove std:: prefix
2016-09-30 12:40:41 +02:00
Gael Guennebaud
3860a0bc8f
bug #1312 : Quaternion to AxisAngle conversion now ensures the angle will be in the range [-pi,pi]. This also increases accuracy when q.w is negative.
2016-09-29 23:23:35 +02:00
Gael Guennebaud
33500050c3
bug #1308 : fix compilation of some small products involving nullary-expressions.
2016-09-29 09:40:44 +02:00
Benoit Steiner
27d7628f16
Updated the list of warnings to reflect the new message ids introduced in cuda 8.0
2016-09-28 17:42:59 -07:00
Benoit Steiner
2bda1b0d93
Updated the tensor sum and mean reducer to enable them to process complex numbers on cuda gpus.
2016-09-28 17:08:41 -07:00
Mehdi Goli
dd602e62c8
Converting alias template to nested struct in order to be compatible with CXX-03
2016-09-27 16:21:19 +01:00
Gael Guennebaud
f3a00dd2b5
Merged in sergiu/eigen (pull request PR-229)
...
Disabled MSVC level 4 warning C4714
2016-09-27 09:28:08 +02:00
Gael Guennebaud
892afb9416
Add debug info.
2016-09-26 23:53:57 +02:00
Gael Guennebaud
779774f98c
bug #1311 : fix alignment logic in some cases of (scalar*small).lazyProduct(small)
2016-09-26 23:53:40 +02:00
Benoit Steiner
6565f8d60f
Made the initialization of a CUDA device thread safe.
2016-09-26 11:00:32 -07:00
Gael Guennebaud
48dfe98abd
bug #1308 : fix compilation of vector * rowvector::nullary.
2016-09-25 14:54:35 +02:00
Sergiu Deitsch
fe29157d02
disabled MSVC level 4 warning C4714
...
The level 4 warning (/W4) warns about functions marked as __forceinline not
inlined, and generates a lot of noise.
2016-09-25 14:25:47 +02:00
Benoit Steiner
f6ac51a054
Made TensorEvalTo compatible with c++0x again.
2016-09-23 16:45:17 -07:00
Benoit Steiner
00d4e65f00
Deleted unused TensorMap data member
2016-09-23 16:44:45 -07:00
Gael Guennebaud
86caba838d
bug #1304 : fix Projective * scaling and Projective *= scaling
2016-09-23 13:41:21 +02:00
Gael Guennebaud
b9f7a17e47
Add missing file.
2016-09-23 10:26:08 +02:00
Benoit Steiner
1301d744f8
Made the gaussian generator usable on GPU
2016-09-22 19:04:44 -07:00
Benoit Steiner
2a69290ddb
Added a specialization of Eigen::numext::real and Eigen::numext::imag for std::complex<T> to be used when compiling a cuda kernel. This is unfortunately necessary to be able to process complex numbers from a CUDA kernel on MacOS.
2016-09-22 15:52:23 -07:00
Gael Guennebaud
3946768916
Added tag 3.3-rc1 for changeset 77e27fbeee
2016-09-22 22:38:36 +02:00
Gael Guennebaud
77e27fbeee
bump to 3.3-rc1
2016-09-22 22:37:39 +02:00
Gael Guennebaud
2ada122bc6
merge
2016-09-22 22:33:18 +02:00
Gael Guennebaud
8f2bdde373
merge
2016-09-22 22:32:55 +02:00
Gael Guennebaud
ba0f844d6b
Backout changeset ce3557ca69
2016-09-22 22:28:51 +02:00
Gael Guennebaud
9bcdc8b756
Add a nullary-functor example performing index-based sub-matrices.
2016-09-22 22:27:54 +02:00
Benoit Steiner
50e3bbfc90
Calls x.imag() instead of imag(x) when x is a complex number since the former
...
is a constexpr while the later isn't. This fixes compilation errors triggered by nvcc on Mac.
2016-09-22 13:17:25 -07:00
Gael Guennebaud
ca3746c6f8
Bypass identity reflectors.
2016-09-22 22:07:13 +02:00
Felix Gruber
8bde7da086
fix documentation of LinSpaced
...
The index of the highest value in a LinSpace is size-1.
2016-09-22 14:50:07 +02:00
Gael Guennebaud
66cbabafed
Add a note regarding gcc bug #72867
2016-09-22 11:18:52 +02:00
Christoph Hertzberg
4b377715d7
Do not manually add absolute path to boost-library.
...
Also set C++ standard for blaze to C++14
2016-09-22 00:10:47 +02:00
Gael Guennebaud
aecc51a3e8
fix typo
2016-09-21 21:53:00 +02:00
Gael Guennebaud
1fc3a21ed0
Disable a failure test if extended double precision is in use (x87)
2016-09-21 20:09:07 +02:00
Gael Guennebaud
9fa2c8650e
Fix alignement of statically allocated temporaries in symv, and trmv.
2016-09-21 17:34:24 +02:00
Gael Guennebaud
ac5377e161
Improve cost estimation of complex division
2016-09-21 17:26:04 +02:00
Gael Guennebaud
5269d11935
Fix compilation if ICC.
2016-09-21 17:08:51 +02:00
Benoit Steiner
26f9907542
Added missing typedefs
2016-09-20 12:58:03 -07:00
RJ Ryan
608b1acd6d
Don't use c++11 features and fix include.
2016-09-20 07:49:05 -07:00
RJ Ryan
b2c6dc48d9
Add CUDA-specific std::complex<T> specializations for scalar_sum_op, scalar_difference_op, scalar_product_op, and scalar_quotient_op.
2016-09-20 07:18:20 -07:00
Benoit Steiner
8a66ca4b10
Pulled latest updates from trunk
2016-09-19 14:13:55 -07:00
Benoit Steiner
59e9edfbf1
Removed EIGEN_DEVICE_FUNC qualifers for the lu(), fullPivLu(), partialPivLu(), and inverse() functions since they aren't ready to run on GPU
2016-09-19 14:13:20 -07:00
Gael Guennebaud
3ada6e4bed
Merged hongkai-dai/eigen/tip into default (bug #1298 )
2016-09-19 22:08:06 +02:00
Benoit Steiner
c3ca9b1e76
Deleted some unecessary and confusing EIGEN_DEVICE_FUNC
2016-09-19 11:33:39 -07:00
Hongkai Dai
5dcc6d301a
remove ternary operator in euler angles
2016-09-19 10:30:30 -07:00
Luke Iwanski
c771df6bc3
Updated the owners of the file.
2016-09-19 14:09:25 +01:00
Luke Iwanski
b91e021172
Merged with default.
2016-09-19 14:03:54 +01:00
Luke Iwanski
cb81975714
Partial OpenCL support via SYCL compatible with ComputeCpp CE.
2016-09-19 12:44:13 +01:00
Gael Guennebaud
bf03820339
Silent warning.
2016-09-17 14:14:01 +02:00
Gael Guennebaud
de05a18fe0
fix compilation with boost::multiprec
2016-09-17 14:13:48 +02:00
Gael Guennebaud
4cc2c73e6a
Fix alignement of statically allocated temporaries in gemv.
2016-09-17 12:52:27 +02:00
Christoph Hertzberg
ce3557ca69
Make makeHouseholder more stable for cases where real(c0) is not very small (but the rest is).
2016-09-16 14:24:47 +02:00
Emil Fresk
6edd2e2851
Made AutoDiffJacobian more intuitive to use and updated for C++11
...
Changes:
* Removed unnecessary types from the Functor by inferring from its types
* Removed inputs() function reference, replaced with .rows()
* Updated the forward constructor to use variadic templates
* Added optional parameters to the Fuctor for passing parameters,
control signals, etc
* Has been tested with fixed size and dynamic matricies
Ammendment by chtz: overload operator() for compatibility with not fully conforming compilers
2016-09-16 14:03:55 +02:00
Gael Guennebaud
4adeababf9
Fix undeflow
2016-09-16 11:46:46 +02:00
Gael Guennebaud
18f6e47815
Fix order of "static inline".
2016-09-16 11:32:54 +02:00
Gael Guennebaud
ee62f168e6
Doc: add link from block methods to respective tutorial section.
2016-09-16 11:26:25 +02:00
Gael Guennebaud
ca7f061a5f
bug #828 : clarify documentation of SparseMatrixBase's methods returning a sub-matrix.
2016-09-16 11:23:19 +02:00
Gael Guennebaud
50e203c717
bug #828 : clarify documentation of SparseMatrixBase's unary methods.
2016-09-16 10:40:50 +02:00
Gael Guennebaud
fa9049a544
Let be consistent and consider any denormal number as zero.
2016-09-15 11:24:03 +02:00
Gael Guennebaud
b33144e4df
merge
2016-09-15 11:22:16 +02:00
Benoit Steiner
c0d56a543e
Added several missing EIGEN_DEVICE_FUNC qualifiers
2016-09-14 14:06:21 -07:00
Benoit Steiner
488ad7dd1b
Added missing EIGEN_DEVICE_FUNC qualifiers
2016-09-14 13:35:00 -07:00
Benoit Steiner
779faaaeba
Fixed compilation warnings generated by nvcc 6.5 (and below) when compiling the EIGEN_THROW macro
2016-09-14 09:56:11 -07:00
Gael Guennebaud
1c8347e554
Fix product for custom complex type. (conjugation was ignored)
2016-09-14 18:28:49 +02:00
Benoit Steiner
ff47717f25
Suppress warning 2527 and 2529, which correspond to the "calling a __host__ function from a __host__ __device__ function is not allowed" message in nvcc 6.5.
2016-09-13 12:49:40 -07:00
Benoit Steiner
309190cf02
Suppress message 1222 when compiling with nvcc: this ensures that we don't warnings about unknown warning messages when compiling with older versions of nvcc
2016-09-13 12:42:13 -07:00
Gael Guennebaud
c10620b2b0
Fix typo in doc.
2016-09-13 09:25:07 +02:00
Gael Guennebaud
73c8f2f697
bug #1285 : fix regression introduced in changeset 00c29c2cae
2016-09-13 07:58:39 +02:00
Benoit Steiner
e4d4d15588
Register the cxx11_tensor_device only for recent cuda architectures (i.e. >= 3.0) since the test instantiate contractions that require a modern gpu.
2016-09-12 19:01:52 -07:00
Benoit Steiner
4dfd888c92
CUDA contractions require arch >= 3.0: don't compile the cuda contraction tests on older architectures.
2016-09-12 18:49:01 -07:00
Benoit Steiner
028e299577
Fixed a bug impacting some outer reductions on GPU
2016-09-12 18:36:52 -07:00
Benoit Steiner
5f50f12d2c
Added the ability to compute the absolute value of a complex number on GPU, as well as a test to catch the problem.
2016-09-12 13:46:13 -07:00
Benoit Steiner
8321dcce76
Merged latest updates from trunk
2016-09-12 10:33:05 -07:00
Benoit Steiner
eb6ba00cc8
Properly size the list of waiters
2016-09-12 10:31:55 -07:00
Benoit Steiner
a618094b62
Added a resize method to MaxSizeVector
2016-09-12 10:30:53 -07:00
Gael Guennebaud
228ae29591
Fix compilation on 32 bits systems.
2016-09-09 22:34:38 +02:00
Gael Guennebaud
471eac5399
bug #1195 : move NumTraits::Div<>::Cost to internal::scalar_div_cost (with some specializations in arch/SSE and arch/AVX)
2016-09-08 08:36:27 +02:00
Gael Guennebaud
d780983f59
Doc: explain minimal requirements on nullary functors
2016-09-06 23:14:52 +02:00
Gael Guennebaud
85fb517eaf
Generalize ScalarBinaryOpTraits to any complex-real combination as defined by NumTraits (instead of supporting std::complex only).
2016-09-06 17:23:15 +02:00
Gael Guennebaud
447f269561
Disable previous workaround.
2016-09-06 15:49:02 +02:00
Gael Guennebaud
b046a3f87d
Workaround MSVC instantiation faillure of has_*ary_operator at the level of triats<Ref>::match so that the has_*ary_operator are really properly instantiated throughout the compilation unit.
2016-09-06 15:47:04 +02:00
Gael Guennebaud
3cb914f332
bug #1266 : remove CUDA guards on MatrixBase::<decomposition> definitions. (those used to break old nvcc versions that we propably don't care anymore)
2016-09-06 09:55:50 +02:00
Gael Guennebaud
e1642f485c
bug #1288 : fix memory leak in arpack wrapper.
2016-09-05 18:01:30 +02:00
Gael Guennebaud
19a95b3309
Fix shadowing wrt Eigen::Index
2016-09-05 17:19:47 +02:00
Gael Guennebaud
dabc81751f
Fix compilation when cuda_fp16.h does not exist.
2016-09-05 17:14:20 +02:00
Gael Guennebaud
e13071dd13
Workaround a weird msvc 2012 compilation error.
2016-09-05 15:50:41 +02:00
Gael Guennebaud
d123717e21
Fix for msvc 2012 and older
2016-09-05 15:26:56 +02:00
Benoit Steiner
87a8a1975e
Fixed a regression test
2016-09-02 19:29:33 -07:00
Benoit Steiner
13df3441ae
Use MaxSizeVector instead of std::vector: xcode sometimes assumes that std::vector allocates aligned memory and therefore issues aligned instruction to initialize it. This can result in random crashes when compiling with AVX instructions enabled.
2016-09-02 19:25:47 -07:00
Benoit Steiner
373c340b71
Fixed a typo
2016-09-02 15:41:17 -07:00
Benoit Steiner
cadd124d73
Pulled latest update from trunk
2016-09-02 15:30:02 -07:00
Benoit Steiner
05b0518077
Made the index type an explicit template parameter to help some compilers compile the code.
2016-09-02 15:29:34 -07:00
Benoit Steiner
adf864fec0
Merged in rmlarsen/eigen (pull request PR-222)
...
Fix CUDA build broken by changes to min and max reduction.
2016-09-02 14:11:20 -07:00
Benoit Steiner
5a6be66cef
Turned the Index type used by the nullary wrapper into a template parameter.
2016-09-02 14:10:29 -07:00
Rasmus Munk Larsen
13e93ca8b7
Fix CUDA build broken by changes to min and max reduction.
2016-09-02 13:41:36 -07:00
Benoit Steiner
6c05c3dd49
Fix the cxx11_tensor_cuda.cu test on 32bit platforms.
2016-09-02 11:12:16 -07:00
Gael Guennebaud
49c0390ce0
merge
2016-09-02 15:24:14 +02:00
Gael Guennebaud
d6c8366d84
Fix compilation with MSVC 2012
2016-09-02 15:23:32 +02:00
Benoit Steiner
039e225f7f
Added a test for nullary expressions on CUDA
...
Also check that we can mix 64 and 32 bit indices in the same compilation unit
2016-09-01 13:28:12 -07:00
Benoit Steiner
c53f783705
Updated the contraction code to support constant inputs.
2016-09-01 11:41:27 -07:00
Gael Guennebaud
ef54723dbe
One more msvc fix iteration, the previous one was over-simplified for visual
2016-09-01 15:04:53 +02:00
Gael Guennebaud
46475eff9a
Adjust Tensor module wrt recent change in nullary functor
2016-09-01 13:40:45 +02:00
Gael Guennebaud
72a4d49315
Fix compilation with CUDA 8
2016-09-01 13:39:33 +02:00
Gael Guennebaud
f9f32e9e2d
Fix compilation with nvcc
2016-09-01 13:06:14 +02:00
Gael Guennebaud
3d946e42b3
Fix compilation with visual studio
2016-09-01 12:59:32 +02:00
Benoit Steiner
221f619bea
Merged in rmlarsen/eigen (pull request PR-221)
...
Fix bugs to make min- and max reducers work with correctly with IEEE infinities.
2016-08-31 15:10:10 -07:00
Rasmus Munk Larsen
a1e092d1e8
Fix bugs to make min- and max reducers with correctly with IEEE infinities.
2016-08-31 15:04:16 -07:00
Gael Guennebaud
836fa25a82
Make sure sizeof is truelly needed, thus improving SFINAE portability.
2016-08-31 23:40:18 +02:00
Gael Guennebaud
84cf6e42ca
minor tweaks in has_* helpers
2016-08-31 23:04:14 +02:00
Gael Guennebaud
7ae819123c
Simplify CwiseNullaryOp example.
2016-08-31 15:46:04 +02:00
Gael Guennebaud
218c37beb4
bug #1286 : automatically detect the available prototypes of functors passed to CwiseNullaryExpr such that functors have only to implement the operators that matters among:
...
operator()()
operator()(i)
operator()(i,j)
Linear access is also automatically detected based on the availability of operator()(i,j).
2016-08-31 15:45:25 +02:00
Gael Guennebaud
efe2c225c9
bug #1283 : add regression unit test
2016-08-31 13:04:29 +02:00
Gael Guennebaud
3456247437
bug #1283 : quick fix for products involving uncommon general block access to vectors.
2016-08-31 08:17:15 +02:00
Gael Guennebaud
8c48d42530
Fix 4x4 inverse with non-linear destination
2016-08-30 23:16:38 +02:00
Gael Guennebaud
e7fbbc2748
Doc: add links and discourage user to write their own expression (better use CwiseNullaryOp)
2016-08-30 15:57:46 +02:00
Gael Guennebaud
1e2ab8b0b3
Doc: add an exemple showing how custom expression can be advantageously implemented via CwiseNullaryOp.
2016-08-30 15:40:41 +02:00
Gael Guennebaud
9c9e23858e
Doc: split customizing-eigen page into sub-pages and re-structure a bit the different topics
2016-08-30 11:10:08 +02:00
Gael Guennebaud
cffe8bbff7
Doc: add link to example
2016-08-30 10:45:27 +02:00
Gael Guennebaud
c57317035a
Fix unit test for 1x1 matrices
2016-08-30 10:20:23 +02:00
Gael Guennebaud
1f84f0d33a
merge EulerAngles module
2016-08-30 10:01:53 +02:00
Gael Guennebaud
68e803a26e
Fix warning
2016-08-30 09:21:57 +02:00
Gael Guennebaud
e074f720c7
Include missing forward declaration of SparseMatrix
2016-08-29 18:56:46 +02:00
Gael Guennebaud
2915e1fc5d
Revert part of changeset 5b3a6f51d3
...
to keep accuracy of smallest eigenvalues.
2016-08-29 14:14:18 +02:00
Gael Guennebaud
7e029d1d6e
bug #1271 : add SparseMatrix::coeffs() methods returning a 1D view of the non zero coefficients.
2016-08-29 12:06:37 +02:00
Gael Guennebaud
a93e354d92
Add some pre-allocation unit tests (not working yet)
2016-08-29 11:08:44 +02:00
Gael Guennebaud
6cd7b9ea6b
Fix compilation with cuda 8
2016-08-29 11:06:08 +02:00
Gael Guennebaud
8f4b4ad5fb
use ::hlog if available.
2016-08-29 11:05:32 +02:00
Gael Guennebaud
35a8e94577
bug #1167 : simplify installation of header files using cmake's install(DIRECTORY ...) command.
2016-08-29 10:59:37 +02:00
Gael Guennebaud
0decc31aa8
Add generic implementation of conj_helper for custom complex types.
2016-08-29 09:42:29 +02:00
Gael Guennebaud
fd9caa1bc2
bug #1282 : fix implicit double to float conversion warning
2016-08-28 22:45:56 +02:00
Gael Guennebaud
68d1897e8a
Make sure that our log1p implementation is called as a last resort only.
2016-08-26 15:30:55 +02:00
Gael Guennebaud
fe60856fed
Add overload of numext::log1p for float/double in CUDA
2016-08-26 15:28:59 +02:00
Gael Guennebaud
0f56b5a6de
enable vectorization path when testing half on cuda, and add test for log1p
2016-08-26 14:55:51 +02:00
Gael Guennebaud
965e595f02
Add missing log1p method
2016-08-26 14:55:00 +02:00
Gael Guennebaud
1329c55875
Fix compilation with boost::multiprec.
2016-08-25 14:54:39 +02:00
Gael Guennebaud
441b7eaab2
Add support for non trivial scalar factor in sparse selfadjoint * dense products, and enable +=/-= assignement for such products.
...
This changeset also improves the performance by working on column of the result at once.
2016-08-24 13:06:34 +02:00
Gael Guennebaud
8132a12625
bug #1268 : detect faillure in LDLT and report them through info()
2016-08-23 23:15:55 +02:00
Gael Guennebaud
bde9b456dc
Typo
2016-08-23 21:36:36 +02:00
Gael Guennebaud
326320ec7b
Fix compilation in non C++11 mode.
2016-08-23 19:28:57 +02:00
Gael Guennebaud
ea2e968257
Address several implicit scalar conversions.
2016-08-23 18:44:33 +02:00
Gael Guennebaud
0a6a50d1b0
Cleanup eiegnvector extraction: leverage matrix products and compile-time sizes, remove numerous useless temporaries.
2016-08-23 18:14:37 +02:00
Gael Guennebaud
00b2666853
bug #645 : patch from Tobias Wood implementing the extraction of eigenvectors in GeneralizedEigenSolver
2016-08-23 17:37:38 +02:00
Gael Guennebaud
504a4404f1
Optimize expression matching "d?=a-b*c" as "d?=a; d?=b*c;"
2016-08-23 16:52:22 +02:00
Gael Guennebaud
e47a8928ec
Fix compilation in check_for_aliasing due to ambiguous specializations
2016-08-23 16:19:10 +02:00
Gael Guennebaud
6739f6bb1b
Merged in traversaro/eigen-1/traversaro/modify-findeigen3cmake-to-find-eigen3con-1469782761059 (pull request PR-213)
...
Modify FindEigen3.cmake to find Eigen3Config.cmake
2016-08-23 15:53:57 +02:00
Gael Guennebaud
ef3de20481
Cleanup cost of tanh
2016-08-23 14:39:55 +02:00
Gael Guennebaud
b3151bca40
Implement pmadd for float and double to make it consistent with the vectorized path when FMA is available.
2016-08-23 14:24:08 +02:00
Gael Guennebaud
a4c266f827
Factorize the 4 copies of tanh implementations, make numext::tanh consistent with array::tanh, enable fast tanh in fast-math mode only.
2016-08-23 14:23:08 +02:00
Gael Guennebaud
82147cefff
Fix possible overflow and biais in integer random generator
2016-08-23 13:25:31 +02:00
Silvio Traversaro
068ccab9fe
FindEigen3.cmake : search for package only if EIGEN3_INCLUDE_DIR is not already defined
2016-08-22 22:13:10 +00:00
Gael Guennebaud
581b6472d1
bug #1265 : remove outdated notes
2016-08-22 23:25:39 +02:00
Igor Babuschkin
59bacfe520
Fix compilation on CUDA 8 by removing call to h2log1p
2016-08-15 23:38:05 +01:00
Benoit Steiner
34ae80179a
Use array_prod instead of calling TotalSize since TotalSize is only available on DSize.
2016-08-15 10:29:14 -07:00
Benoit Steiner
2556565b4b
Merged in ibab/eigen/extend-log1p (pull request PR-218)
...
Fix compilation on CUDA 8 due to missing h2log1p function
2016-08-15 08:31:03 -07:00
Benoit Steiner
30dd6f5e34
Close branch extend-log1p
2016-08-15 08:31:03 -07:00
Benoit Steiner
fe73648c98
Fixed a bug in the documentation.
2016-08-12 10:00:43 -07:00
Christoph Hertzberg
9636a8ed43
bug #1273 : Add parentheses when redefining eigen_assert
2016-08-12 15:34:21 +02:00
Christoph Hertzberg
c83b754ee0
bug #1272 : Disable assertion when total number of columns is zero.
...
Also moved assertion to finished() method and adapted unit-test
2016-08-12 15:15:34 +02:00
Benoit Steiner
e3a8dfb02f
std::erfcf doesn't exist: use numext::erfc instead
2016-08-11 15:24:06 -07:00
Benoit Steiner
64e68cbe87
Don't attempt to optimize partial reductions when the optimized implementation doesn't buy anything.
2016-08-08 19:29:59 -07:00
Benoit Steiner
5157ce8cbf
Merged in ibab/eigen/extend-log1p (pull request PR-217)
...
Add log1p support for CUDA and half floats
2016-08-08 14:50:00 -07:00
Igor Babuschkin
aee693ac52
Add log1p support for CUDA and half floats
2016-08-08 20:24:59 +01:00
Benoit Steiner
72096f3bd4
Merged in suiyuan2009/eigen/fix_tanh_inconsistent_for_tensorflow (pull request PR-215)
...
Fix_tanh_inconsistent_for_tensorflow
2016-08-08 09:06:45 -07:00
Christoph Hertzberg
3e4a33d4ba
bug #1272 : Let CommaInitializer work for more border cases (enhances fix of bug #1242 ).
...
The unit test tests all combinations of 2x2 block-sizes from 0 to 3.
2016-08-08 17:26:48 +02:00
Ziming Dong
1031223c09
fix tanh inconsistent
2016-08-06 19:48:50 +08:00
Ziming Dong
5cf1e4c79b
create fix_tanh_inconsistent branch
2016-08-06 15:54:33 +08:00
Christoph Hertzberg
fe4b927e9c
Add aliases Eigen_*_DIR to Eigen3_*_DIR
...
This is to make configuring work again after project was renamed from Eigen to Eigen3
2016-08-05 15:21:14 +02:00
Benoit Steiner
fe778427f2
Fixed the constructors of the new half_base class.
2016-08-04 18:32:26 -07:00
Benoit Steiner
5eea1c7f97
Fixed cut and paste bug in debud message
2016-08-04 17:34:13 -07:00
Benoit Steiner
9506343349
Fixed the isnan, isfinite and isinf operations on GPU
2016-08-04 17:25:53 -07:00
Benoit Steiner
b50d8f8c4a
Extended a regression test to validate that we basic fp16 support works with cuda 7.0
2016-08-03 16:50:13 -07:00
Benoit Steiner
fad9828769
Deleted redundant regression test.
2016-08-03 16:08:37 -07:00
Benoit Steiner
373bb12dc6
Check that it's possible to forward declare the hlaf type.
2016-08-03 16:07:31 -07:00
Gael Guennebaud
17b9a55d98
Move Eigen::half_impl::half to Eigen::half while preserving the free functions to the Eigen::half_impl namespace together with ADL
2016-08-04 00:00:43 +02:00
Benoit Steiner
ca2cee2739
Merged in ibab/eigen (pull request PR-206)
...
Expose real and imag methods on Tensors
2016-08-03 11:53:04 -07:00
Benoit Steiner
d92df04ce8
Cleaned up the new float16 test a bit
2016-08-03 11:50:07 -07:00
Benoit Steiner
81099ef482
Added a test for fp16
2016-08-03 11:41:17 -07:00
Benoit Steiner
a20b58845f
CUDA_ARCH isn't always defined, so avoid relying on it too much when figuring out which implementation to use for reductions. Instead rely on the device to tell us on which hardware version we're running.
2016-08-03 10:00:43 -07:00
Gael Guennebaud
819d0cea1b
List PARDISO solver.
2016-08-02 23:32:41 +02:00
Christoph Hertzberg
f4404777ff
Change project name to Eigen3, to be compatible with FindEigen3.cmake and Eigen3Config.cmake.
...
This is related to pull-requests 214.
2016-08-02 17:08:57 +00:00
Benoit Steiner
fd220dd8b0
Use numext::conj instead of std::conj
2016-08-01 18:16:16 -07:00
Benoit Steiner
e256acec7c
Avoid unecessary object copies
2016-08-01 17:03:39 -07:00
Gael Guennebaud
7995cec90c
Fix vectorization logic for coeff-based product for some corner cases.
2016-07-31 15:20:22 +02:00
Benoit Steiner
02fe89f5ef
half implementation has been moved to half_impl namespace
2016-07-29 15:09:34 -07:00
Benoit Steiner
2693fd54bf
bug #1266 : half implementation has been moved to half_impl namespace
2016-07-29 13:45:56 -07:00
Christoph Hertzberg
c5b893f434
bug #1266 : half implementation has been moved to half_impl namespace
2016-07-29 18:36:08 +02:00
Silvio Traversaro
5e51a361fe
Modify FindEigen3.cmake to find Eigen3Config.cmake
2016-07-29 08:59:38 +00:00
klimpel
ca5effa16c
MSVC-2010 is making problems with SFINAE again. But restricting to the variant for very old compilers (enum, template<typename C> for both function definitions) fixes the problem.
2016-07-28 15:58:17 +01:00
Gael Guennebaud
4057f9b1fc
Enable slice-vectorization+inner-unrolling when unaligned vectorization is allowed. For instance, this permits to vectorize 5x5 matrices (including product)
2016-07-28 13:47:33 +02:00
Gael Guennebaud
5fbe7aa604
Update and fix Cholesky mini benchmark
2016-07-28 11:26:30 +02:00
Gael Guennebaud
a72752caac
Vectorize more small product expressions by letting the general assignement logic decides on the sizes that are OK for vectorization.
2016-07-28 11:21:07 +02:00
Gael Guennebaud
cc2f6d68b1
bug #1264 : fix compilation
2016-07-27 23:30:47 +02:00
Gael Guennebaud
188590db82
Add instructions for LAPACKE+Accelerate
2016-07-27 15:07:35 +02:00
Gael Guennebaud
8972323c08
Big 1261: add missing max(ADS,ADS) overload (same for min)
2016-07-27 14:52:48 +02:00
Gael Guennebaud
5d94dc85e5
bug #1260 : add regression test
2016-07-27 14:38:30 +02:00
Gael Guennebaud
0d7039319c
bug #1260 : remove doubtful specializations of ScalarBinaryOpTraits
2016-07-27 14:35:52 +02:00
Christoph Hertzberg
d3d7c6245d
Add brackets to block matrix and fixed some typos
2016-07-27 09:55:39 +02:00
Gael Guennebaud
0eece608b4
Added tag 3.3-beta2 for changeset f6b3cf8de9
2016-07-26 23:52:14 +02:00
Gael Guennebaud
f6b3cf8de9
Bump to 3.3-beta2
2016-07-26 23:51:59 +02:00
Gael Guennebaud
9d16b6e1cf
Formatting
2016-07-26 23:51:43 +02:00
Gael Guennebaud
fd2f989b1d
Fix testing of nearly zero input matrices.
2016-07-26 14:46:02 +02:00
Gael Guennebaud
c9e3e438eb
Add more very small numbers in the list of nearly "zero" values when testing SVD and EVD algorithms
2016-07-26 14:45:44 +02:00
Gael Guennebaud
95113cb15c
Improve robustness of 2x2 eigenvalue with shifting and scaling
2016-07-26 14:43:54 +02:00
Gael Guennebaud
7f7e84aa36
Fix compilation with MKL support
2016-07-26 13:31:29 +02:00
Gael Guennebaud
429028b652
Typo.
2016-07-26 12:12:53 +02:00
Gael Guennebaud
6b89fa802c
Typos.
2016-07-26 12:08:04 +02:00
Gael Guennebaud
c581c8fa79
Fix with expession template scalar types.
2016-07-26 11:33:28 +02:00
Gael Guennebaud
8021aed89e
Split BLAS/LAPACK versus MKL documentation
2016-07-26 11:11:59 +02:00
Gael Guennebaud
757971e7ea
bug #1258 : fix compilation of Map<SparseMatrix>::coeffRef
2016-07-26 09:40:19 +02:00
Gael Guennebaud
c9425492c8
Update doc.
2016-07-25 18:41:26 +02:00
Gael Guennebaud
0592b4cfbf
merge
2016-07-25 18:20:22 +02:00
Gael Guennebaud
9c663e4ee8
Clean references to MKL in LAPACKe support.
2016-07-25 18:20:08 +02:00
Gael Guennebaud
0c06077efa
Rename MKL files
2016-07-25 18:00:47 +02:00
Gael Guennebaud
4d54e3dd33
bug #173 : remove dependency to MKL for LAPACKe backend.
2016-07-25 17:55:07 +02:00
Benoit Steiner
3d3d34e442
Deleted dead code.
2016-07-25 08:53:37 -07:00
Gael Guennebaud
34b483e25d
bug #1249 : enable use of __builtin_prefetch for GCC, clang, and ICC only.
2016-07-25 15:17:45 +02:00
Gael Guennebaud
6d5daf32f5
bug #1255 : comment out broken and unsused line.
2016-07-25 14:48:30 +02:00
Gael Guennebaud
f9598d73b5
bug #1250 : fix pow() for AutoDiffScalar with custom nested scalar type.
2016-07-25 14:42:19 +02:00
Gael Guennebaud
fd1117f2be
Implement digits10 for mpreal
2016-07-25 14:38:55 +02:00
Gael Guennebaud
9908020d36
Add minimal support for Array<string>, and fix Tensor<string>
2016-07-25 14:25:56 +02:00
Gael Guennebaud
4184a3e544
Extend boost.multiprec unit test with ET on, complexes, and general/generalized eigenvalue solvers.
2016-07-25 12:36:22 +02:00
Gael Guennebaud
1b2049fbda
Enforce scalar types in calls to max/min (helps with expression template scalar types)
2016-07-25 12:35:10 +02:00
Gael Guennebaud
b118bc76eb
Add digits10 overload for complex.
2016-07-25 12:33:21 +02:00
Gael Guennebaud
c96af5381f
Remove custom complex division function cdiv.
2016-07-25 12:31:58 +02:00
Gael Guennebaud
e1c7c5968a
Update doc.
2016-07-25 11:18:04 +02:00
Gael Guennebaud
8fffc81606
Add NumTraits::digits10() function based on numeric_limits::digits10 and make use of it for printing matrices.
2016-07-25 11:13:01 +02:00
Gael Guennebaud
5f03584752
merge
2016-07-23 17:52:44 +02:00
Gael Guennebaud
1b0353c659
Fix misuse of dummy_precesion in eigenvalues solvers
2016-07-23 17:52:31 +02:00
Benoit Steiner
c6b0de2c21
Improved partial reductions in more cases
2016-07-22 17:18:20 -07:00
Gael Guennebaud
72744d93ef
Allows the compiler to inline outer products (the change from default to dont-inline in changeset 737bed19c1
...
was not motivated)
2016-07-22 17:02:28 +02:00
Gael Guennebaud
32d95e86c9
merge
2016-07-22 16:43:12 +02:00
Gael Guennebaud
60d5980a41
add a note
2016-07-22 15:46:23 +02:00
Gael Guennebaud
d7a0e52478
Fix testing of log nearby 1
2016-07-22 15:44:26 +02:00
Gael Guennebaud
7acf23c14c
Truely split unit test.
2016-07-22 15:41:23 +02:00
Gael Guennebaud
24af67a6cc
Fix boostmultiprec for C++03
2016-07-22 15:30:54 +02:00
Gael Guennebaud
395c835f4b
Fix CUDA compilation
2016-07-22 15:30:24 +02:00
Gael Guennebaud
d075d122ea
Move half unit test from unsupported to main tests
2016-07-22 14:34:19 +02:00
Gael Guennebaud
47afc9a365
More cleaning in half:
...
- put its definition and functions in its own half_impl namespace such that the free function does not polute the Eigen namespace while still making them visible for half through ADL.
- expose Eigen::half throguh a using statement
- move operator<< from std to half_float namespace
2016-07-22 14:33:28 +02:00
Gael Guennebaud
0f350a8b7e
Fix CUDA compilation
2016-07-21 18:47:07 +02:00
Gael Guennebaud
bf91a44f4a
Use ADL and log10 for printing matrices.
2016-07-21 15:48:24 +02:00
Gael Guennebaud
82798162c0
Extend unit testing of half with ADL and arrays.
2016-07-21 15:47:21 +02:00
Gael Guennebaud
87fbda812f
Add missing log10 and random generator for half.
2016-07-21 15:46:45 +02:00
Gael Guennebaud
01d12d3e82
Some cleanup in Halh: standard functions should be defined in the namespace of the class half to make ADL work, and thus the global is* functions can be removed.
2016-07-21 15:10:48 +02:00
Gael Guennebaud
007edee1ac
Add a doc page summarizing the true speed of Eigen's decompositions.
2016-07-21 12:32:02 +02:00
Gael Guennebaud
9b76be9d21
Update benchmark for dense solver to stress least-squares pb, and to output a HTML table
2016-07-21 12:30:53 +02:00
Gael Guennebaud
72950effdf
enable testing of Boost.Multiprecision with expression templates
2016-07-20 18:21:30 +02:00
Yi Lin
7b4abc2b1d
Fixed a code comment error
2016-07-20 22:28:54 +08:00
Gael Guennebaud
b64b9d0172
Add a unit test to stress our solvers with Boost.Multiprecision
2016-07-20 15:20:14 +02:00
Gael Guennebaud
5e4dda8a12
Enable custom scalar types in some unit tests.
2016-07-20 15:19:17 +02:00
Gael Guennebaud
87d480d785
Make use of EIGEN_TEST_MAX_SIZE
2016-07-20 15:14:20 +02:00
Gael Guennebaud
7722913475
Fix ambiguous specialization with custom scalar type
2016-07-20 15:13:44 +02:00
Gael Guennebaud
fd057f86b3
Complete the coeff-wise math function table.
2016-07-20 12:14:10 +02:00
Gael Guennebaud
9e8476ef22
Add missing Eigen::rsqrt global function
2016-07-20 11:59:49 +02:00
Gael Guennebaud
4b4c296d6e
Simplify ScalarBinaryOpTraits by removing the Defined enum, and extend its documentation.
2016-07-20 09:56:39 +02:00
Gael Guennebaud
e3bf874c83
Workaround MSVC 2010 compilation issue.
2016-07-18 15:17:25 +02:00
Gael Guennebaud
0f89c6d6b5
Add a summary of possible values for EIGEN_COMP_MSVC
2016-07-18 15:16:13 +02:00
Gael Guennebaud
18884f17d7
Remove static constant declaration: this enforces compiler to generate costly code for thread safety.
2016-07-18 15:05:17 +02:00
Gael Guennebaud
79574e384e
Make scalar_product_op the default (instead of void)
2016-07-18 12:03:05 +02:00
Gael Guennebaud
6a3c451c1c
Permits call to explicit ctor.
2016-07-18 12:02:20 +02:00
Gael Guennebaud
0c3fe4aca5
merge
2016-07-18 10:44:15 +02:00
Gael Guennebaud
db9b154193
Add missing non-const reverse method in VectorwiseOp.
2016-07-16 15:19:28 +02:00
Gael Guennebaud
461cd819c2
Workaround VS2015 bug
2016-07-13 18:46:01 +02:00
Gael Guennebaud
5ea0864c81
Fix regression in a previous commit: some diagonal entry might not be treated by the 2x2 real preconditioner.
2016-07-13 18:37:54 +02:00
Benoit Steiner
20f7ef2f89
An evalTo expression is only aligned iff both the lhs and the rhs are aligned.
2016-07-12 10:56:42 -07:00
Gael Guennebaud
b4343aa67e
Avoid division by very small entries when extracting singularvalues, and explicitly handle the 1x1 complex case.
2016-07-12 17:22:03 +02:00
Gael Guennebaud
e2aa58b631
Consider denormals as zero in makeJacobi and 2x2 SVD.
...
This also fix serious issues with x387 for which values can be much smaller than the smallest denormal!
2016-07-12 17:21:03 +02:00
Gael Guennebaud
263993a7b6
Fix test for nearly null input
2016-07-12 17:19:26 +02:00
Gael Guennebaud
9ab35d8ba4
Fix compilation of doc
2016-07-12 16:47:39 +02:00
Gael Guennebaud
19614497ae
Add some doxygen's images to support both old and recent doxygen versions
...
(with some vague definitions of old and recent ;) )
2016-07-12 16:45:43 +02:00
Gael Guennebaud
c98bac2966
Manually add -stdd=c++11 to nvcc for old cmake versions
2016-07-12 09:29:18 +02:00
Benoit Steiner
013a904237
Pulled latest updates from trunk
2016-07-11 14:29:05 -07:00
Benoit Steiner
40eb97516c
reverted unintended change.
2016-07-11 14:28:03 -07:00
Benoit Steiner
03b71c273e
Made the packetmath test compile again. A better fix would be to move the special function tests to the unsupported directory where the code now resides.
2016-07-11 13:50:24 -07:00
Benoit Steiner
3a2dd352ae
Improved the contraction mapper to properly support tensor products
2016-07-11 13:43:41 -07:00
Benoit Steiner
0bc020be9d
Improved the detection of packet size in the tensor scan evaluator.
2016-07-11 12:14:56 -07:00
Gael Guennebaud
a96a7ce3f7
Move CUDA's special functions to SpecialFunctions module.
2016-07-11 18:39:11 +02:00
Gael Guennebaud
bec35f4c55
Clarify that SpecialFunctions is unsupported
2016-07-11 18:38:40 +02:00
Gael Guennebaud
fd60966310
merge
2016-07-11 18:11:47 +02:00
Gael Guennebaud
7d636349dc
Fix configuration of CUDA:
...
- preserve user defined CUDA_NVCC_FLAGS
- remove the -ansi flag that conflicts with -std=c++11
- do not add -std=c++11 if already there
2016-07-11 18:09:04 +02:00
klimpel
8b3fc31b55
compile fix (SFINAE variant apparently didn't work for all compilers) for the following compiler/platform:
...
gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46)
Copyright (C) 2006 Free Software Foundation, Inc.
2016-07-11 17:42:22 +02:00
Gael Guennebaud
3e348fdcf9
Workaround MSVC bug
2016-07-11 15:24:52 +02:00
Gael Guennebaud
131ee4bb8e
Split test_slice_in_expr which seems to be huge for visual
2016-07-11 11:46:55 +02:00
Gael Guennebaud
194daa3048
Fix assertion (it did not make sense for static_val types)
2016-07-11 11:39:27 +02:00
Gael Guennebaud
18c35747ce
Emulate _BitScanReverse64 for 32 bits builds
2016-07-11 11:38:04 +02:00
Konstantinos Margaritis
ef05463fcf
Merged kmargar/eigen/tip into default, Altivec/VSX port should be working ok now.
2016-07-10 16:11:46 +03:00
Konstantinos Margaritis
9f7caa7e7d
minor fixes for big endian altivec/vsx
2016-07-10 07:05:10 -03:00
Christoph Hertzberg
3c795c6923
bug #1119 : Adjust call to ?gssvx for SuperLU 5
...
Also improved corresponding cmake module to detect versions 5.x
Based on patch by Christoph Grüninger.
2016-07-10 02:29:57 +02:00
Gael Guennebaud
57113e00f9
Relax strict equality
2016-07-09 23:37:11 +02:00
Gael Guennebaud
599f8ba617
Change runtime to compile-time conditional.
2016-07-08 11:39:43 +02:00
Gael Guennebaud
544935101a
Fix warnings
2016-07-08 11:38:52 +02:00
Gael Guennebaud
59bf2774a3
Fix warnings
2016-07-08 11:38:11 +02:00
Gael Guennebaud
2f7e2614e7
bug #1232 : refactor special functions as a new SpecialFunctions module, currently in unsupported/.
2016-07-08 11:13:55 +02:00
Gael Guennebaud
8b7431d8fd
fix compilation with c++11
2016-07-07 15:18:23 +02:00
Gael Guennebaud
69378eed0b
Split huge unit test
2016-07-07 15:18:04 +02:00
Gael Guennebaud
c684e37d32
Prevent division by zero.
2016-07-07 11:03:01 +02:00
Gael Guennebaud
179ebb88f9
Fix warning
2016-07-07 09:16:40 +02:00
Gael Guennebaud
5d2dada197
Fix warnings
2016-07-07 09:05:15 +02:00
Gael Guennebaud
f5e780fb05
split huge unit test
2016-07-07 08:59:59 +02:00
Gael Guennebaud
66917299a9
Add debug output
2016-07-06 22:27:15 +02:00
Gael Guennebaud
5ca2457fa5
Fix unit test.
2016-07-06 22:25:24 +02:00
Gael Guennebaud
9b68ed4537
Relax is_equal to is_approx because scaling might modify last bit.
2016-07-06 15:02:49 +02:00
Gael Guennebaud
c3b23d7dbf
Fix support of Intel's VML
2016-07-06 14:07:32 +02:00
Gael Guennebaud
8ec4d6480d
Fix compilation with recent updates of icc 2016
2016-07-06 14:07:14 +02:00
Gael Guennebaud
5b3a6f51d3
Improve numerical robustness of RealSchur: add scaling and compare sub-diag entries to largest diagonal entry instead of the 2 neighbors.
2016-07-06 13:45:30 +02:00
Gael Guennebaud
d2b5a19e0f
Fix warning.
2016-07-06 11:05:30 +02:00
Gael Guennebaud
367ef66af3
Re-enable some specializations for Assignment<.,Product<>>
2016-07-05 22:58:14 +02:00
Gael Guennebaud
155d8d8603
Fix compilation with msvc
2016-07-05 14:43:42 +02:00
Gael Guennebaud
43696ede8f
Revert unwanted changes.
2016-07-04 22:40:36 +02:00
Gael Guennebaud
b39fd8217f
Fix nesting of SolveWithGuess, and add unit test.
2016-07-04 17:47:47 +02:00
Gael Guennebaud
ec02af1047
Fix template resolution.
2016-07-04 17:37:33 +02:00
Gael Guennebaud
fbcfc2f862
Add unit test for solveWithGuess, and fix template resolution.
2016-07-04 17:19:38 +02:00
Gael Guennebaud
7f7839c12f
Add documentation and exemples for inplace decomposition.
2016-07-04 17:18:26 +02:00
Gael Guennebaud
32a41ee659
bug #707 : add inplace decomposition through Ref<> for Cholesky, LU and QR decompositions.
2016-07-04 15:13:35 +02:00
Gael Guennebaud
75e80792cc
Update relevent list of changesets.
2016-07-04 14:32:34 +02:00
Gael Guennebaud
dacc544b84
asm escape was not strong enough to prevent too aggressive compiler optimization let's fallback to no-inline.
2016-07-04 14:32:15 +02:00
Gael Guennebaud
b74e45906c
Few fixes in perf-monitoring.
2016-07-04 14:30:50 +02:00
Gael Guennebaud
ce9fc0ce14
fix clang compilation
2016-07-04 12:59:02 +02:00
Gael Guennebaud
440020474c
Workaround compilation issue with msvc
2016-07-04 12:49:19 +02:00
Gael Guennebaud
e61cee7a50
Fix compilation of some unit tests with msvc
2016-07-04 11:49:03 +02:00
Gael Guennebaud
91b3039013
Change the semantic of the last template parameter of Assignment from "Scalar" to "SFINAE" only.
...
The previous "Scalar" semantic was obsolete since we allow for different scalar types in the source and destination expressions.
On can still specialize on scalar types through SFINAE and/or assignment functor.
2016-07-04 11:02:00 +02:00
Gael Guennebaud
0fa9e4a15c
Fix performance regression in dgemm introduced by changeset 5d51a7f12c
2016-07-02 17:35:08 +02:00
Gael Guennebaud
672076db5d
Fix performance regression introduced in changeset e56aabf205
...
.
Register blocking sizes are better handled by the cache size heuristics.
The current code introduced very small blocks, for instance for 9x9 matrix,
thus killing performance.
2016-07-02 15:40:56 +02:00
Igor Babuschkin
78f37ca03c
Expose real and imag methods on Tensors
2016-07-01 17:34:31 +01:00
Gael Guennebaud
d161b8f03a
Merged in carpent/eigen (pull request PR-204)
...
Use complete nested namespace Eigen::internal, thus making the custom static assertion macros available outside the Eigen's namespace.
2016-07-01 09:56:44 +02:00
Benoit Steiner
cb2d8b8fa6
Made it possible to compile reductions for an old cuda architecture and run them on a recent gpu.
2016-06-29 15:42:01 -07:00
Benoit Steiner
b2a47641ce
Made the code compile when using CUDA architecture < 300
2016-06-29 15:32:47 -07:00
Benoit Steiner
b047ca765f
Merged in ibab/eigen/fix-tensor-scan-gpu (pull request PR-205)
...
Add missing CUDA kernel to tensor scan op
2016-06-29 14:52:19 -07:00
Igor Babuschkin
85699850d9
Add missing CUDA kernel to tensor scan op
...
The TensorScanOp implementation was missing a CUDA kernel launch.
This adds a simple placeholder implementation.
2016-06-29 11:54:35 +01:00
Justin Carpentier
6126886a67
Use complete nested namespace Eigen::internal
2016-06-28 20:09:25 +02:00
Benoit Jacob
328c5d876a
Undo changes in AltiVec --- I don't have any way to test there.
2016-06-28 11:15:25 -04:00
Benoit Jacob
38fb606052
Avoid global variables with static constructors in NEON/Complex.h
2016-06-28 11:12:49 -04:00
Benoit Steiner
1a9f92e781
Added a test to validate the tensor scan evaluation on GPU. The test is currently disabled since the code segfaults.
2016-06-27 16:02:52 -07:00
Benoit Steiner
75c333f94c
Don't store the scan axis in the evaluator of the tensor scan operation since it's only used in the constructor.
...
Also avoid taking references to values that may becomes stale after a copy construction.
2016-06-27 10:32:38 -07:00
xantares
c52c8d76da
Disable pkgconfig only for native windows builds
...
ie enable it for MinGW
2016-06-27 16:43:08 +00:00
Gael Guennebaud
d937a420a2
Fix compilation with MSVC by using our portable numext::log1p implementation.
2016-08-22 15:44:21 +02:00
Gael Guennebaud
2d5731e40a
bug #1270 : bypass custom asm for pmadd and recent clang version
2016-08-22 15:38:03 +02:00
Gael Guennebaud
49b005181a
Define EIGEN_COMP_CLANG to clang version as major*100+minor (e.g., 307 corresponds to clang 3.7)
2016-08-22 15:37:05 +02:00
Gael Guennebaud
130f891bb0
bug #1278 : ease parsing
2016-08-22 15:00:29 +02:00
Benoit Steiner
7944d4431f
Made the cost model cwiseMax and cwiseMin methods consts to help the PowerPC cuda compiler compile this code.
2016-08-18 13:46:36 -07:00
Benoit Steiner
647a51b426
Force the inlining of a simple accessor.
2016-08-18 12:31:02 -07:00
Benoit Steiner
a452dedb4f
Merged in ibab/eigen/double-tensor-reduction (pull request PR-216)
...
Enable efficient Tensor reduction for doubles on the GPU (continued)
2016-08-18 12:29:54 -07:00
Igor Babuschkin
18c67df31c
Fix remaining CUDA >= 300 checks
2016-08-18 17:18:30 +01:00
Igor Babuschkin
1569a7d7ab
Add the necessary CUDA >= 300 checks back
2016-08-18 17:15:12 +01:00
Benoit Steiner
2b17f34574
Properly detect the type of the result of a contraction.
2016-08-16 16:00:30 -07:00
Igor Babuschkin
841e075154
Remove CUDA >= 300 checks and enable outer reductin for doubles
2016-08-06 18:07:50 +01:00
Igor Babuschkin
0425118e2a
Merge upstream changes
2016-08-05 14:34:57 +01:00
Igor Babuschkin
9537e8b118
Make use of atomicExch for atomicExchCustom
2016-08-05 14:29:58 +01:00
Igor Babuschkin
eeb0d880ee
Enable efficient Tensor reduction for doubles
2016-07-01 19:08:26 +01:00
Gael Guennebaud
d476cadbb8
bug #1247 : fix regression in compilation of pow(integer,integer), and add respective unit tests.
2016-06-25 10:12:06 +02:00
Gael Guennebaud
cfff370549
Fix hyperbolic functions for autodiff.
2016-06-24 23:21:35 +02:00
Gael Guennebaud
c50c73cae2
Fix missing specialization.
2016-06-24 23:10:39 +02:00
Gael Guennebaud
3852351793
merge pull request 198
2016-06-24 11:48:17 +02:00
Gael Guennebaud
6dd9077070
Fix some unused typedef warnings.
2016-06-24 11:34:21 +02:00
Gael Guennebaud
ce90647fa5
Fix NumTraits<AutoDiff>
2016-06-24 11:34:02 +02:00
Gael Guennebaud
fa39f81b48
Fix instantiation of ScalarBinaryOpTraits for AutoDiff.
2016-06-24 11:33:30 +02:00
Gael Guennebaud
cd577a275c
Relax promote_scalar_arg logic to enable promotion to Expr::Scalar if conversion to Expr::Literal fails.
...
This is useful to cancel expression template at the scalar level, e.g. with AutoDiff<AutoDiff<>>.
This patch also defers calls to NumTraits in cases for which types are not directly compatible.
2016-06-24 11:28:54 +02:00
Gael Guennebaud
deb45ad4bc
bug #1245 : fix compilation with msvc
2016-06-24 09:52:25 +02:00
Rasmus Munk Larsen
a9c1e4d7b7
Return -1 from CurrentThreadId when called by thread outside the pool.
2016-06-23 16:40:07 -07:00
Rasmus Munk Larsen
d39df320d2
Resolve merge.
2016-06-23 15:08:03 -07:00
Gael Guennebaud
361dbd246d
Add unit test for printing empty tensors
2016-06-23 18:54:30 +02:00
Gael Guennebaud
360a743a10
bug #1241 : does not emmit anything for empty tensors
2016-06-23 18:47:31 +02:00
Gael Guennebaud
55fc04e8b5
Fix operator priority
2016-06-23 15:36:42 +02:00
Gael Guennebaud
bf2d5edecc
Fix warning.
2016-06-23 15:35:17 +02:00
Gael Guennebaud
7c6561485a
merge PR 194
2016-06-23 15:29:57 +02:00
Konstantinos Margaritis
be107e387b
fix compilation with clang 3.9, fix performance with pset1, use vector operators instead of intrinsics in some cases
2016-06-23 10:19:05 -03:00
Gael Guennebaud
76faf4a965
Introduce a NumTraits<T>::Literal type to be used for literals, and
...
improve mixing type support in operations between arrays and scalars:
- 2 * ArrayXcf is now optimized in the sense that the integer 2 is properly promoted to a float instead of a complex<float> (fix a regression)
- 2.1 * ArrayXi is now forbiden (previously, 2.1 was converted to 2)
- This mechanism should be applicable to any custom scalar type, assuming NumTraits<T>::Literal is properly defined (it defaults to T)
2016-06-23 14:27:20 +02:00
Gael Guennebaud
a3f7edf7e7
Biug 1242: fix comma init with empty matrices.
2016-06-23 10:25:04 +02:00
Benoit Steiner
a29a2cb4ff
Silenced a couple of compilation warnings generated by xcode
2016-06-22 16:43:02 -07:00
Benoit Steiner
f8fcd6b32d
Turned the constructor of the PerThread struct into what is effectively a constant expression to make the code compatible with a wider range of compilers
2016-06-22 16:03:11 -07:00
Benoit Steiner
c58df31747
Handle empty tensors in the print functions
2016-06-21 09:22:43 -07:00
Benoit Steiner
de32f8d656
Fixed the printing of rank-0 tensors
2016-06-20 10:46:45 -07:00
Konstantinos Margaritis
8c34b5a0e3
mostly cleanups and modernizing code
2016-06-19 16:13:17 -03:00
Konstantinos Margaritis
b410d46482
mostly cleanups and modernizing code
2016-06-19 16:12:52 -03:00
Konstantinos Margaritis
b80379bda0
fixed pexp<Packet2d>, was failing tests
2016-06-19 16:11:58 -03:00
Tal Hadad
8e198d6835
Complete docs and add ostream operator for EulerAngles.
2016-06-19 20:42:45 +03:00
Benoit Steiner
b055590e91
Made log1p_impl usable inside a GPU kernel
2016-06-16 11:37:40 -07:00
Geoffrey Lalonde
72c95383e0
Add autodiff coverage for standard library hyperbolic functions, and tests.
...
* * *
Corrected tanh derivatived, moved test definitions.
* * *
Added more test cases, removed lingering lines
2016-06-15 23:33:19 -07:00
Gael Guennebaud
67c12531e5
Fix warnings with gcc
2016-06-15 18:11:33 +02:00
Gael Guennebaud
eb91345d64
Move scalar/expr to ArrayBase and fix documentation
2016-06-15 15:22:03 +02:00
Gael Guennebaud
4794834397
Propagate functor to ScalarBinaryOpTraits
2016-06-15 09:58:49 +02:00
Gael Guennebaud
c55035b9c0
Include the cost of stores in unrolling of triangular expressions.
2016-06-15 09:57:33 +02:00
Benoit Steiner
7d495d890a
Merged in ibab/eigen (pull request PR-197)
...
Implement exclusive scan option for Tensor library
2016-06-14 17:54:59 -07:00
Benoit Steiner
aedc5be1d6
Avoid generating pseudo random numbers that are multiple of 5: this helps
...
spread the load over multiple cpus without havind to rely on work stealing.
2016-06-14 17:51:47 -07:00
Gael Guennebaud
4e7c3af874
Cleanup useless helper: internal::product_result_scalar
2016-06-15 00:04:10 +02:00
Gael Guennebaud
101ea26f5e
Include the cost of stores in unrolling (also fix infinite unrolling with expression costing 0 like Constant)
2016-06-15 00:01:16 +02:00
Igor Babuschkin
c4d10e921f
Implement exclusive scan option
2016-06-14 19:44:07 +01:00
Gael Guennebaud
76236cdea4
merge
2016-06-14 15:33:47 +02:00
Gael Guennebaud
1004c4df99
Cleanup unused functors.
2016-06-14 15:27:28 +02:00
Gael Guennebaud
70dad84b73
Generalize expr/expr and scalar/expr wrt scalar types.
2016-06-14 15:26:37 +02:00
Gael Guennebaud
62134082aa
Update AutoDiffScalar wrt to scalar-multiple.
2016-06-14 15:06:35 +02:00
Gael Guennebaud
5d38203735
Update Tensor module to use bind1st_op and bind2nd_op
2016-06-14 15:06:03 +02:00
Gael Guennebaud
396d9cfb6e
Generalize expr.pow(scalar), pow(expr,scalar) and pow(scalar,expr).
...
Internal: scalar_pow_op (unary) is removed, and scalar_binary_pow_op is renamed scalar_pow_op.
2016-06-14 14:10:07 +02:00
Gael Guennebaud
a9bb653a68
Update doc (scalar_add_op is now deprecated)
2016-06-14 12:07:00 +02:00
Gael Guennebaud
a8c08e8b8e
Implement expr+scalar, scalar+expr, expr-scalar, and scalar-expr as binary expressions, and generalize supported scalar types.
...
The following functors are now deprecated: scalar_add_op, scalar_sub_op, and scalar_rsub_op.
2016-06-14 12:06:10 +02:00
Gael Guennebaud
756ac4a93d
Fix doc.
2016-06-14 12:03:39 +02:00
Gael Guennebaud
f925dba3d9
Fix compilation of BVH example
2016-06-14 11:32:09 +02:00
Gael Guennebaud
12350d3ac7
Add unit test for AlignedBox::center
2016-06-14 11:31:52 +02:00
Gael Guennebaud
bcc0f38f98
Add unittesting plugins to scalar_product_op and scalar_quotient_op to help chaking that types are properly propagated.
2016-06-14 11:31:27 +02:00
Gael Guennebaud
f57fd78e30
Generalize coeff-wise sparse products to support different scalar types
2016-06-14 11:29:54 +02:00
Gael Guennebaud
f5b1c73945
Set cost of constant expression to 0 (the cost should be amortized through the expression)
2016-06-14 11:29:06 +02:00
Gael Guennebaud
deb8306e60
Move MatrixBase::operaotr*(UniformScaling) as a free function in Scaling.h, and fix return type.
2016-06-14 11:28:03 +02:00
Gael Guennebaud
64fcfd314f
Implement scalar multiples and division by a scalar as a binary-expression with a constant expression.
...
This slightly complexifies the type of the expressions and implies that we now have to distinguish between scalar*expr and expr*scalar to catch scalar-multiple expression (e.g., see BlasUtil.h), but this brings several advantages:
- it makes it clear on each side the scalar is applied,
- it clearly reflects that we are dealing with a binary-expression,
- the complexity of the type is hidden through macros defined at the end of Macros.h,
- distinguishing between "scalar op expr" and "expr op scalar" is important to support non commutative fields (like quaternions)
- "scalar op expr" is now fully equivalent to "ConstantExpr(scalar) op expr"
- scalar_multiple_op, scalar_quotient1_op and scalar_quotient2_op are not used anymore in officially supported modules (still used in Tensor)
2016-06-14 11:26:57 +02:00
Gael Guennebaud
39781dc1e2
Fix compilation of evaluator unit test
2016-06-14 11:03:26 +02:00
Tal Hadad
6edfe8771b
Little bit docs
2016-06-13 22:03:19 +03:00
Tal Hadad
6e1c086593
Add static assertion
2016-06-13 21:55:17 +03:00
Gael Guennebaud
3c12e24164
Add bind1st_op and bind2nd_op helpers to turn binary functors into unary ones, and implement scalar_multiple2 and scalar_quotient2 on top of them.
2016-06-13 16:18:59 +02:00
Gael Guennebaud
7a9ef7bbb4
Add default template parameters for the second scalar type of binary functors.
...
This enhences backward compatibility.
2016-06-13 16:17:23 +02:00
Gael Guennebaud
2ca2ffb65e
check for mixing types in "array / scalar" expressions
2016-06-13 16:15:32 +02:00
Gael Guennebaud
4c61f00838
Add missing explicit scalar conversion
2016-06-12 22:42:13 +02:00
Tal Hadad
06206482d9
More docs, and minor code fixes
2016-06-12 23:40:17 +03:00
Gael Guennebaud
a3a4714aba
Add debug output.
2016-06-11 14:41:53 +02:00
Gael Guennebaud
83904a21c1
Make sure T(i+1,i)==0 when diagonalizing T(i:i+1,i:i+1)
2016-06-11 14:41:36 +02:00
Benoit Steiner
65d33e5898
Merged in ibab/eigen (pull request PR-195)
...
Add small fixes to TensorScanOp
2016-06-10 19:31:17 -07:00
Benoit Steiner
a05607875a
Don't refer to the half2 type unless it's been defined
2016-06-10 11:53:56 -07:00
Gael Guennebaud
fabae6c9a1
Cleanup
2016-06-10 15:58:33 +02:00
Gael Guennebaud
5de8d7036b
Add real.pow(complex), complex.pow(real) unit tests.
2016-06-10 15:58:22 +02:00
Gael Guennebaud
5fdd703629
Enable mixing types in numext::pow
2016-06-10 15:58:04 +02:00
Gael Guennebaud
2e238bafb6
Big 279: enable mixing types for comparisons, min, and max.
2016-06-10 15:05:43 +02:00
Gael Guennebaud
0028049380
bug #1240 : Remove any assumption on NEON vector types.
2016-06-09 23:08:11 +02:00
Igor Babuschkin
86aedc9282
Add small fixes to TensorScanOp
2016-06-07 20:06:38 +01:00
Christoph Hertzberg
db0118342c
Fixed compilation of BVH_Example (required for make doc)
2016-06-07 19:17:18 +02:00
Benoit Steiner
84b2060a9e
Fixed compilation error with gcc 4.4
2016-06-06 17:16:19 -07:00
Gael Guennebaud
2c462f4201
Clean handling for void type in EIGEN_CHECK_BINARY_COMPATIBILIY
2016-06-06 23:11:38 +02:00
Gael Guennebaud
3d71d3918e
Disable shortcuts for res ?= prod when the scalar types do not match exactly.
2016-06-06 23:10:55 +02:00
Benoit Steiner
7ef9f47b58
Misc small improvements to the reduction code.
2016-06-06 14:09:46 -07:00
Benoit Steiner
ea75dba201
Added missing EIGEN_DEVICE_FUNC qualifiers to the unary array ops
2016-06-06 13:32:28 -07:00
Benoit Steiner
33f0340188
Implement result_of for the new ternary functors
2016-06-06 12:06:42 -07:00
Tal Hadad
e30133e439
Doc EulerAngles class, and minor fixes.
2016-06-06 22:01:40 +03:00
Gael Guennebaud
df24f4a01d
bug #1201 : improve code generation of affine*vec with MSVC
2016-06-06 16:46:46 +02:00
Benoit Steiner
9137f560f0
Moved assertions to the constructor to make the code more portable
2016-06-06 07:26:48 -07:00
Gael Guennebaud
66e99ab6a1
Relax mixing-type constraints for binary coefficient-wise operators:
...
- Replace internal::scalar_product_traits<A,B> by Eigen::ScalarBinaryOpTraits<A,B,OP>
- Remove the "functor_is_product_like" helper (was pretty ugly)
- Currently, OP is not used, but it is available to the user for fine grained tuning
- Currently, only the following operators have been generalized: *,/,+,-,=,*=,/=,+=,-=
- TODO: generalize all other binray operators (comparisons,pow,etc.)
- TODO: handle "scalar op array" operators (currently only * is handled)
- TODO: move the handling of the "void" scalar type to ScalarBinaryOpTraits
2016-06-06 15:11:41 +02:00
Benoit Steiner
1f1e0b9e30
Silenced compilation warning
2016-06-05 12:59:11 -07:00
Benoit Steiner
5b95b4daf9
Moved static assertions into the class constructor to make the code more portable
2016-06-05 12:57:48 -07:00
Christoph Hertzberg
d7e3e4bb04
Removed executable bits from header files.
2016-06-05 10:15:41 +02:00
Eugene Brevdo
c53687dd14
Add randomized properties tests for betainc special function.
2016-06-05 11:10:30 -07:00
Rasmus Munk Larsen
f1f2ff8208
size_t -> int
2016-06-03 18:06:37 -07:00
Rasmus Munk Larsen
76308e7fd2
Add CurrentThreadId and NumThreads methods to Eigen threadpools and TensorDeviceThreadPool.
2016-06-03 16:28:58 -07:00
Sean Templeton
bd21243821
Fix compile errors initializing packets on ARM DS-5 5.20
...
The ARM DS-5 5.20 compiler fails compiling with the following errors:
"src/Core/arch/NEON/PacketMath.h", line 113: Error: #146 : too many initializer values
Packet4f countdown = EIGEN_INIT_NEON_PACKET4(0, 1, 2, 3);
^
"src/Core/arch/NEON/PacketMath.h", line 118: Error: #146 : too many initializer values
Packet4i countdown = EIGEN_INIT_NEON_PACKET4(0, 1, 2, 3);
^
"src/Core/arch/NEON/Complex.h", line 30: Error: #146 : too many initializer values
static uint32x4_t p4ui_CONJ_XOR = EIGEN_INIT_NEON_PACKET4(0x00000000, 0x80000000, 0x00000000, 0x80000000);
^
"src/Core/arch/NEON/Complex.h", line 31: Error: #146 : too many initializer values
static uint32x2_t p2ui_CONJ_XOR = EIGEN_INIT_NEON_PACKET2(0x00000000, 0x80000000);
^
The vectors are implemented as two doubles, hence the too many initializer values error.
Changed the code to use intrinsic load functions which all compilers
implementing NEON should have.
2016-06-03 10:51:35 -05:00
Gael Guennebaud
1fc2746417
Make Arrays's ctor/assignment noexcept
2016-06-09 22:52:37 +02:00
Benoit Steiner
37638dafd7
Simplified the code that dispatches vectorized reductions on GPU
2016-06-09 10:29:52 -07:00
Benoit Steiner
66796e843d
Fixed definition of some of the reducer_traits
2016-06-09 08:50:01 -07:00
Benoit Steiner
4434b16694
Pulled latest updates from trunk
2016-06-09 08:25:47 -07:00
Benoit Steiner
14a112ee15
Use signed integers more consistently to encode the number of threads to use to evaluate a tensor expression.
2016-06-09 08:25:22 -07:00
Benoit Steiner
8f92c26319
Improved code formatting
2016-06-09 08:23:42 -07:00
Benoit Steiner
aa33446dac
Improved support for vectorization of 16-bit floats
2016-06-09 08:22:27 -07:00
Gael Guennebaud
e2b3836326
Include recent changesets that played with product's kernel
2016-06-09 17:13:33 +02:00
Gael Guennebaud
2bd59b0e0d
Take advantage that T is already diagonal in the extraction of generalized complex eigenvalues.
2016-06-09 17:12:03 +02:00
Gael Guennebaud
c1f9ca9254
Update RealQZ to reduce 2x2 diagonal block of T corresponding to non reduced diagonal block of S to positive diagonal form.
...
This step involve a real 2x2 SVD problem. The respective routine is thus in src/misc/ to be shared by both EVD and AVD modules.
2016-06-09 17:11:03 +02:00
Gael Guennebaud
15890c304e
Add unit test for non symmetric generalized eigenvalues
2016-06-09 16:17:27 +02:00
Gael Guennebaud
a20d2ec1c0
Fix shadow variable, and indexing.
2016-06-09 16:16:22 +02:00
Abhijit Kundu
0beabb4776
Fixed type conversion from int
2016-06-08 16:12:04 -04:00
Gael Guennebaud
df095cab10
Fixes for PARDISO: warnings, and defaults to metis+ in-core mode.
2016-06-08 18:31:19 +02:00
Gael Guennebaud
9fc8379328
Fix extraction of complex eigenvalue pairs in real generalized eigenvalue problems.
2016-06-08 16:39:11 +02:00
Christoph Hertzberg
9dd9d58273
Copied a regression test from 3.2 branch.
2016-06-08 15:36:42 +02:00
Benoit Steiner
8fd57a97f2
Enable the vectorization of adds and mults of fp16
2016-06-07 18:22:18 -07:00
Benoit Steiner
d6d39c7ddb
Added missing EIGEN_DEVICE_FUNC
2016-06-07 14:35:08 -07:00
Gael Guennebaud
8d97ba6b22
bug #725 : make move ctor/assignment noexcept.
2016-06-03 14:28:25 +02:00
Gael Guennebaud
e8b922ca63
Fix MatrixFunctions module.
2016-06-03 09:21:35 +02:00
Gael Guennebaud
82293f38d6
Fix unit test.
2016-06-03 08:12:14 +02:00
Gael Guennebaud
fe62c06d9b
Fix compilation.
2016-06-03 07:47:38 +02:00
Gael Guennebaud
969b8959a0
Fix compilation: Matrix does not indirectly live in the internal namespace anymore!
2016-06-03 07:44:58 +02:00
Gael Guennebaud
f2c2465acc
Fix function dependencies
2016-06-03 07:44:18 +02:00
Benoit Steiner
c3c8ad8046
Align the first element of the Waiter struct instead of padding it. This reduces its memory footprint a bit while achieving the goal of preventing false sharing
2016-06-02 21:17:41 -07:00
Eugene Brevdo
39baff850c
Add TernaryFunctors and the betainc SpecialFunction.
...
TernaryFunctors and their executors allow operations on 3-tuples of inputs.
API fully implemented for Arrays and Tensors based on binary functors.
Ported the cephes betainc function (regularized incomplete beta
integral) to Eigen, with support for CPU and GPU, floats, doubles, and
half types.
Added unit tests in array.cpp and cxx11_tensor_cuda.cu
Collapsed revision
* Merged helper methods for betainc across floats and doubles.
* Added TensorGlobalFunctions with betainc(). Removed betainc() from TensorBase.
* Clean up CwiseTernaryOp checks, change igamma_helper to cephes_helper.
* betainc: merge incbcf and incbd into incbeta_cfe. and more cleanup.
* Update TernaryOp and SpecialFunctions (betainc) based on review comments.
2016-06-02 17:04:19 -07:00
Benoit Steiner
02db4e1a82
Disable the tensor tests when using msvc since older versions of the compiler fail to handle this code
2016-06-04 08:21:17 -07:00
Benoit Steiner
c21eaedce6
Use array_prod to compute the number of elements contained in the input tensor expression
2016-06-04 07:47:04 -07:00
Benoit Steiner
36a4500822
Merged in ibab/eigen (pull request PR-192)
...
Add generic scan method
2016-06-03 17:28:33 -07:00
Benoit Steiner
c2a102345f
Improved the performance of full reductions.
...
AFTER:
BM_fullReduction/10 4541 4543 154017 21.0M items/s
BM_fullReduction/64 5191 5193 100000 752.5M items/s
BM_fullReduction/512 9588 9588 71361 25.5G items/s
BM_fullReduction/4k 244314 244281 2863 64.0G items/s
BM_fullReduction/5k 359382 359363 1946 64.8G items/s
BEFORE:
BM_fullReduction/10 9085 9087 74395 10.5M items/s
BM_fullReduction/64 9478 9478 72014 412.1M items/s
BM_fullReduction/512 14643 14646 46902 16.7G items/s
BM_fullReduction/4k 260338 260384 2678 60.0G items/s
BM_fullReduction/5k 385076 385178 1818 60.5G items/s
2016-06-03 17:27:08 -07:00
Igor Babuschkin
dc03b8f3a1
Add generic scan method
2016-06-03 17:37:04 +01:00
Gael Guennebaud
5b77481d58
merge
2016-06-02 22:21:45 +02:00
Gael Guennebaud
53feb73b45
Remove dead code.
2016-06-02 22:19:55 +02:00
Gael Guennebaud
2c00ac0b53
Implement generic scalar*expr and expr*scalar operator based on scalar_product_traits.
...
This is especially useful for custom scalar types, e.g., to enable float*expr<multi_prec> without conversion.
2016-06-02 22:16:37 +02:00
Rasmus Munk Larsen
811aadbe00
Add syntactic sugar to Eigen tensors to allow more natural syntax.
...
Specifically, this enables expressions involving:
scalar + tensor
scalar * tensor
scalar / tensor
scalar - tensor
2016-06-02 12:41:28 -07:00
Tal Hadad
52e4cbf539
Merged eigen/eigen into default
2016-06-02 22:15:20 +03:00
Tal Hadad
2aaaf22623
Fix Gael reports (except documention)
...
- "Scalar angle(int) const" should be "const Vector& angles() const"
- then method "coeffs" could be removed.
- avoid one letter names like h, p, r -> use alpha(), beta(), gamma() ;)
- about the "fromRotation" methods:
- replace the ones which are not static by operator= (as in Quaternion)
- the others are actually static methods: use a capital F: FromRotation
- method "invert" should be removed.
- use a macro to define both float and double EulerAnglesXYZ* typedefs
- AddConstIf -> not used
- no needs for NegateIfXor, compilers are extremely good at optimizing away branches based on compile time constants:
if(IsHeadingOpposite-=IsEven) res.alpha() = -res.alpha();
2016-06-02 22:12:57 +03:00
Benoit Steiner
6021c90fdf
Merged in ibab/eigen (pull request PR-189)
...
Add scan op to Tensor module
2016-06-02 08:08:11 -07:00
Gael Guennebaud
8b6f53222b
bug #1193 : fix lpNorm<Infinity> for empty input.
2016-06-02 15:29:59 +02:00
Gael Guennebaud
d616a81294
Disable MSVC's "decorated name length exceeded, name was truncated" warning in unit tests.
2016-06-02 14:48:38 +02:00
Gael Guennebaud
61a32f2a4c
Fix pointer to long conversion warning.
2016-06-02 14:45:45 +02:00
Igor Babuschkin
fbd7ed6ff7
Add tensor scan op
...
This is the initial implementation a generic scan operation.
Based on this, cumsum and cumprod method have been added to TensorBase.
2016-06-02 13:35:47 +01:00
Benoit Steiner
0ed08fd281
Use a single PacketSize variable
2016-06-01 21:19:05 -07:00
Benoit Steiner
8f6fedc55f
Fixed compilation warning
2016-06-01 21:14:46 -07:00
Benoit Steiner
c3cada38e2
Speedup a test
2016-06-01 21:13:00 -07:00
Gael Guennebaud
360e311b66
Doc: add some cross references (also fix empty macro argument warning)
2016-06-01 23:34:09 +02:00
Benoit Steiner
873e6ac54b
Silenced compilation warning generated by nvcc.
2016-06-01 14:20:50 -07:00
Benoit Steiner
d27b0ad4c8
Added support for mean reductions on fp16
2016-06-01 11:12:07 -07:00
Gael Guennebaud
cd221a62ee
Doc: start of a table summarizing coefficient-wise math functions.
2016-06-01 17:09:48 +02:00
Gael Guennebaud
3c69afca4c
Add missing ArrayBase::log1p
2016-06-01 17:08:47 +02:00
Gael Guennebaud
89099b0cf7
Expose log1p to Array.
2016-06-01 17:00:08 +02:00
Gael Guennebaud
afd33539dd
Doc: makes the global unary math functions visible to doxygen (and docuement them)
2016-06-01 15:27:13 +02:00
Gael Guennebaud
77e652d8ad
Doc: improve documentation of Map<SparseMatrix>
2016-06-01 10:03:32 +02:00
Gael Guennebaud
da4970ead2
Doc: disable inlining of inherited members, workaround Doxygen's limited C++ parsing abilities, and improve doc of MapBase.
2016-06-01 09:38:49 +02:00
Benoit Steiner
099b354ca7
Pulled latest updates from trunk
2016-05-31 10:34:16 -07:00
Benoit Steiner
5aeb3687c4
Only enable optimized reductions of fp16 if the reduction functor supports them
2016-05-31 10:33:40 -07:00
Benoit Steiner
b6e306f189
Improved support for CUDA 8.0
2016-05-31 09:47:59 -07:00
Gael Guennebaud
1d3b253329
bug #1181 : help MSVC inlining.
2016-05-31 17:23:42 +02:00
Gael Guennebaud
d79eee05ef
Fix compilation with old icc
2016-05-31 17:13:51 +02:00
Gael Guennebaud
2c1b56f4c1
bug #1238 : fix SparseMatrix::sum() overload for un-compressed mode.
2016-05-31 10:56:53 +02:00
Benoit Steiner
c4bd3b1f21
Silenced some compilation warnings triggered by nvcc 8.0
2016-05-27 14:40:49 -07:00
Benoit Steiner
e2946d962d
Reimplement clamp as a static function.
2016-05-27 12:58:43 -07:00
Benoit Steiner
e96d36d4cd
Use NULL instead of nullptr to preserve the compatibility with cxx03
2016-05-27 12:54:06 -07:00
Benoit Steiner
abc815798b
Added a new operation to enable more powerful tensorindexing.
2016-05-27 12:22:25 -07:00
Benoit Steiner
5707537592
Fixed option '--relaxed-constexpr' has been deprecated and replaced by option '--expt-relaxed-constexpr' warning generated by nvcc 7.5
2016-05-27 10:47:53 -07:00
Benoit Steiner
3a5d6a3c38
Disable the use of MMX instructions since the code is broken on many platforms
2016-05-27 09:13:26 -07:00
Christoph Hertzberg
f2c86384f4
Cleaner implementation of dont_over_optimize.
2016-05-27 11:13:38 +02:00
Gael Guennebaud
22a035db95
Fix compilation when defaulting to row-major
2016-05-27 10:31:11 +02:00
Gael Guennebaud
e0cb73b46b
Fix compilation with old ICC version (use C99 types instead of C++11 ones)
2016-05-27 10:28:09 +02:00
Benoit Steiner
1ae2567861
Fixed some compilation warnings
2016-05-26 15:57:19 -07:00
Benoit Steiner
094f4a56c8
Deleted extra namespace
2016-05-26 14:49:51 -07:00
Benoit Steiner
1a47844529
Preserve the ability to vectorize the evaluation of an expression even when it involves a cast that isn't vectorized (e.g fp16 to float)
2016-05-26 14:37:09 -07:00
Benoit Steiner
36369ab63c
Resolved merge conflicts
2016-05-26 13:39:39 -07:00
Benoit Steiner
28fcb5ca2a
Merged latest reduction improvements
2016-05-26 12:19:33 -07:00
Benoit Steiner
b24cf21235
Merged latest code improvements
2016-05-26 11:57:50 -07:00
Benoit Steiner
c1c7f06c35
Improved the performance of inner reductions.
2016-05-26 11:53:59 -07:00
Benoit Steiner
22d02c9855
Improved the coverage of the fp16 reduction tests
2016-05-26 11:12:16 -07:00
Christoph Hertzberg
41dcd047d7
bug #1237 : Redefine eigen_assert instead of disabling assertions for documentation snippets
2016-05-26 18:13:33 +02:00
Benoit Steiner
8288b0aec2
Code cleanup.
2016-05-26 09:00:04 -07:00
Gael Guennebaud
7ff5fadcc0
Disable usage of MMX with msvc.
2016-05-26 17:58:46 +02:00
Gael Guennebaud
e8cef383b7
bug #1236 : fix possible integer overflow in density estimation.
2016-05-26 17:51:04 +02:00
Gael Guennebaud
35df3a32eb
Disabled GCC6's ignored-attributes warning in packetmath unit test.
2016-05-26 17:42:58 +02:00
Gael Guennebaud
db62719eda
Fix some conversion warnings in unit tests.
2016-05-26 17:42:12 +02:00
Gael Guennebaud
fdcad686ee
Fix numerous pointer-to-integer conversion warnings in unit tests.
2016-05-26 17:41:28 +02:00
Gael Guennebaud
30d97c03ce
Defer the allocation of the working space:
...
- it is not always needed,
- and this fixes a long-to-float conversion warning
2016-05-26 17:39:42 +02:00
Gael Guennebaud
e08f54e9eb
Fix copy ctor prototype.
2016-05-26 17:37:25 +02:00
Gael Guennebaud
c7f54b11ec
linspaced's divisor for integer is better stored as the underlying scalar type.
2016-05-26 17:36:54 +02:00
Gael Guennebaud
bebc5a2147
Fix/handle some int-to-long conversions.
2016-05-26 17:35:53 +02:00
Gael Guennebaud
00c29c2cae
Store permutation's determinant as char.
...
This also fixes some long to float conversion warnings
2016-05-26 17:34:23 +02:00
Gael Guennebaud
2f56d91063
Fix a pointer to integer conversion warning
2016-05-26 17:31:45 +02:00
Gael Guennebaud
2a44a70142
Handle some Index to int conversions in BLAS/LAPACK support.
2016-05-26 17:29:04 +02:00
Gael Guennebaud
f253e19296
Disable some long to float conversion warnings
2016-05-26 17:27:14 +02:00
Christoph Hertzberg
2ee306e44a
Temporary workaround for bug #1237 . The snippet (expectedly) failed with enabled assertions.
2016-05-26 16:16:41 +02:00
Gael Guennebaud
37197b602b
Remove debuging code.
2016-05-26 11:53:10 +02:00
Gael Guennebaud
27f0434233
Introduce internal's UIntPtr and IntPtr types for pointer to integer conversions.
...
This fixes "conversion from pointer to same-sized integral type" warnings by ICC.
Ideally, we would use the std::[u]intptr_t types all the time, but since they are C99/C++11 only,
let's be safe.
2016-05-26 10:52:12 +02:00
Gael Guennebaud
40e4637d79
Turn off ICC's conversion warning in is_convertible implementation
2016-05-26 10:48:43 +02:00
Gael Guennebaud
cc1ab64f29
Add missing inclusion of mmintrin.h
2016-05-26 09:51:50 +02:00
Benoit Steiner
2d7ed54ba2
Made the static storage class qualifier come first.
2016-05-25 22:16:15 -07:00
Benoit Steiner
e1fca8866e
Deleted unnecessary explicit qualifiers.
2016-05-25 22:15:26 -07:00
Benoit Steiner
9b0aaf5113
Don't mark inline functions as static since it confuses the ICC compiler
2016-05-25 22:10:11 -07:00
Benoit Steiner
3585ff585e
Silenced a compilation warning
2016-05-25 22:09:19 -07:00
Benoit Steiner
037a463fd5
Marked unused variables as such
2016-05-25 22:07:48 -07:00
Benoit Steiner
efeb89dcdb
Specify the rounding mode in the correct location
2016-05-25 17:53:24 -07:00
Benoit Steiner
457204cb83
Updated the README file for the tensor benchmarks
2016-05-25 16:13:41 -07:00
Benoit Steiner
0322c66a3f
Explicitly specify the rounding mode when converting floats to fp16
2016-05-25 15:56:15 -07:00
Benoit Steiner
3ac4045272
Made the IndexPair code compile in non cxx11 mode
2016-05-25 15:15:12 -07:00
Benoit Steiner
66556d0e05
Made the index pair list code more portable accross various compilers
2016-05-25 14:34:27 -07:00
Benoit Steiner
034aa3b2c0
Improved the performance of tensor padding
2016-05-25 11:43:08 -07:00
Benoit Steiner
58026905ae
Added support for statically known lists of pairs of indices
2016-05-25 11:04:14 -07:00
Benoit Steiner
ed783872ab
Disable the use of MMX instructions on x86_64 since too many compilers only support them in 32bit mode
2016-05-25 08:27:26 -07:00
Benoit Steiner
bcfff64f9e
Use numext:: instead of std:: functions.
2016-05-25 08:08:21 -07:00
Gael Guennebaud
f57260a997
Fix typo in dont_over_optimize
2016-05-25 11:17:53 +02:00
Gael Guennebaud
2cd32be70b
Fix warning.
2016-05-25 11:15:54 +02:00
Gael Guennebaud
bbf9109e25
Fix compilation with ICC.
2016-05-25 10:00:55 +02:00
Gael Guennebaud
2a1bff67fd
Fix static/inline order.
2016-05-25 10:00:11 +02:00
Benoit Steiner
0835667329
There is no need to make the fp16 full reduction kernel a static function.
2016-05-24 23:11:56 -07:00
Benoit Steiner
b5d6b52a4d
Fixed compilation warning
2016-05-24 23:10:57 -07:00
Benoit Steiner
d041a528da
Cleaned up the fp16 code a little more
2016-05-24 22:43:26 -07:00
Benoit Steiner
cb26784d07
Pulled latest updates from trunk
2016-05-24 18:51:39 -07:00
Benoit Steiner
ff4a289572
Cleaned up the fp16 code
2016-05-24 18:50:09 -07:00
Gael Guennebaud
3f715e1701
update doc wrt to unaligned vectorization
2016-05-24 22:34:59 +02:00
Gael Guennebaud
9216abe28d
Document EIGEN_UNALIGNED_VECTORIZE.
2016-05-24 22:14:34 +02:00
Gael Guennebaud
0fd953c217
Workaround clang/llvm bug in code generation.
2016-05-24 21:55:46 +02:00
Gael Guennebaud
e68e165a23
bug #256 : enable vectorization with unaligned loads/stores.
...
This concerns all architectures and all sizes.
This new behavior can be disabled by defining EIGEN_UNALIGNED_VECTORIZE=0
2016-05-24 21:54:03 +02:00
Gael Guennebaud
78390e4189
Block<> should not disable vectorization based on inner-size, this is the responsibilty of the assignment logic.
2016-05-24 17:14:01 +02:00
Gael Guennebaud
64bb7576eb
Clean propagation of Dest/Src alignments.
2016-05-24 17:12:12 +02:00
Benoit Jacob
40a16282c7
Remove now-unused protate PacketMath func
2016-05-24 11:01:18 -04:00
Benoit Jacob
6136f4fdd4
Remove the rotating kernel. It was only useful on some ARM CPUs (Qualcomm Krait) that are not as ubiquitous today as they were when I introduced it.
2016-05-24 10:00:32 -04:00
Benoit Steiner
e617711306
Don't attempt to use MMX instructions with visualstudio since they're only partially supported.
2016-05-24 06:43:58 -07:00
Benoit Steiner
334e76537f
Worked around missing clang intrinsic
2016-05-24 00:29:28 -07:00
Benoit Steiner
b517ab349b
Use the generic ploadquad intrinsics since it does the job
2016-05-24 00:11:17 -07:00
Benoit Steiner
646872cb3b
Worked around missing clang intrinsics
2016-05-24 00:07:08 -07:00
Benoit Steiner
3dfc391a61
Added missing EIGEN_DEVICE_FUNC qualifier
2016-05-23 20:56:59 -07:00
Benoit Steiner
3d0741f027
Include mmintrin.h to make it possible to use mmx instructions when needed. For example, this will enable the definition of a half packet for the Packet4f type.
2016-05-23 20:43:48 -07:00
Benoit Steiner
33a94f5dc7
Use the Index type instead of integers to specify the strides in pgather/pscatter
2016-05-23 20:37:30 -07:00
Benoit Steiner
6bc684ab6a
Added missing alignment in the fp16 packet traits
2016-05-23 20:32:30 -07:00
Benoit Steiner
283e33dea4
ptranspose is not a template.
2016-05-23 19:55:55 -07:00
Benoit Steiner
a5a3ba2b80
Avoid unnecessary float to double conversions
2016-05-23 17:16:09 -07:00
Benoit Steiner
5ba0ebe7c9
Avoid unnecessary float to double conversion.
2016-05-23 17:14:31 -07:00
Benoit Steiner
7d980d74e5
Started to vectorize the processing of 16bit floats on CPU.
2016-05-23 15:21:40 -07:00
Benoit Steiner
5d51a7f12c
Don't optimize the processing of the last rows of a matrix matrix product in cases that violate the assumptions made by the optimized code path.
2016-05-23 15:13:16 -07:00
Benoit Steiner
7aa5bc9558
Fixed a typo in the array.cpp test
2016-05-23 14:39:51 -07:00
Benoit Steiner
a09cbf9905
Merged in rmlarsen/eigen (pull request PR-188)
...
Minor cleanups: 1. Get rid of a few unused variables. 2. Get rid of last uses of EIGEN_USE_COST_MODEL.
2016-05-23 12:55:12 -07:00
Christoph Hertzberg
88654762da
Replace multiple constructors of half-type by a generic/templated constructor. This fixes an incompatibility with long double, exposed by the previous commit.
2016-05-23 10:03:03 +02:00
Christoph Hertzberg
718521d5cf
Silenced several double-promotion warnings
2016-05-22 18:17:04 +02:00
Christoph Hertzberg
b5a7603822
fixed macro name
2016-05-22 16:49:29 +02:00
Christoph Hertzberg
25a03c02d6
Fix some sign-compare warnings
2016-05-22 16:42:27 +02:00
Christoph Hertzberg
0851d5d210
Identify clang++ even if it is not named llvm-clang++
2016-05-22 15:21:14 +02:00
Gael Guennebaud
6a15e14cda
Document EIGEN_MAX_CPP_VER and user controllable compiler features.
2016-05-20 15:26:09 +02:00
Gael Guennebaud
ccaace03c9
Make EIGEN_HAS_CONSTEXPR user configurable
2016-05-20 15:10:08 +02:00
Gael Guennebaud
c3410804cd
Make EIGEN_HAS_VARIADIC_TEMPLATES user configurable
2016-05-20 15:05:38 +02:00
Gael Guennebaud
abd1c1af7a
Make EIGEN_HAS_STD_RESULT_OF user configurable
2016-05-20 15:01:27 +02:00
Gael Guennebaud
1395056fc0
Make EIGEN_HAS_C99_MATH user configurable
2016-05-20 14:58:19 +02:00
Gael Guennebaud
48bf5ec216
Make EIGEN_HAS_RVALUE_REFERENCES user configurable
2016-05-20 14:54:20 +02:00
Gael Guennebaud
f43ae88892
Rename EIGEN_HAVE_RVALUE_REFERENCES to EIGEN_HAS_RVALUE_REFERENCES
2016-05-20 14:48:51 +02:00
Gael Guennebaud
8d6bd5691b
polygamma is C99/C++11 only
2016-05-20 14:45:33 +02:00
Gael Guennebaud
998f2efc58
Add a EIGEN_MAX_CPP_VER option to limit the C++ version to be used.
2016-05-20 14:44:28 +02:00
Gael Guennebaud
c028d96089
Improve doc of special math functions
2016-05-20 14:18:48 +02:00
Gael Guennebaud
0ba32f99bd
Rename UniformRandom to UnitRandom.
2016-05-20 13:21:34 +02:00
Gael Guennebaud
7a9d9cde94
Fix coding practice in Quaternion::UniformRandom
2016-05-20 13:19:52 +02:00
Joseph Mirabel
eb0cc2573a
bug #823 : add static method to Quaternion for uniform random rotations.
2016-05-20 13:15:40 +02:00
Gael Guennebaud
2f656ce447
Remove std:: to enable custom scalar types.
2016-05-19 23:13:47 +02:00
Rasmus Larsen
b1e080c752
Merged eigen/eigen into default
2016-05-18 15:21:50 -07:00
Rasmus Munk Larsen
5624219b6b
Merge.
2016-05-18 15:16:06 -07:00
Rasmus Munk Larsen
7df811cfe5
Minor cleanups: 1. Get rid of unused variables. 2. Get rid of last uses of EIGEN_USE_COST_MODEL.
2016-05-18 15:09:48 -07:00
Benoit Steiner
bb3ff8e9d9
Advertize the packet api of the tensor reducers iff the corresponding packet primitives are available.
2016-05-18 14:52:49 -07:00
Gael Guennebaud
84df9142e7
bug #1231 : fix compilation regression regarding complex_array/=real_array and add respective unit tests
2016-05-18 23:00:13 +02:00
Gael Guennebaud
21d692d054
Use coeff(i,j) instead of operator().
2016-05-18 17:09:20 +02:00
Gael Guennebaud
8456bbbadb
bug #1224 : fix regression in (dense*dense).sparseView() by specializing evaluator<SparseView<Product>> for sparse products only.
2016-05-18 16:53:28 +02:00
Gael Guennebaud
b507b82326
Use default sorting strategy for square products.
2016-05-18 16:51:54 +02:00
Gael Guennebaud
1fa15ceee6
Extend sparse*sparse product unit test to check that the expected implementation is used (conservative vs auto pruning).
2016-05-18 16:50:54 +02:00
Gael Guennebaud
548a487800
bug #1229 : bypass usage of Derived::Options which is available for plain matrix types only. Better use column-major storage anyway.
2016-05-18 16:44:05 +02:00
Gael Guennebaud
43790e009b
Pass argument by const ref instead of by value in pow(AutoDiffScalar...)
2016-05-18 16:28:02 +02:00
Gael Guennebaud
1fbfab27a9
bug #1223 : fix compilation of AutoDiffScalar's min/max operators, and add regression unit test.
2016-05-18 16:26:26 +02:00
Gael Guennebaud
448d9d943c
bug #1222 : fix compilation in AutoDiffScalar and add respective unit test
2016-05-18 16:00:11 +02:00
Gael Guennebaud
5a71eb5985
Big 1213: add regression unit test.
2016-05-18 14:03:03 +02:00
Gael Guennebaud
747e3290c0
bug #1213 : rename some enums type for consistency.
2016-05-18 13:26:56 +02:00
Rasmus Munk Larsen
f519fca72b
Reduce overhead for small tensors and cheap ops by short-circuiting the const computation and block size calculation in parallelFor.
2016-05-17 16:06:00 -07:00
Benoit Steiner
86ae94462e
#if defined(EIGEN_USE_NONBLOCKING_THREAD_POOL) is now #if !defined(EIGEN_USE_SIMPLE_THREAD_POOL): the non blocking thread pool is the default since it's more scalable, and one needs to request the old thread pool explicitly.
2016-05-17 14:06:15 -07:00
Benoit Steiner
997c335970
Fixed compilation error
2016-05-17 12:54:18 -07:00
Benoit Steiner
ebf6ada5ee
Fixed compilation error in the tensor thread pool
2016-05-17 12:33:46 -07:00
Rasmus Munk Larsen
0bb61b04ca
Merge upstream.
2016-05-17 10:26:10 -07:00
Rasmus Munk Larsen
0dbd68145f
Roll back changes to core. Move include of TensorFunctors.h up to satisfy dependence in TensorCostModel.h.
2016-05-17 10:25:19 -07:00
Rasmus Larsen
00228f2506
Merged eigen/eigen into default
2016-05-17 09:49:31 -07:00
Benoit Steiner
e7e64c3277
Enable the use of the packet api to evaluate tensor broadcasts. This speed things up quite a bit:
...
Before"
M_broadcasting/10 500000 3690 27.10 MFlops/s
BM_broadcasting/80 500000 4014 1594.24 MFlops/s
BM_broadcasting/640 100000 14770 27731.35 MFlops/s
BM_broadcasting/4K 5000 632711 39512.48 MFlops/s
After:
BM_broadcasting/10 500000 4287 23.33 MFlops/s
BM_broadcasting/80 500000 4455 1436.41 MFlops/s
BM_broadcasting/640 200000 10195 40173.01 MFlops/s
BM_broadcasting/4K 5000 423746 58997.57 MFlops/s
2016-05-17 09:24:35 -07:00
Benoit Steiner
5fa27574dd
Allow vectorized padding on GPU. This helps speed things up a little
...
Before:
BM_padding/10 5000000 460 217.03 MFlops/s
BM_padding/80 5000000 460 13899.40 MFlops/s
BM_padding/640 5000000 461 888421.17 MFlops/s
BM_padding/4K 5000000 460 54316322.55 MFlops/s
After:
BM_padding/10 5000000 454 220.20 MFlops/s
BM_padding/80 5000000 455 14039.86 MFlops/s
BM_padding/640 5000000 452 904968.83 MFlops/s
BM_padding/4K 5000000 411 60750049.21 MFlops/s
2016-05-17 09:17:26 -07:00
Benoit Steiner
a910bcee43
Merged latest updates from trunk
2016-05-17 09:14:22 -07:00
Benoit Steiner
8d06c02ffd
Allow vectorized padding on GPU. This helps speed things up a little.
...
Before:
BM_padding/10 5000000 460 217.03 MFlops/s
BM_padding/80 5000000 460 13899.40 MFlops/s
BM_padding/640 5000000 461 888421.17 MFlops/s
BM_padding/4K 5000000 460 54316322.55 MFlops/s
After:
BM_padding/10 5000000 454 220.20 MFlops/s
BM_padding/80 5000000 455 14039.86 MFlops/s
BM_padding/640 5000000 452 904968.83 MFlops/s
BM_padding/4K 5000000 411 60750049.21 MFlops/s
2016-05-17 09:13:27 -07:00
Benoit Steiner
86da77cb9b
Pulled latest updates from trunk.
2016-05-17 07:21:48 -07:00
Benoit Steiner
92fc6add43
Don't rely on c++11 extension when we don't have to.
2016-05-17 07:21:22 -07:00
Benoit Steiner
2d74ef9682
Avoid float to double conversion
2016-05-17 07:20:11 -07:00
David Dement
ccc7563ac5
made a fix to the GMRES solver so that it now correctly reports the error achieved in the solution process
2016-05-16 14:26:41 -04:00
Gael Guennebaud
575bc44c3f
Fix unit test.
2016-05-19 22:48:16 +02:00
Gael Guennebaud
ccb408ee6a
Improve unit tests of zeta, polygamma, and digamma
2016-05-19 18:34:41 +02:00
Gael Guennebaud
6761c64d60
zeta and polygamma are not unary functions, but binary ones.
2016-05-19 18:34:16 +02:00
Gael Guennebaud
7a54032408
zeta and digamma do not require C++11/C99
2016-05-19 17:36:47 +02:00
Gael Guennebaud
ce12562710
Add some c++11 flags in documentation
2016-05-19 17:35:30 +02:00
Gael Guennebaud
b6ed8244b4
bug #1201 : optimize affine*vector products
2016-05-19 16:09:15 +02:00
Gael Guennebaud
73693b5de6
bug #1221 : disable gcc 6 warning: ignoring attributes on template argument
2016-05-19 15:21:53 +02:00
Gael Guennebaud
df9a5e13c6
Fix SelfAdjointEigenSolver for some input expression types, and add new regression unit tests for sparse and selfadjointview inputs.
2016-05-19 13:07:33 +02:00
Gael Guennebaud
6a2916df80
DiagonalWrapper is a vector, so it must expose the LinearAccessBit flag.
2016-05-19 13:06:21 +02:00
Gael Guennebaud
a226f6af6b
Add support for SelfAdjointView::diagonal()
2016-05-19 13:05:33 +02:00
Gael Guennebaud
ee7da3c7c5
Fix SelfAdjointView::triangularView for complexes.
2016-05-19 13:01:51 +02:00
Gael Guennebaud
b6b8578a67
bug #1230 : add support for SelfadjointView::triangularView.
2016-05-19 11:36:38 +02:00
Benoit Steiner
a80d875916
Added missing costPerCoeff method
2016-05-16 09:31:10 -07:00
Benoit Steiner
83ef39e055
Turn on the cost model by default. This results in some significant speedups for smaller tensors. For example, below are the results for the various tensor reductions.
...
Before:
BM_colReduction_12T/10 1000000 1949 51.29 MFlops/s
BM_colReduction_12T/80 100000 15636 409.29 MFlops/s
BM_colReduction_12T/640 20000 95100 4307.01 MFlops/s
BM_colReduction_12T/4K 500 4573423 5466.36 MFlops/s
BM_colReduction_4T/10 1000000 1867 53.56 MFlops/s
BM_colReduction_4T/80 500000 5288 1210.11 MFlops/s
BM_colReduction_4T/640 10000 106924 3830.75 MFlops/s
BM_colReduction_4T/4K 500 9946374 2513.48 MFlops/s
BM_colReduction_8T/10 1000000 1912 52.30 MFlops/s
BM_colReduction_8T/80 200000 8354 766.09 MFlops/s
BM_colReduction_8T/640 20000 85063 4815.22 MFlops/s
BM_colReduction_8T/4K 500 5445216 4591.19 MFlops/s
BM_rowReduction_12T/10 1000000 2041 48.99 MFlops/s
BM_rowReduction_12T/80 100000 15426 414.87 MFlops/s
BM_rowReduction_12T/640 50000 39117 10470.98 MFlops/s
BM_rowReduction_12T/4K 500 3034298 8239.14 MFlops/s
BM_rowReduction_4T/10 1000000 1834 54.51 MFlops/s
BM_rowReduction_4T/80 500000 5406 1183.81 MFlops/s
BM_rowReduction_4T/640 50000 35017 11697.16 MFlops/s
BM_rowReduction_4T/4K 500 3428527 7291.76 MFlops/s
BM_rowReduction_8T/10 1000000 1925 51.95 MFlops/s
BM_rowReduction_8T/80 200000 8519 751.23 MFlops/s
BM_rowReduction_8T/640 50000 33441 12248.42 MFlops/s
BM_rowReduction_8T/4K 1000 2852841 8763.19 MFlops/s
After:
BM_colReduction_12T/10 50000000 59 1678.30 MFlops/s
BM_colReduction_12T/80 5000000 725 8822.71 MFlops/s
BM_colReduction_12T/640 20000 90882 4506.93 MFlops/s
BM_colReduction_12T/4K 500 4668855 5354.63 MFlops/s
BM_colReduction_4T/10 50000000 59 1687.37 MFlops/s
BM_colReduction_4T/80 5000000 737 8681.24 MFlops/s
BM_colReduction_4T/640 50000 108637 3770.34 MFlops/s
BM_colReduction_4T/4K 500 7912954 3159.38 MFlops/s
BM_colReduction_8T/10 50000000 60 1657.21 MFlops/s
BM_colReduction_8T/80 5000000 726 8812.48 MFlops/s
BM_colReduction_8T/640 20000 91451 4478.90 MFlops/s
BM_colReduction_8T/4K 500 5441692 4594.16 MFlops/s
BM_rowReduction_12T/10 20000000 93 1065.28 MFlops/s
BM_rowReduction_12T/80 2000000 950 6730.96 MFlops/s
BM_rowReduction_12T/640 50000 38196 10723.48 MFlops/s
BM_rowReduction_12T/4K 500 3019217 8280.29 MFlops/s
BM_rowReduction_4T/10 20000000 93 1064.30 MFlops/s
BM_rowReduction_4T/80 2000000 959 6667.71 MFlops/s
BM_rowReduction_4T/640 50000 37433 10941.96 MFlops/s
BM_rowReduction_4T/4K 500 3036476 8233.23 MFlops/s
BM_rowReduction_8T/10 20000000 93 1072.47 MFlops/s
BM_rowReduction_8T/80 2000000 959 6670.04 MFlops/s
BM_rowReduction_8T/640 50000 38069 10759.37 MFlops/s
BM_rowReduction_8T/4K 1000 2758988 9061.29 MFlops/s
2016-05-16 08:55:21 -07:00
Benoit Steiner
b789a26804
Fixed syntax error
2016-05-16 08:51:08 -07:00
Benoit Steiner
83dfb40f66
Turnon the new thread pool by default since it scales much better over multiple cores. It is still possible to revert to the old thread pool by compiling with the EIGEN_USE_SIMPLE_THREAD_POOL define.
2016-05-13 17:23:15 -07:00
Benoit Steiner
97605c7b27
New multithreaded contraction that doesn't rely on the thread pool to run the closure in the order in which they are enqueued. This is needed in order to switch to the new non blocking thread pool since this new thread pool can execute the closure in any order.
2016-05-13 17:11:29 -07:00
Benoit Steiner
069a0b04d7
Added benchmarks for contraction on CPU.
2016-05-13 14:32:17 -07:00
Benoit Steiner
c4fc8b70ec
Removed unnecessary thread synchronization
2016-05-13 10:49:38 -07:00
Benoit Steiner
7aa3557d31
Fixed compilation errors triggered by old versions of gcc
2016-05-12 18:59:04 -07:00
Rasmus Munk Larsen
5005b27fc8
Diasbled cost model by accident. Revert.
2016-05-12 16:55:21 -07:00
Rasmus Munk Larsen
989e419328
Address comments by bsteiner.
2016-05-12 16:54:19 -07:00
Rasmus Munk Larsen
e55deb21c5
Improvements to parallelFor.
...
Move some scalar functors from TensorFunctors. to Eigen core.
2016-05-12 14:07:22 -07:00
Benoit Steiner
ae9688f313
Worked around a compilation error triggered by nvcc when compiling a tensor concatenation kernel.
2016-05-12 12:06:51 -07:00
Benoit Steiner
2a54b70d45
Fixed potential race condition in the non blocking thread pool
2016-05-12 11:45:48 -07:00
Benoit Steiner
a071629fec
Replace implicit cast with an explicit one
2016-05-12 10:40:07 -07:00
Benoit Steiner
2f9401b061
Worked around compilation errors with older versions of gcc
2016-05-11 23:39:20 -07:00
Benoit Steiner
09653e1f82
Improved the portability of the tensor code
2016-05-11 23:29:09 -07:00
Benoit Steiner
fae0493f98
Fixed a couple of bugs related to the Pascalfamily of GPUs
...
H: Enter commit message. Lines beginning with 'HG:' are removed.
2016-05-11 23:02:26 -07:00
Benoit Steiner
886445ce4d
Avoid unnecessary conversions between floats and doubles
2016-05-11 23:00:03 -07:00
Benoit Steiner
595e890391
Added more tests for half floats
2016-05-11 21:27:15 -07:00
Benoit Steiner
b6a517c47d
Added the ability to load fp16 using the texture path.
...
Improved the performance of some reductions on fp16
2016-05-11 21:26:48 -07:00
Benoit Steiner
518149e868
Misc fixes for fp16
2016-05-11 20:11:14 -07:00
Benoit Steiner
56a1757d74
Made predux_min and predux_max on fp16 less noisy
2016-05-11 17:37:34 -07:00
Benoit Steiner
9091351dbe
__ldg is only available with cuda architectures >= 3.5
2016-05-11 15:22:13 -07:00
Benoit Steiner
02f76dae2d
Fixed a typo
2016-05-11 15:08:38 -07:00
Christoph Hertzberg
131e5a1a4a
Do not copy for trivial 1x1 case. This also avoids a "maybe-uninitialized" warning in some situations.
2016-05-11 23:50:13 +02:00
Benoit Steiner
70195a5ff7
Added missing EIGEN_DEVICE_FUNC
2016-05-11 14:10:09 -07:00
Benoit Steiner
09a19c33a8
Added missing EIGEN_DEVICE_FUNC qualifiers
2016-05-11 14:07:43 -07:00
Christoph Hertzberg
1a1ce6ff61
Removed deprecated flag (which apparently was ignored anyway)
2016-05-11 23:05:37 +02:00
Christoph Hertzberg
2150f13d65
fixed some double-promotion and sign-compare warnings
2016-05-11 23:02:26 +02:00
Christoph Hertzberg
7268b10203
Split unit test
2016-05-11 19:41:53 +02:00
Christoph Hertzberg
8d4ef391b0
Don't flood test output with successful VERIFY_IS_NOT_EQUAL tests.
2016-05-11 19:40:45 +02:00
Christoph Hertzberg
bda21407dd
Fix help output of buildtests and check scripts
2016-05-11 19:39:09 +02:00
Christoph Hertzberg
33ca7e3c8d
bug #1207 : Add and fix logical-op warnings
2016-05-11 19:36:34 +02:00
Benoit Steiner
217d984abc
Fixed a typo in my previous commit
2016-05-11 10:22:15 -07:00
Benoit Steiner
08348b4e48
Fix potential race condition in the CUDA reduction code.
2016-05-11 10:08:51 -07:00
Benoit Steiner
cbb14ed47e
Added a few tests to validate the generation of random tensors on GPU.
2016-05-11 10:05:56 -07:00
Benoit Steiner
6a5717dc74
Explicitely initialize all the atomic variables.
2016-05-11 10:04:41 -07:00
Christoph Hertzberg
0f61343893
Workaround maybe-uninitialized warning
2016-05-11 09:00:18 +02:00
Christoph Hertzberg
3bfc9b47ca
Workaround "misleading-indentation" warnings
2016-05-11 08:41:36 +02:00
Benoit Steiner
4ede059de1
Properly gate the use of half2.
2016-05-10 17:04:01 -07:00
Benoit Steiner
bf185c3c28
Extended the tests for ptanh
2016-05-10 16:21:43 -07:00
Benoit Steiner
661e710092
Added support for fp16 to the sigmoid functor.
2016-05-10 12:25:27 -07:00
Benoit Steiner
0eb69b7552
Small improvement to the full reduction of fp16
2016-05-10 11:58:18 -07:00
Benoit Steiner
0b9e3dcd06
Added packet primitives to compute exp, log, sqrt and rsqrt on fp16. This improves the performance by 10 to 30%.
2016-05-10 11:05:33 -07:00
Benoit Steiner
6bf8273bc0
Added a test to validate the new non blocking thread pool
2016-05-10 10:49:34 -07:00
Benoit Steiner
4013b8feca
Simplified the reduction code a little.
2016-05-10 09:40:42 -07:00
Benoit Steiner
75bd2bd32d
Fixed compilation warning
2016-05-09 19:24:41 -07:00
Benoit Steiner
4670d7d5ce
Improved the performance of full reductions on GPU:
...
Before:
BM_fullReduction/10 200000 11751 8.51 MFlops/s
BM_fullReduction/80 5000 523385 12.23 MFlops/s
BM_fullReduction/640 50 36179326 11.32 MFlops/s
BM_fullReduction/4K 1 2173517195 11.50 MFlops/s
After:
BM_fullReduction/10 500000 5987 16.70 MFlops/s
BM_fullReduction/80 200000 10636 601.73 MFlops/s
BM_fullReduction/640 50000 58428 7010.31 MFlops/s
BM_fullReduction/4K 1000 2006106 12461.95 MFlops/s
2016-05-09 17:09:54 -07:00
Benoit Steiner
c3859a2b58
Added the ability to use a scratch buffer in cuda kernels
2016-05-09 17:05:53 -07:00
Benoit Steiner
ba95e43ea2
Added a new parallelFor api to the thread pool device.
2016-05-09 10:45:12 -07:00
Benoit Steiner
dc7dbc2df7
Optimized the non blocking thread pool:
...
* Use a pseudo-random permutation of queue indices during random stealing. This ensures that all the queues are considered.
* Directly pop from a non-empty queue when we are waiting for work,
instead of first noticing that there is a non-empty queue and
then doing another round of random stealing to re-discover the non-empty
queue.
* Steal only 1 task from a remote queue instead of half of tasks.
2016-05-09 10:17:17 -07:00
Benoit Steiner
05c365fb16
Pulled latest updates from trunk
2016-05-07 13:39:04 -07:00
Benoit Steiner
691614bd2c
Worked around a bug in nvcc on tegra x1
2016-05-07 13:28:53 -07:00
Benoit Steiner
a2d94fc216
Merged latest updates from trunk
2016-05-06 19:17:57 -07:00
Benoit Steiner
8adf5cc70f
Added support for packet processing of fp16 on kepler and maxwell gpus
2016-05-06 19:16:43 -07:00
Benoit Steiner
1660e749b4
Avoid double promotion
2016-05-06 08:15:12 -07:00
Christoph Hertzberg
a11bd82dc3
bug #1213 : Give names to anonymous enums
2016-05-06 11:31:56 +02:00
Benoit Steiner
c54ae65c83
Marked a few tensor operations as read only
2016-05-05 17:18:47 -07:00
Benoit Steiner
69a8a4e1f3
Added a test to validate full reduction on tensor of half floats
2016-05-05 16:52:50 -07:00
Benoit Steiner
678a17ba79
Made the testing of contractions on fp16 more robust
2016-05-05 16:36:39 -07:00
Benoit Steiner
e3d053e14e
Refined the testing of log and exp on fp16
2016-05-05 16:24:15 -07:00
Benoit Steiner
9a48688d37
Further improved the testing of fp16
2016-05-05 15:58:05 -07:00
Benoit Steiner
0451940fa4
Relaxed the dummy precision for fp16
2016-05-05 15:40:01 -07:00
Benoit Steiner
910e013506
Relaxed an assertion that was tighter that necessary.
2016-05-05 15:38:16 -07:00
Benoit Steiner
f81e413180
Added a benchmark to measure the performance of full reductions of 16 bit floats
2016-05-05 14:15:11 -07:00
Benoit Steiner
28d5572658
Fixed some incorrect assertions
2016-05-05 10:02:26 -07:00
Benoit Steiner
2aba40d208
Avoid unecessary type promotion
2016-05-05 09:26:57 -07:00
Benoit Steiner
a4d6e8fef0
Strongly hint but don't force the compiler to unroll a some loops in the tensor executor. This results in up to 27% faster code.
2016-05-05 09:25:55 -07:00
Benoit Steiner
7875437ca0
Avoided unecessary type promotion
2016-05-05 09:08:42 -07:00
Benoit Steiner
f363e533aa
Added tests for full contractions using thread pools and gpu devices.
...
Fixed a couple of issues in the corresponding code.
2016-05-05 09:05:45 -07:00
Benoit Steiner
06d774bf58
Updated the contraction code to ensure that full contraction return a tensor of rank 0
2016-05-05 08:37:47 -07:00
Christoph Hertzberg
b300a84989
Fixed some singed/unsigned comparison warnings
2016-05-05 13:36:28 +02:00
Christoph Hertzberg
dacb469bc9
Enable and fix -Wdouble-conversion warnings
2016-05-05 13:35:45 +02:00
Benoit Steiner
62b710072e
Reduced the memory footprint of the cxx11_tensor_image_patch test
2016-05-04 21:08:22 -07:00
Benoit Steiner
dd2b45feed
Removed extraneous 'explicit' keywords
2016-05-04 16:57:52 -07:00
Ola Røer Thorsen
be78aea6b3
fix double-promotion/float-conversion in Core/SpecialFunctions.h
2016-05-04 10:52:08 +02:00
Gael Guennebaud
75a94b9662
Improve documentation of BDCSVD
2016-05-04 12:53:14 +02:00
Benoit Steiner
968ec1c2ae
Use numext::isfinite instead of std::isfinite
2016-05-03 19:56:40 -07:00
Gael Guennebaud
e2ca478485
bug #1214 : consider denormals as zero in D&C SVD. This also workaround infinite binary search when compiling with ICC's unsafe optimizations.
2016-05-03 23:15:29 +02:00
Benoit Steiner
f899e08946
Enabled a number of tests previously disabled by mistake
2016-05-03 14:07:47 -07:00
Benoit Steiner
4c05fb03a3
Merged eigen/eigen into default
2016-05-03 13:15:00 -07:00
Benoit Steiner
577a07a86e
Re-enabled the product_small test now that everything compiles correctly.
2016-05-03 13:11:38 -07:00
Benoit Steiner
2c5568a757
Added a test to validate the computation of exp and log on 16bit floats
2016-05-03 12:06:07 -07:00
Benoit Steiner
6c3e5b85bc
Fixed compilation error with cuda >= 7.5
2016-05-03 09:38:42 -07:00
Benoit Steiner
aad9a04da4
Deleted superfluous explicit keyword.
2016-05-03 09:37:19 -07:00
Benoit Steiner
da50419df8
Made a cast explicit
2016-05-02 19:50:22 -07:00
Benoit Steiner
73ef5371e4
Pulled latest updates from trunk
2016-05-01 14:48:57 -07:00
Benoit Steiner
8a9228ed9b
Fixed compilation error
2016-05-01 14:48:01 -07:00
Gael Guennebaud
b1bd53aa6b
Fix performance regression: with AVX, unaligned stores were emitted instead of aligned ones for fixed size assignement.
2016-05-01 23:25:06 +02:00
Benoit Steiner
d6c9596fd8
Added missing accessors to fixed sized tensors
2016-04-29 18:51:33 -07:00
Benoit Steiner
17fe7f354e
Deleted trailing commas
2016-04-29 18:39:01 -07:00
Benoit Steiner
e5f71aa6b2
Deleted useless trailing commas
2016-04-29 18:36:10 -07:00
Benoit Steiner
44f592dceb
Deleted unnecessary trailing commas.
2016-04-29 18:33:46 -07:00
Benoit Steiner
2b890ae618
Fixed compilation errors generated by clang
2016-04-29 18:30:40 -07:00
Benoit Steiner
d217217842
Added a few tests to ensure that the dimensions of rank 0 tensors are correctly computed
2016-04-29 18:15:34 -07:00
Benoit Steiner
f100d1494c
Return the proper size (ie 1) for tensors of rank 0
2016-04-29 18:14:33 -07:00
Benoit Steiner
d14105f158
Made several tensor tests compatible with cxx03
2016-04-29 17:22:37 -07:00
Benoit Steiner
c0882ef4d9
Moved a number of tensor tests that don't require cxx11 to work properly outside the EIGEN_TEST_CXX11 test section
2016-04-29 17:13:51 -07:00
Benoit Steiner
9d1dbd1ec0
Fixed teh cxx11_tensor_empty test to compile without requiring cxx11 support
2016-04-29 16:53:55 -07:00
Benoit Steiner
a8c0405cf5
Deleted unused default values for template parameters
2016-04-29 16:34:43 -07:00
Benoit Steiner
4f53178e62
Made a coupe of tensor tests compile without requiring c++11 support.
2016-04-29 16:09:54 -07:00
Benoit Steiner
1131a984a6
Made the cxx11_tensor_forced_eval compile without c++11.
2016-04-29 15:48:59 -07:00
Benoit Steiner
46bcb70969
Don't turn on const expressions when compiling with gcc >= 4.8 unless the -std=c++11 option has been used
2016-04-29 15:20:59 -07:00
Benoit Steiner
c07404f6a1
Restore Tensor support for non c++11 compilers
2016-04-29 15:19:19 -07:00
Benoit Steiner
ba32ded021
Fixed include path
2016-04-29 15:11:09 -07:00
Benoit Steiner
3b8da4be5a
Extended the packetmath test to cover all the alignments made possible by avx512 instructions.
2016-04-29 14:13:43 -07:00
Benoit Steiner
2f28ccbea3
Update the makefile to make the tests compile with gcc 4.9
2016-04-29 14:11:09 -07:00
Benoit Steiner
7a4bd337d9
Resolved merge conflict
2016-04-29 13:42:22 -07:00
Benoit Steiner
07a247dcf4
Pulled latest updates from upstream
2016-04-29 13:41:26 -07:00
Benoit Steiner
fa5a8f055a
Implemented palign_impl for AVX512
2016-04-29 13:30:13 -07:00
Benoit Steiner
ef3ac9d05a
Fixed the AVX512 packet traits
2016-04-29 13:28:36 -07:00
Benoit Steiner
d7b75e8d86
Added pdiv packet primitives for avx512
2016-04-29 13:26:47 -07:00
Benoit Steiner
5e89ded685
Implemented preduxp for AVX512
2016-04-29 13:00:33 -07:00
Benoit Steiner
5f85662ad8
Implemented the pabs and preverse primitives for avx512.
2016-04-29 12:53:34 -07:00
Benoit Steiner
d37ee89ca8
Disabled some of the AVX512 primitives on compilers that don't support them
2016-04-29 12:50:29 -07:00
Gael Guennebaud
0f3c4c8ff4
Fix compilation of sparse.cast<>().transpose().
2016-04-29 18:26:08 +02:00
Benoit Steiner
a524a26fdc
Fixed a few memory leaks
2016-04-28 18:55:53 -07:00
Benoit Steiner
dacb23277e
Fixed the igamma and igammac implementations to make them callable from a gpu kernel.
2016-04-28 18:54:54 -07:00
Benoit Steiner
a5d4545083
Deleted unused variable
2016-04-28 14:14:48 -07:00
Justin Lebar
40d1e2f8c7
Eliminate mutual recursion in igamma{,c}_impl::Run.
...
Presently, igammac_impl::Run calls igamma_impl::Run, which in turn calls
igammac_impl::Run.
This isn't actually mutual recursion; the calls are guarded such that we never
get into a loop. Nonetheless, it's a stretch for clang to prove this. As a
result, clang emits a recursive call in both igammac_impl::Run and
igamma_impl::Run.
That this is suboptimal code is bad enough, but it's particularly bad when
compiling for CUDA/nvptx. nvptx allows recursion, but only begrudgingly: If
you have recursive calls in a kernel, it's on you to manually specify the
kernel's stack size. Otherwise, ptxas will dump a warning, make a guess, and
who knows if it's right.
This change explicitly eliminates the mutual recursion in igammac_impl::Run and
igamma_impl::Run.
2016-04-28 13:57:08 -07:00
Konstantinos Margaritis
87294c84a6
define Packet2d constants with VSX only
2016-04-28 14:39:56 -03:00
Konstantinos Margaritis
6ed7a7281c
remove accidentally pasted code
2016-04-28 14:35:55 -03:00
Konstantinos Margaritis
62f9093b31
improve state of MathFunctions as well
2016-04-28 14:33:09 -03:00
Konstantinos Margaritis
8ed26120c8
bring Altivec/VSX to a better state, implement some of the missing functions
2016-04-28 14:32:42 -03:00
Konstantinos Margaritis
950158f6d1
add name to copyrights
2016-04-28 14:32:11 -03:00
Konstantinos Margaritis
ee0459300b
minor fix, add to copyright
2016-04-28 14:31:21 -03:00
Benoit Steiner
3ec81fc00f
Fixed compilation error with clang.
2016-04-27 19:32:12 -07:00
Benoit Steiner
2b917291d9
Merged in rmlarsen/eigen2 (pull request PR-183)
...
Detect cxx_constexpr support when compiling with clang.
2016-04-27 15:19:54 -07:00
Rasmus Munk Larsen
09b9e951e3
Depend on the more extensive support for constexpr in clang:
...
http://clang.llvm.org/docs/LanguageExtensions.html#c-1y-relaxed-constexpr
2016-04-27 14:59:11 -07:00
Rasmus Munk Larsen
1a325ef71c
Detect cxx_constexpr support when compiling with clang.
2016-04-27 14:33:51 -07:00
Benoit Steiner
1a97fd8b4e
Merged latest update from trunk
2016-04-27 14:22:45 -07:00
Benoit Steiner
c61170e87d
fpclassify isn't portable enough. In particular, the return values of the function are not available on all the platforms Eigen supportes: remove it from Eigen.
2016-04-27 14:22:20 -07:00
Gael Guennebaud
318e65e0ae
Fix missing inclusion of Eigen/Core
2016-04-27 23:05:40 +02:00
Benoit Steiner
f629fe95c8
Made the index type a template parameter to evaluateProductBlockingSizes
...
Use numext::mini and numext::maxi instead of std::min/std::max to compute blocking sizes.
2016-04-27 13:11:19 -07:00
Benoit Steiner
66b215b742
Merged latest updates from trunk
2016-04-27 12:57:48 -07:00
Benoit Steiner
25141b69d4
Improved support for min and max on 16 bit floats when running on recent cuda gpus
2016-04-27 12:57:21 -07:00
Rasmus Larsen
ff33798acd
Merged eigen/eigen into default
2016-04-27 12:27:00 -07:00
Rasmus Munk Larsen
463738ccbe
Use computeProductBlockingSizes to compute blocking for both ShardByCol and ShardByRow cases.
2016-04-27 12:26:18 -07:00
Benoit Steiner
6744d776ba
Added support for fpclassify in Eigen::Numext
2016-04-27 12:10:25 -07:00
Rasmus Munk Larsen
1f48f47ab7
Implement stricter argument checking for SYRK and SY2K and real matrices. To implement the BLAS API they should return info=2 if op='C' is passed for a complex matrix. Without this change, the Eigen BLAS fails the strict zblat3 and cblat3 tests in LAPACK 3.5.
2016-04-27 19:59:44 +02:00
Gael Guennebaud
3dddd34133
Refactor the unsupported CXX11/Core module to internal headers only.
2016-04-26 11:20:25 +02:00
Benoit Steiner
4a164d2c46
Fixed the partial evaluation of non vectorizable tensor subexpressions
2016-04-25 10:43:03 -07:00
Benoit Steiner
fd9401f260
Refined the cost of the striding operation.
2016-04-25 09:16:08 -07:00
Heiko Bauke
e19b58e672
alias template for matrix and array classes
2016-04-23 00:08:51 +02:00
Konstantinos Margaritis
3f80696ae1
Merged eigen/eigen into default
2016-04-22 15:05:21 +03:00
Benoit Steiner
5c372d19e3
Merged in rmlarsen/eigen (pull request PR-179)
...
Prevent crash in CompleteOrthogonalDecomposition if object was default constructed.
2016-04-21 18:06:36 -07:00
Benoit Steiner
4bbc97be5e
Provide access to the base threadpool classes
2016-04-21 17:59:33 -07:00
Rasmus Munk Larsen
a3256d78d8
Prevent crash in CompleteOrthogonalDecomposition if object was default constructed.
2016-04-21 16:49:28 -07:00
Benoit Steiner
33adce5c3a
Added the ability to switch to the new thread pool with a #define
2016-04-21 11:59:58 -07:00
Benoit Steiner
79b900375f
Use index list for the striding benchmarks
2016-04-21 11:58:27 -07:00
Benoit Steiner
f670613e4b
Fixed several compilation warnings
2016-04-21 11:03:02 -07:00
Benoit Steiner
6015422ee6
Added an option to enable the use of the F16C instruction set
2016-04-21 10:30:29 -07:00
Benoit Steiner
32ffce04fc
Use EIGEN_THREAD_YIELD instead of std::this_thread::yield to make the code more portable.
2016-04-21 08:47:28 -07:00
Konstantinos Margaritis
e5b2ef47d5
Merged eigen/eigen into default
2016-04-21 18:03:08 +03:00
Benoit Steiner
2dde1b1028
Don't crash when attempting to reduce empty tensors.
2016-04-20 18:08:20 -07:00
Benoit Steiner
a792cd357d
Added more tests
2016-04-20 17:33:58 -07:00
Benoit Steiner
80200a1828
Don't attempt to leverage the _cvtss_sh and _cvtsh_ss instructions when compiling with clang since it's unclear which versions of clang actually support these instruction.
2016-04-20 12:10:27 -07:00
Benoit Steiner
c7c2054bb5
Started to implement a portable way to yield.
2016-04-19 17:59:58 -07:00
Benoit Steiner
1d0238375d
Made sure all the required header files are included when trying to use fp16
2016-04-19 17:44:12 -07:00
Benoit Steiner
2b72163028
Implemented a more portable version of thread local variables
2016-04-19 15:56:02 -07:00
Benoit Steiner
04f954956d
Fixed a few typos
2016-04-19 15:27:09 -07:00
Benoit Steiner
5b1106c56b
Fixed a compilation error with nvcc 7.
2016-04-19 14:57:57 -07:00
Benoit Steiner
7129d998db
Simplified the code that launches cuda kernels.
2016-04-19 14:55:21 -07:00
Benoit Steiner
b9ea40c30d
Don't take the address of a kernel on CUDA devices that don't support this feature.
2016-04-19 14:35:11 -07:00
Benoit Steiner
884c075058
Use numext::ceil instead of std::ceil
2016-04-19 14:33:30 -07:00
Benoit Steiner
a278414d1b
Avoid an unnecessary copy of the evaluator.
2016-04-19 13:54:28 -07:00
Benoit Steiner
f953c60705
Fixed 2 recent regression tests
2016-04-19 12:57:39 -07:00
Benoit Steiner
50968a0a3e
Use DenseIndex in the MeanReducer to avoid overflows when processing very large tensors.
2016-04-19 11:53:58 -07:00
Benoit Steiner
84543c8be2
Worked around the lack of a rand_r function on windows systems
2016-04-17 19:29:27 -07:00
Benoit Steiner
5fbcfe5eb4
Worked around the lack of a rand_r function on windows systems
2016-04-17 18:42:31 -07:00
Gael Guennebaud
e4fe611e2c
Enable lazy-coeff-based-product for vector*(1x1) products
2016-04-16 15:17:39 +02:00
Benoit Steiner
c8e8f93d6c
Move the evalGemm method into the TensorContractionEvaluatorBase class to make it accessible from both the single and multithreaded contraction evaluators.
2016-04-15 16:48:10 -07:00
Benoit Steiner
1a16fb1532
Deleted extraneous comma.
2016-04-15 15:50:13 -07:00
Benoit Steiner
7cff898e0a
Deleted unnecessary variable
2016-04-15 15:46:14 -07:00
Benoit Steiner
6c43c49e4a
Fixed a few compilation warnings
2016-04-15 15:34:34 -07:00
Benoit Steiner
eb669f989f
Merged in rmlarsen/eigen (pull request PR-178)
...
Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions.
2016-04-15 14:53:15 -07:00
Gael Guennebaud
2a7115daca
bug #1203 : by-pass large stack-allocation in stableNorm if EIGEN_STACK_ALLOCATION_LIMIT is too small
2016-04-15 22:34:11 +02:00
Rasmus Munk Larsen
3718bf654b
Get rid of void* casting when calling EvalRange::run.
2016-04-15 12:51:33 -07:00
Benoit Steiner
40c9923a8a
Fixed compilation errors with msvc
2016-04-15 11:27:52 -07:00
Benoit Steiner
1d23430628
Improved the matrix multiplication blocking in the case where mr is not a power of 2 (e.g on Haswell CPUs).
2016-04-15 10:53:31 -07:00
Gael Guennebaud
1e80bddde3
Fix trmv for mixing types.
2016-04-15 17:58:36 +02:00
Konstantinos Margaritis
0e8fc31087
remove pgather/pscatter for std::complex<double> for s390x
2016-04-15 07:08:57 -04:00
Benoit Steiner
a62e924656
Added ability to access the cache sizes from the tensor devices
2016-04-14 21:25:06 -07:00
Benoit Steiner
18e6f67426
Added support for exclusive or
2016-04-14 20:37:46 -07:00
Rasmus Munk Larsen
07ac4f7e02
Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions. The cost model is turned off by default.
2016-04-14 18:28:23 -07:00
Benoit Steiner
9624a1ea3d
Added missing definition of PacketSize in the gpu evaluator of convolution
2016-04-14 17:16:58 -07:00
Benoit Steiner
6fbedf5a4e
Merged in rmlarsen/eigen (pull request PR-177)
...
Eigen Tensor cost model part 1.
2016-04-14 17:13:19 -07:00
Benoit Steiner
bebb89acfa
Enabled the new threadpool tests
2016-04-14 16:44:10 -07:00
Benoit Steiner
9c064b5a97
Cleanup
2016-04-14 16:41:31 -07:00
Benoit Steiner
1372156c41
Prepared the migration to the new non blocking thread pool
2016-04-14 16:16:42 -07:00
Rasmus Munk Larsen
aeb5494a0b
Improvements to cost model.
2016-04-14 15:52:58 -07:00
Benoit Steiner
00dfe18487
Merged latest updates from trunk
2016-04-14 15:25:20 -07:00
Benoit Steiner
a8e8837ba7
Added tests for the non blocking thread pool
2016-04-14 15:23:49 -07:00
Benoit Steiner
78a51abc12
Added a more scalable non blocking thread pool
2016-04-14 15:23:10 -07:00
Rasmus Munk Larsen
d2e95492e7
Merge upstream updates.
2016-04-14 13:59:50 -07:00
Rasmus Munk Larsen
235e83aba6
Eigen cost model part 1. This implements a basic recursive framework to estimate the cost of evaluating tensor expressions.
2016-04-14 13:57:35 -07:00
Gael Guennebaud
68897c52f3
Add extreme values to the imaginary part for SVD unit tests.
2016-04-14 22:47:30 +02:00
Gael Guennebaud
20f387fafa
Improve numerical robustness of JacoviSVD:
...
- avoid noise amplification in complex to real conversion
- compare off-diagonal entries to the current biggest diagonal entry: no need to bother about a 2x2 block containing ridiculously small entries compared to the rest of the matrix.
2016-04-14 22:46:55 +02:00
Benoit Steiner
7718749fee
Force the inlining of the << operator on half floats
2016-04-14 11:51:54 -07:00
Benoit Steiner
5379d2b594
Inline the << operator on half floats
2016-04-14 11:40:48 -07:00
Benoit Steiner
5912ad877c
Silenced a compilation warning
2016-04-14 11:40:14 -07:00
Benoit Steiner
2b6e3de02f
Added tests to validate flooring and ceiling of fp16
2016-04-14 11:39:18 -07:00
Benoit Steiner
6f23e945f6
Added simple test for numext::sqrt and numext::pow on fp16
2016-04-14 10:32:52 -07:00
Benoit Steiner
72510c80e1
Added basic test for trigonometric functions on fp16
2016-04-14 10:27:24 -07:00
Benoit Steiner
7b3d7acebe
Added support for fp16 to test_isApprox, test_isMuchSmallerThan, and test_isApproxOrLessThan
2016-04-14 10:25:50 -07:00
Benoit Steiner
5c13765ee3
Added ability to printf fp16
2016-04-14 10:24:52 -07:00
Benoit Steiner
c7167fee0e
Added support for fp16 to the sigmoid function
2016-04-14 10:08:33 -07:00
Benoit Steiner
f6003f0873
Made the test msvc friendly
2016-04-14 09:47:26 -07:00
Gael Guennebaud
3551dea887
Cleaning pass on rcond estimator.
2016-04-14 16:45:41 +02:00
Gael Guennebaud
d8a3bdaa24
remove useless include
2016-04-14 15:18:56 +02:00
Gael Guennebaud
d402adc3d7
Better use .data() than &coeffRef(0)
2016-04-14 15:18:08 +02:00
Gael Guennebaud
ea7087ef31
Merged in rmlarsen/eigen (pull request PR-174)
...
Add matrix condition number estimation module.
2016-04-14 15:11:33 +02:00
Benoit Steiner
36f5a10198
Properly gate the definition of the error and gamma functions for fp16
2016-04-13 18:44:48 -07:00
Benoit Steiner
10b69810d1
Improved support for trigonometric functions on GPU
2016-04-13 16:00:51 -07:00
Benoit Steiner
d6105b53b8
Added basic implementation of the lgamma, digamma, igamma, igammac, polygamma, and zeta function for fp16
2016-04-13 15:26:02 -07:00
Gael Guennebaud
703251f10f
merge
2016-04-13 23:45:10 +02:00
Gael Guennebaud
39211ba46b
Fix JacobiSVD for complex when the complex-to-real update already gives a diagonal 2x2 block.
2016-04-13 23:43:26 +02:00
Benoit Steiner
2986253259
Cleaned up the implementation of digamma
2016-04-13 14:24:06 -07:00
Benoit Steiner
d5de1a8220
Pulled latest updates from trunk
2016-04-13 14:17:11 -07:00
Benoit Steiner
87ca15c4e8
Added support for sin, cos, tan, and tanh on fp16
2016-04-13 14:12:38 -07:00
Gael Guennebaud
2c9e4fa417
Add debug output for random unit test
2016-04-13 22:56:12 +02:00
Gael Guennebaud
7d1391d049
Turn a converge check to a warning
2016-04-13 22:50:54 +02:00
Gael Guennebaud
feef39e2d1
Fix underflow in JacoviSVD's complex to real preconditioner
2016-04-13 22:49:51 +02:00
Gael Guennebaud
f4e12272f1
Fix corner case in unit test.
2016-04-13 22:18:02 +02:00
Gael Guennebaud
a95e1a273e
Fix warning in unit tests
2016-04-13 22:00:38 +02:00
Benoit Steiner
bf3f6688f0
Added support for computing cos, sin, tan, and tanh on GPU.
2016-04-13 11:55:08 -07:00
Benoit Steiner
473c8380ea
Added constructors to convert unsigned integers into fp16
2016-04-13 11:03:37 -07:00
Gael Guennebaud
42a3352a3b
Workaround a division by zero when outerstride==0
2016-04-13 19:02:02 +02:00
Gael Guennebaud
6f960b83ff
Make use of is_same_dense helper instead of extract_data to detect input/outputs are the same.
2016-04-13 18:47:12 +02:00
Gael Guennebaud
b7716c0328
Fix incomplete previous patch on matrix comparision.
2016-04-13 18:32:56 +02:00
Gael Guennebaud
2630d97c62
Fix detection of same matrices when both matrices are not handled by extract_data.
2016-04-13 18:26:08 +02:00
Gael Guennebaud
512ba0ac76
Add regression unit tests for half-packet vectorization
2016-04-13 18:16:35 +02:00
Gael Guennebaud
06447e0a39
Improve half-packet vectorization logic to distinguish linear versus inner traversal modes.
2016-04-13 18:15:49 +02:00
Gael Guennebaud
bbb8854bf7
Enable half-packet in reduxions.
2016-04-13 13:02:34 +02:00
Benoit Steiner
e9b12cc1f7
Fixed compilation warnings generated by clang
2016-04-12 20:53:18 -07:00
Benoit Steiner
eaeb6ca93a
Enable the benchmarks for algebraic and transcendental fnctions on fp16.
2016-04-12 16:29:00 -07:00
Benoit Steiner
aa1ba8bbd2
Don't put a command at the end of an enumerator list
2016-04-12 16:28:11 -07:00
Benoit Steiner
e49945ced4
Pulled latest update from trunk
2016-04-12 14:13:41 -07:00
Benoit Steiner
25d05c4b8f
Fixed the vectorization logic test
2016-04-12 14:13:25 -07:00
Benoit Steiner
53121c0119
Turned on the contraction benchmarks for fp16
2016-04-12 14:11:52 -07:00
Gael Guennebaud
b67c983291
Enable the use of half-packet in coeff-based product.
...
For instance, Matrix4f*Vector4f is now vectorized again when using AVX.
2016-04-12 23:03:03 +02:00
Benoit Steiner
e3a184785c
Fixed the zeta test
2016-04-12 11:12:36 -07:00
Benoit Steiner
3b76df64fc
Defer the decision to vectorize tensor CUDA code to the meta kernel. This makes it possible to decide to vectorize or not depending on the capability of the target cuda architecture. In particular, this enables us to vectorize the processing of fp16 when running on device of capability >= 5.3
2016-04-12 10:58:51 -07:00
Benoit Steiner
8bfe739cd2
Updated the AVX512 PacketMath to properly leverage the AVX512DQ instructions
2016-04-11 18:40:16 -07:00
Rasmus Larsen
6498dadc2f
Merged eigen/eigen into default
2016-04-11 17:42:05 -07:00
Benoit Steiner
d6e596174d
Pull latest updates from upstream
2016-04-11 17:20:17 -07:00
Benoit Steiner
748c4c4599
More accurate cost estimates for exp, log, tanh, and sqrt.
2016-04-11 13:11:04 -07:00
Benoit Steiner
833efb39bf
Added epsilon, dummy_precision, infinity and quiet_NaN NumTraits for fp16
2016-04-11 11:03:56 -07:00
Benoit Steiner
e939b087fe
Pulled latest update from trunk
2016-04-11 11:03:02 -07:00
Gael Guennebaud
1744b5b5d2
Update doc regarding the genericity of EIGEN_USE_BLAS
2016-04-11 17:16:07 +02:00
Gael Guennebaud
91bf925fc1
Improve constness of level2 blas API.
2016-04-11 17:13:01 +02:00
Gael Guennebaud
0483430283
Move LAPACK declarations from blas.h to lapack.h and fix compatibility with EIGEN_USE_MKL
2016-04-11 17:12:31 +02:00
Gael Guennebaud
097d1e8823
Cleanup obsolete assign_scalar_eig2mkl helper.
2016-04-11 16:09:29 +02:00
Gael Guennebaud
fec4c334ba
Remove all references to MKL in BLAS wrappers.
2016-04-11 16:04:09 +02:00
Gael Guennebaud
ddabc992fa
Fix long to int conversion in BLAS API.
2016-04-11 15:52:01 +02:00
Gael Guennebaud
8191f373be
Silent unused warning.
2016-04-11 15:37:16 +02:00
Gael Guennebaud
6a9ca88e7e
Relax dependency on MKL for EIGEN_USE_BLAS
2016-04-11 15:17:14 +02:00
Gael Guennebaud
4e8e5888d7
Improve constness of blas level-3 interface.
2016-04-11 15:12:44 +02:00
Gael Guennebaud
675e0a2224
Fix static/inline keywords order.
2016-04-11 15:06:20 +02:00
Gael Guennebaud
fc6a0ebb1c
Typos in doc.
2016-04-11 10:54:58 +02:00
Till Hoffmann
643b697649
Proper handling of domain errors.
2016-04-10 00:37:53 +01:00
Rasmus Munk Larsen
1f70bd4134
Merge.
2016-04-09 15:31:53 -07:00
Rasmus Munk Larsen
096e355f8e
Add short-circuit to avoid calling matrix norm for empty matrix.
2016-04-09 15:29:56 -07:00
Rasmus Larsen
be80fb49fc
Merged default ( 4a92b590a0
...
) into default
2016-04-09 13:13:01 -07:00
Rasmus Larsen
7a8176587b
Merged eigen/eigen into default
2016-04-09 12:47:41 -07:00
Rasmus Munk Larsen
4a92b590a0
Merge.
2016-04-09 12:47:24 -07:00
Rasmus Munk Larsen
ee6c69733a
A few tiny adjustments to short-circuit logic.
2016-04-09 12:45:49 -07:00
Till Hoffmann
7f4826890c
Merge upstream
2016-04-09 20:08:07 +01:00
Till Hoffmann
de057ebe54
Added nans to zeta function.
2016-04-09 20:07:36 +01:00
Gael Guennebaud
af2161cdb4
bug #1197 : fix/relax some LM unit tests
2016-04-09 11:14:02 +02:00
Gael Guennebaud
a05a683d83
bug #1160 : fix and relax some lm unit tests by turning faillures to warnings
2016-04-09 10:49:19 +02:00
Benoit Steiner
5da90fc8dd
Use numext::abs instead of std::abs in scalar_fuzzy_default_impl to make it usable inside GPU kernels.
2016-04-08 19:40:48 -07:00
Benoit Steiner
01bd577288
Fixed the implementation of Eigen::numext::isfinite, Eigen::numext::isnan, andEigen::numext::isinf on CUDA devices
2016-04-08 16:40:10 -07:00
Benoit Steiner
89a3dc35a3
Fixed isfinite_impl: NumTraits<T>::highest() and NumTraits<T>::lowest() are finite numbers.
2016-04-08 15:56:16 -07:00
Benoit Steiner
995f202cea
Disabled the use of half2 on cuda devices of compute capability < 5.3
2016-04-08 14:43:36 -07:00
Benoit Steiner
8d22967bd9
Initial support for taking the power of fp16
2016-04-08 14:22:39 -07:00
Benoit Steiner
3394379319
Fixed the packet_traits for half floats.
2016-04-08 13:33:59 -07:00
Benoit Steiner
0d2a532fc3
Created the new EIGEN_TEST_CUDA_CLANG option to compile the CUDA tests using clang instead of nvcc
2016-04-08 13:16:08 -07:00
Rasmus Larsen
0b81a18d12
Merged eigen/eigen into default
2016-04-08 12:58:57 -07:00
Benoit Steiner
2d072b38c1
Don't test the division by 0 on float16 when compiling with msvc since msvc detects and errors out on divisions by 0.
2016-04-08 12:50:25 -07:00
Benoit Jacob
cd2b667ac8
Add references to filed LLVM bugs
2016-04-08 08:12:47 -04:00
Benoit Steiner
3bd16457e1
Properly handle complex numbers.
2016-04-07 23:28:04 -07:00
Benoit Steiner
63102ee43d
Turn on the coeffWise benchmarks on fp16
2016-04-07 23:05:20 -07:00
Benoit Steiner
7c47d3e663
Fixed the type casting benchmarks for fp16
2016-04-07 22:50:25 -07:00
Benoit Steiner
166b56bc61
Fixed the type casting benchmark for float16
2016-04-07 22:45:54 -07:00
Benoit Steiner
2f2801f096
Merged in parthaEth/eigen (pull request PR-175)
...
Static casting scalar types so as to let chlesky module of eigen work with ceres
2016-04-07 22:10:14 -07:00
Benoit Steiner
d962fe6a99
Renamed float16 into cxx11_float16 since the test relies on c++11 features
2016-04-07 20:28:32 -07:00
Rasmus Larsen
c34e55c62b
Merged eigen/eigen into default
2016-04-07 20:23:03 -07:00
Benoit Steiner
7d5b17087f
Added missing EIGEN_DEVICE_FUNC to the tensor conversion code.
2016-04-07 20:01:19 -07:00
Benoit Steiner
a6d08be9b2
Fixed the benchmarking of fp16 coefficient wise operations
2016-04-07 17:13:44 -07:00
Rasmus Munk Larsen
283c51cd5e
Widen short-circuiting ReciprocalConditionNumberEstimate so we don't call InverseMatrixL1NormEstimate for dec.rows() <= 1.
2016-04-07 16:45:40 -07:00
Rasmus Munk Larsen
d51803a728
Use Index instead of int for indexing and sizes.
2016-04-07 16:39:48 -07:00
Rasmus Munk Larsen
fd872aefb3
Remove transpose() method from LLT and LDLT classes as it would imply conjugation.
...
Explicitly cast constants to RealScalar in ConditionEstimator.h.
2016-04-07 16:28:44 -07:00
Rasmus Munk Larsen
0b5546d182
Use lpNorm<1>() to compute l1 norms in LLT and LDLT.
2016-04-07 15:49:30 -07:00
parthaEth
2d5bb375b7
Static casting scalar types so as to let chlesky module of eigen work with ceres
2016-04-08 00:14:44 +02:00
Benoit Steiner
a02ec09511
Worked around numerical noise in the test for the zeta function.
2016-04-07 12:11:02 -07:00
Benoit Steiner
c912b1d28c
Fixed a typo in the polygamma test.
2016-04-07 11:51:07 -07:00
Benoit Steiner
74f64838c5
Updated the unary functors to use the numext implementation of typicall functions instead of the one provided in the standard library. The standard library functions aren't supported officially by cuda, so we're better off using the numext implementations.
2016-04-07 11:42:14 -07:00
Benoit Steiner
737644366f
Move the functions operating on fp16 out of the std namespace and into the Eigen::numext namespace
2016-04-07 11:40:15 -07:00
Benoit Steiner
dc45aaeb93
Added tests for float16
2016-04-07 11:18:05 -07:00
Benoit Steiner
8db269e055
Fixed a typo in a test
2016-04-07 10:41:51 -07:00
Benoit Steiner
b89d3f78b2
Updated the isnan, isinf and isfinite functions to make compatible with cuda devices.
2016-04-07 10:08:49 -07:00
Benoit Steiner
48308ed801
Added support for isinf, isnan, and isfinite checks to the tensor api
2016-04-07 09:48:36 -07:00
Benoit Steiner
cfb34d808b
Fixed a possible integer overflow.
2016-04-07 08:46:52 -07:00
Benoit Steiner
df838736e2
Fixed compilation warning triggered by msvc
2016-04-06 20:48:55 -07:00
Benoit Steiner
14ea7c7ec7
Fixed packet_traits<half>
2016-04-06 19:30:21 -07:00
Benoit Steiner
532fdf24cb
Added support for hardware conversion between fp16 and full floats whenever
...
possible.
2016-04-06 17:11:31 -07:00
Benoit Steiner
165150e896
Fixed the tests for the zeta and polygamma functions
2016-04-06 14:31:01 -07:00
Benoit Steiner
7be1eaad1e
Fixed typos in the implementation of the zeta and polygamma ops.
2016-04-06 14:15:37 -07:00
Benoit Steiner
58c1dbff19
Made the fp16 code more portable.
2016-04-06 13:44:08 -07:00
Benoit Steiner
cf7e73addd
Added some missing conversions to the Half class, and fixed the implementation of the < operator on cuda devices.
2016-04-06 09:59:51 -07:00
Benoit Steiner
10bdd8e378
Merged in tillahoffmann/eigen (pull request PR-173)
...
Added zeta function of two arguments and polygamma function
2016-04-06 09:40:17 -07:00
Benoit Steiner
7781f865cb
Renamed the EIGEN_TEST_NVCC cmake option into EIGEN_TEST_CUDA per the discussion in bug #1173 .
2016-04-06 09:35:23 -07:00
Benoit Steiner
72abfa11dd
Added support for isfinite on fp16
2016-04-06 09:07:30 -07:00
Rasmus Munk Larsen
4d07064a3d
Fix bug in alternate lower bound calculation due to missing parentheses.
...
Make a few expressions more concise.
2016-04-05 16:40:48 -07:00
Konstantinos Margaritis
2bba4ee2cf
Merged kmargar/eigen/tip into default
2016-04-05 22:22:08 +03:00
Konstantinos Margaritis
317384b397
complete the port, remove float support
2016-04-05 14:56:45 -04:00
tillahoffmann
726bd5f077
Merged eigen/eigen into default
2016-04-05 18:21:05 +01:00
Till Hoffmann
a350c25a39
Added accuracy comments.
2016-04-05 18:20:40 +01:00
Gael Guennebaud
4d7e230d2f
bug #1189 : fix pow/atan2 compilation for AutoDiffScalar
2016-04-05 14:49:41 +02:00
Konstantinos Margaritis
bc0ad363c6
add remaining includes
2016-04-05 06:01:17 -04:00
Konstantinos Margaritis
2d41dc9622
complete int/double specialized traits for ZVector
2016-04-05 06:00:51 -04:00
Konstantinos Margaritis
644d0f91d2
enable all tests again
2016-04-05 05:59:54 -04:00
Konstantinos Margaritis
988344daf1
enable the other includes as well
2016-04-05 05:59:30 -04:00
Rasmus Larsen
d7eeee0c1d
Merged eigen/eigen into default
2016-04-04 15:58:27 -07:00
Rasmus Munk Larsen
513c372960
Fix docstrings to list all supported decompositions.
2016-04-04 14:34:59 -07:00
Rasmus Munk Larsen
86e0ed81f8
Addresses comments on Eigen pull request PR-174.
...
* Get rid of code-duplication for real vs. complex matrices.
* Fix flipped arguments to select.
* Make the condition estimation functions free functions.
* Use Vector::Unit() to generate canonical unit vectors.
* Misc. cleanup.
2016-04-04 14:20:01 -07:00
Benoit Jacob
158fea0f5e
bug #1190 - Don't trust __ARM_FEATURE_FMA on Clang/ARM
2016-04-04 16:42:40 -04:00
Benoit Jacob
03f2997a11
bug #1191 - Prevent Clang/ARM from rewriting VMLA into VMUL+VADD
2016-04-04 16:41:47 -04:00
Till Hoffmann
b0143de177
Merge upstream.
2016-04-04 19:16:48 +01:00
Till Hoffmann
b97911dd18
Refactored code into type-specific helper functions.
2016-04-04 19:16:03 +01:00
Benoit Steiner
c4179dd470
Updated the scalar_abs_op struct to make it compatible with cuda devices.
2016-04-04 11:11:51 -07:00
Benoit Steiner
1108b4f218
Fixed the signature of numext::abs to make it compatible with complex numbers
2016-04-04 11:09:25 -07:00
tillahoffmann
b8245cc325
Merged eigen/eigen into default
2016-04-04 12:28:11 +01:00
Gael Guennebaud
2b457f8e5e
Fix cross-compiling windows version detection
2016-04-04 11:47:46 +02:00
Rasmus Larsen
30242b7565
Merged eigen/eigen into default
2016-04-01 17:19:36 -07:00
Rasmus Munk Larsen
9d51f7c457
Add rcond method to LDLT.
2016-04-01 16:48:38 -07:00
Rasmus Munk Larsen
f54137606e
Add condition estimation to Cholesky (LLT) factorization.
2016-04-01 16:19:45 -07:00
Rasmus Munk Larsen
fb8dccc23e
Replace "inline static" with "static inline" for consistency.
2016-04-01 12:48:18 -07:00
Rasmus Munk Larsen
91414e0042
Fix comments in ConditionEstimator and minor cleanup.
2016-04-01 11:58:17 -07:00
Rasmus Munk Larsen
1aa89fb855
Add matrix condition estimator module that implements the Higham/Hager algorithm from http://www.maths.manchester.ac.uk/~higham/narep/narep135.pdf used in LPACK. Add rcond() methods to FullPivLU and PartialPivLU.
2016-04-01 10:27:59 -07:00
Till Hoffmann
80eba21ad0
Merge upstream.
2016-04-01 18:18:49 +01:00
Till Hoffmann
eb0ae602bd
Added CUDA tests.
2016-04-01 18:17:45 +01:00
Till Hoffmann
ffd770ce94
Fixed CUDA signature.
2016-04-01 17:58:24 +01:00
Till Hoffmann
3cb0a237c1
Fixed suggestions by Eugene Brevdo.
2016-04-01 17:51:39 +01:00
tillahoffmann
49960adbdd
Merged eigen/eigen into default
2016-04-01 14:36:15 +01:00
Till Hoffmann
57239f4a81
Added polygamma function.
2016-04-01 14:35:21 +01:00
Till Hoffmann
dd5d390daf
Added zeta function.
2016-04-01 13:32:29 +01:00
Benoit Steiner
3da495e6b9
Relaxed the condition used to gate the fft code.
2016-03-31 18:11:51 -07:00
Benoit Steiner
0ea7ab4f62
Hashing was only officially introduced in c++11. Therefore only define an implementation of the hash function for float16 if c++11 is enabled.
2016-03-31 14:44:55 -07:00
Benoit Steiner
92b7f7b650
Improved code formating
2016-03-31 13:09:58 -07:00
Benoit Steiner
f197813f37
Added the ability to hash a fp16
2016-03-31 13:09:23 -07:00
Benoit Steiner
0f5cc504fe
Properly gate the fft code
2016-03-31 12:59:39 -07:00
Benoit Steiner
4c859181da
Made it possible to use the NumTraits for complex and Array in a cuda kernel.
2016-03-31 12:48:38 -07:00
Benoit Steiner
c36ab19902
Added __ldg primitive for fp16.
2016-03-31 10:55:03 -07:00
Benoit Steiner
b575fb1d02
Added NumTraits for half floats
2016-03-31 10:43:59 -07:00
Benoit Steiner
8c8a79cec1
Fixed a typo
2016-03-31 10:33:32 -07:00
Benoit Steiner
af4ef540bf
Fixed a off-by-one bug in a debug assertion
2016-03-30 18:37:19 -07:00
Benoit Steiner
791e5cfb69
Added NumTraits for type2index.
2016-03-30 18:36:36 -07:00
Benoit Steiner
4f1a7e51c1
Pull math functions from the global namespace only when compiling cuda code with nvcc. When compiling with clang, we want to use the std namespace.
2016-03-30 17:59:49 -07:00
Benoit Steiner
bc68fc2fe7
Enable constant expressions when compiling cuda code with clang.
2016-03-30 17:58:32 -07:00
Benoit Steiner
483aaad10a
Fixed compilation warning
2016-03-30 17:08:13 -07:00
Benoit Steiner
1b40abbf99
Added missing assignment operator to the TensorUInt128 class, and made misc small improvements
2016-03-30 13:17:03 -07:00
Benoit Jacob
01b5333e44
bug #1186 - vreinterpretq_u64_f64 fails to build on Android/Aarch64/Clang toolchain
2016-03-30 11:02:33 -04:00
Benoit Steiner
aa45ad2aac
Fixed the formatting of the README.
2016-03-29 15:06:13 -07:00
Benoit Steiner
56df5ef1d7
Attempt to fix the formatting of the README
2016-03-29 15:03:38 -07:00
Benoit Steiner
1bcd82e31b
Pulled latest updates from trunk
2016-03-29 13:36:18 -07:00
Gael Guennebaud
09ad31aa85
Add regression test for nesting type handling in blas_traits
2016-03-29 22:33:57 +02:00
Benoit Steiner
1841d6d4c3
Added missing cuda template specializations for numext::ceil
2016-03-29 13:29:34 -07:00
Benoit Steiner
7b7d2a9fa5
Use false instead of 0 as the expected value of a boolean
2016-03-29 11:50:17 -07:00
Benoit Steiner
e02b784ec3
Added support for standard mathematical functions and trancendentals(such as exp, log, abs, ...) on fp16
2016-03-29 09:20:36 -07:00
Benoit Steiner
c38295f0a0
Added support for fmod
2016-03-28 15:53:02 -07:00
Benoit Steiner
6772f653c3
Made it possible to customize the threadpool
2016-03-28 10:01:04 -07:00
Benoit Steiner
1bc81f7889
Fixed compilation warnings on arm
2016-03-28 09:21:04 -07:00
Benoit Steiner
78f83d6f6a
Prevent potential overflow.
2016-03-28 09:18:04 -07:00
Konstantinos Margaritis
01e7298fe6
actually include ZVector files, passes most basic tests (float still fails)
2016-03-28 10:58:02 -04:00
Konstantinos Margaritis
f48011119e
Merged eigen/eigen into default
2016-03-28 01:48:45 +03:00
Konstantinos Margaritis
ed6b9d08f1
some primitives ported, but missing intrinsics and crash with asm() are a problem
2016-03-27 18:47:49 -04:00
Benoit Steiner
74f91ed06c
Improved support for integer modulo
2016-03-25 17:21:56 -07:00
Benoit Steiner
65716e99a5
Improved the cost estimate of the quotient op
2016-03-25 11:13:53 -07:00
Benoit Steiner
d94f6ba965
Started to model the cost of divisions more accurately.
2016-03-25 11:02:56 -07:00
Benoit Steiner
a86c9f037b
Fixed compilation error on windows
2016-03-24 18:54:31 -07:00
Benoit Steiner
0968e925a0
Updated the benchmarking code to use Eigen::half instead of half
2016-03-24 18:00:33 -07:00
Benoit Steiner
044efea965
Made sure that the cxx11_tensor_cuda test can be compiled even without support for cxx11.
2016-03-23 20:02:11 -07:00
Benoit Steiner
2e4e4cb74d
Use numext::abs instead of abs to avoid incorrect conversion to integer of the argument
2016-03-23 16:57:12 -07:00
Benoit Steiner
41434a8a85
Avoid unnecessary conversions
2016-03-23 16:52:38 -07:00
Benoit Steiner
92693b50eb
Fixed compilation warning
2016-03-23 16:40:36 -07:00
Benoit Steiner
9bc9396e88
Use portable includes
2016-03-23 16:30:06 -07:00
Benoit Steiner
393bc3b16b
Added comment
2016-03-23 16:22:15 -07:00
Benoit Steiner
81d340984a
Removed executable bit from header files
2016-03-23 16:15:02 -07:00
Benoit Steiner
bff8cbad06
Removed executable bit from header files
2016-03-23 16:14:23 -07:00
Benoit Steiner
7a570e50ef
Fixed contractions of fp16
2016-03-23 16:00:06 -07:00
Benoit Steiner
7168afde5e
Made the tensor benchmarks compile on MacOS
2016-03-23 14:21:04 -07:00
Benoit Steiner
2062ee2d26
Added a test to verify that notifications are working properly
2016-03-23 13:39:00 -07:00
Benoit Steiner
fc3660285f
Made type conversion explicit
2016-03-23 09:56:50 -07:00
Benoit Steiner
0e68882604
Added the ability to divide a half float by an index
2016-03-23 09:46:42 -07:00
Benoit Steiner
6971146ca9
Added more conversion operators for half floats
2016-03-23 09:44:52 -07:00
Christoph Hertzberg
9642fd7a93
Replace all M_PI by EIGEN_PI and add a check to the testsuite.
2016-03-23 15:37:45 +01:00
Benoit Steiner
28e02996df
Merged patch 672 from Justin Lebar: Don't use long doubles with cuda
2016-03-22 16:53:57 -07:00
Benoit Steiner
3d1e857327
Fixed compilation error
2016-03-22 15:48:28 -07:00
Benoit Steiner
de7d92c259
Pulled latest updates from trunk
2016-03-22 15:24:49 -07:00
Benoit Steiner
002cf0d1c9
Use a single Barrier instead of a collection of Notifications to reduce the thread synchronization overhead
2016-03-22 15:24:23 -07:00
Benoit Steiner
bc2b802751
Fixed a couple of typos
2016-03-22 14:27:34 -07:00
Benoit Steiner
e7a468c5b7
Filter some compilation flags that nvcc warns about.
2016-03-22 14:26:50 -07:00
Benoit Steiner
6a31b7be3e
Avoid using std::vector whenever possible
2016-03-22 14:02:50 -07:00
Benoit Steiner
65a7113a36
Use an enum instead of a static const int to prevent possible link error
2016-03-22 09:33:54 -07:00
Benoit Steiner
f9ad25e4d8
Fixed contractions of 16 bit floats
2016-03-22 09:30:23 -07:00
Benoit Steiner
8ef3181f15
Worked around a constness related issue
2016-03-21 11:24:05 -07:00
Benoit Steiner
7a07d6aa2b
Small cleanup
2016-03-21 11:12:17 -07:00
Konstantinos Margaritis
a9a6710e15
add initial s390x(zEC13) ZVECTOR support
2016-03-21 13:46:47 -04:00
Benoit Steiner
e91f255301
Marked variables that's only used in debug mode as such
2016-03-21 10:02:00 -07:00
Benoit Steiner
db5c14de42
Explicitly cast the default value into the proper scalar type.
2016-03-21 09:52:58 -07:00
Christoph Hertzberg
b224771f40
bug #1178 : Simplified modification of the SSE control register for better portability
2016-03-20 10:57:08 +01:00
Benoit Steiner
8e03333f06
Renamed some class members to make the code more readable.
2016-03-18 15:21:04 -07:00
Benoit Steiner
6c08943d9f
Fixed a bug in the padding of extracted image patches.
2016-03-18 15:19:10 -07:00
Benoit Steiner
134d750eab
Completed the implementation of vectorized type casting of half floats.
2016-03-18 13:36:28 -07:00
Benoit Steiner
7bd551b3a9
Make all the conversions explicit
2016-03-18 12:20:08 -07:00
Benoit Steiner
bb0e73c191
Gate all the CUDA tests under the EIGEN_TEST_NVCC option
2016-03-18 12:17:37 -07:00
Benoit Steiner
2db4a04827
Fixed a typo
2016-03-18 12:08:01 -07:00
Benoit Steiner
dd514de8a9
Added a test to validate the fallback path for half floats
2016-03-18 12:02:39 -07:00
Benoit Steiner
9a7ece9caf
Worked around constness issue
2016-03-18 10:38:29 -07:00
Benoit Steiner
edc679f6c6
Fixed compilation warning
2016-03-18 07:12:34 -07:00
Benoit Steiner
53d498ef06
Fixed compilation warnings in the cuda tests
2016-03-18 07:04:54 -07:00
Benoit Steiner
e10e126cd0
pulled latest updates from trunk
2016-03-17 21:48:38 -07:00
Benoit Steiner
70eb70f5f8
Avoid mutable class members when possible
2016-03-17 21:47:18 -07:00
Benoit Steiner
7b98de1f15
Implemented some of the missing type casting for half floats
2016-03-17 21:45:45 -07:00
Benoit Steiner
afb81b7ded
Made sure to use the hard abi when compiling with NEON instructions to avoid the "gnu/stubs-soft.h: No such file or directory" error
2016-03-17 21:24:24 -07:00
Benoit Steiner
95b8961a9b
Allocate the mersenne twister used by the random number generators on the heap instead of on the stack since they tend to keep a lot of state (i.e. about 5k) around.
2016-03-17 15:23:51 -07:00
Benoit Steiner
f7329619da
Fix bug in tensor contraction. The code assumes that contraction axis indices for the LHS (after possibly swapping to ColMajor!) is increasing. Explicitly sort the contraction axis pairs to make it so.
2016-03-17 15:08:02 -07:00
Christoph Hertzberg
46aa9772fc
Merged in ebrevdo/eigen (pull request PR-169)
...
Bugfixes to cuda tests, igamma & igammac implemented, & tests for digamma, igamma, igammac on CPU & GPU.
2016-03-16 21:59:08 +01:00
Eugene Brevdo
f1f7181f53
Merge default branch.
2016-03-16 12:46:19 -07:00
Eugene Brevdo
1f69a1b65f
Change the header guard around certain numext functions to be CUDA specific.
2016-03-16 12:44:35 -07:00
Benoit Steiner
ab9b749b45
Improved a test
2016-03-14 20:03:13 -07:00
Benoit Steiner
5a51366ea5
Fixed a typo.
2016-03-14 09:25:16 -07:00
Benoit Steiner
fcf59e1c37
Properly gate the use of cuda intrinsics in the code
2016-03-14 09:13:44 -07:00
Benoit Steiner
97a1f1c273
Make sure we only use the half float intrinsic when compiling with a version of CUDA that is recent enough to provide them
2016-03-14 08:37:58 -07:00
Eugene Brevdo
9550be925d
Merge specfun branch.
2016-03-13 15:46:51 -07:00
Eugene Brevdo
b1a9afe9a9
Add tests in array.cpp that check igamma/igammac properties.
...
This adds to the set of existing tests, which compare a specific
set of values to third party calculated ground truth.
2016-03-13 15:45:34 -07:00
Benoit Steiner
e29c9676b1
Don't mark the cast operator as explicit, since this is a c++11 feature that's not supported by older compilers.
2016-03-12 00:15:58 -08:00
Benoit Steiner
eecd914864
Also replaced uint32_t with unsigned int to make the code more portable
2016-03-11 19:34:21 -08:00
Benoit Steiner
1ca8c1ec97
Replaced a couple more uint16_t with unsigned short
2016-03-11 19:28:28 -08:00
Benoit Steiner
0423b66187
Use unsigned short instead of uint16_t since they're more portable
2016-03-11 17:53:41 -08:00
Benoit Steiner
048c4d6efd
Made half floats usable on hardware that doesn't support them natively.
2016-03-11 17:21:42 -08:00
Benoit Steiner
b72ffcb05e
Made the comparison of Eigen::array GPU friendly
2016-03-11 16:37:59 -08:00
Benoit Steiner
25f69cb932
Added a comparison operator for Eigen::array
...
Alias Eigen::array to std::array when compiling with Visual Studio 2015
2016-03-11 15:20:37 -08:00
Benoit Steiner
c5b98a58b8
Updated the cxx11_meta test to work on the Eigen::array class when std::array isn't available.
2016-03-11 11:53:38 -08:00
Benoit Steiner
456e038a4e
Fixed the +=, -=, *= and /= operators to return a reference
2016-03-10 15:17:44 -08:00
Benoit Steiner
86d45a3c83
Worked around visual studio compilation warnings.
2016-03-09 21:29:39 -08:00
Benoit Steiner
8fd4241377
Fixed a typo.
2016-03-10 02:28:46 +00:00
Benoit Steiner
a685a6beed
Made the list reductions less ambiguous.
2016-03-09 17:41:52 -08:00
Benoit Steiner
3149b5b148
Avoid implicit cast
2016-03-09 17:35:17 -08:00
Benoit Steiner
b2100b83ad
Made sure to include the <random> header file when compiling with visual studio
2016-03-09 16:03:16 -08:00
Benoit Steiner
f05fb449b8
Avoid unnecessary conversion from 32bit int to 64bit unsigned int
2016-03-09 15:27:45 -08:00
Benoit Steiner
1d566417d2
Enable the random number generators when compiling with visual studio
2016-03-09 10:55:11 -08:00
Eugene Brevdo
836e92a051
Update MathFunctions/SpecialFunctions with intelligent header guards.
2016-03-09 09:04:45 -08:00
Benoit Steiner
b084133dbf
Fixed the integer division code on windows
2016-03-09 07:06:36 -08:00
Benoit Steiner
6d30683113
Fixed static assertion
2016-03-08 21:02:51 -08:00
Eugene Brevdo
5e7de771e3
Properly fix merge issues.
2016-03-08 17:35:05 -08:00
Eugene Brevdo
73220d2bb0
Resolve bad merge.
2016-03-08 17:28:21 -08:00
Eugene Brevdo
5f17de3393
Merge changes.
2016-03-08 17:22:26 -08:00
Eugene Brevdo
14f0fde51f
Add certain functions to numext (log, exp, tan) because CUDA doesn't support std::
...
Use these in SpecialFunctions.
2016-03-08 17:17:44 -08:00
Benoit Steiner
46177c8d64
Replace std::vector with our own implementation, as using the stl when compiling with nvcc and avx enabled leads to many issues.
2016-03-08 16:37:27 -08:00
Benoit Steiner
6d6413f768
Simplified the full reduction code
2016-03-08 16:02:00 -08:00
Benoit Steiner
5a427a94a9
Fixed the tensor generator code
2016-03-08 13:28:06 -08:00
Benoit Steiner
a81b88bef7
Fixed the tensor concatenation code
2016-03-08 12:30:19 -08:00
Benoit Steiner
551ff11d0d
Fixed the tensor layout swapping code
2016-03-08 12:28:10 -08:00
Benoit Steiner
8768c063f5
Fixed the tensor chipping code.
2016-03-08 12:26:49 -08:00
Benoit Steiner
e09eb835db
Decoupled the packet type definition from the definition of the tensor ops. All the vectorization is now defined in the tensor evaluators. This will make it possible to relialably support devices with different packet types in the same compilation unit.
2016-03-08 12:07:33 -08:00
Benoit Steiner
3b614a2358
Use NumTraits::highest() and NumTraits::lowest() instead of the std::numeric_limits to make the tensor min and max functors more CUDA friendly.
2016-03-07 17:53:28 -08:00
Eugene Brevdo
dd6dcad6c2
Merge branch specfun.
2016-03-07 15:37:12 -08:00
Eugene Brevdo
0bb5de05a1
Finishing touches on igamma/igammac for GPU. Tests now pass.
2016-03-07 15:35:09 -08:00
Benoit Steiner
769685e74e
Added the ability to pad a tensor using a non-zero value
2016-03-07 14:45:37 -08:00
Benoit Steiner
7f87cc3a3b
Fix a couple of typos in the code.
2016-03-07 14:31:27 -08:00
Eugene Brevdo
5707004d6b
Fix Eigen's building of sharded tests that use CUDA & more igamma/igammac bugfixes.
...
0. Prior to this PR, not a single sharded CUDA test was actually being *run*.
Fixed that.
GPU tests are still failing for igamma/igammac.
1. Add calls for igamma/igammac to TensorBase
2. Fix up CUDA-specific calls of igamma/igammac
3. Add unit tests for digamma, igamma, igammac in CUDA.
2016-03-07 14:08:56 -08:00
Benoit Steiner
e5f25622e2
Added a test to validate the behavior of some of the tensor syntactic sugar.
2016-03-07 09:04:27 -08:00
Benoit Steiner
9f5740cbc1
Added missing include
2016-03-06 22:03:18 -08:00
Benoit Steiner
5238e03fe1
Don't try to compile the uint128 test with compilers that don't support uint127
2016-03-06 21:59:40 -08:00
Benoit Steiner
9a54c3e32b
Don't warn that msvc 2015 isn't c++11 compliant just because it doesn't claim to be.
2016-03-06 09:38:56 -08:00
Benoit Steiner
05bbca079a
Turn on some of the cxx11 features when compiling with visual studio 2015
2016-03-05 10:52:08 -08:00
Benoit Steiner
6093eb9ff5
Don't test our 128bit emulation code when compiling with msvc
2016-03-05 10:37:11 -08:00
Benoit Steiner
57b263c5b9
Avoid using initializer lists in test since not all version of msvc support them
2016-03-05 08:35:26 -08:00
Benoit Steiner
23aed8f2e4
Use EIGEN_PI instead of redefining our own constant PI
2016-03-05 08:04:45 -08:00
Eugene Brevdo
0b9e0abc96
Make igamma and igammac work correctly.
...
This required replacing ::abs with std::abs.
Modified some unit tests.
2016-03-04 21:12:10 -08:00
Benoit Steiner
c23e0be18f
Use the CMAKE_CXX_STANDARD variable to turn on cxx11
2016-03-04 20:18:01 -08:00
Benoit Steiner
ec35068edc
Don't rely on the M_PI constant since not all compilers provide it.
2016-03-04 16:42:38 -08:00
Benoit Steiner
60d9df11c1
Fixed the computation of leading zeros when compiling with msvc.
2016-03-04 16:27:02 -08:00
Benoit Steiner
4e49fd5eb9
MSVC uses __uint128 while other compilers use __uint128_t to encode 128bit unsigned integers. Make the cxx11_tensor_uint128.cpp test work in both cases.
2016-03-04 14:49:18 -08:00
Benoit Steiner
667fcc2b53
Fixed syntax error
2016-03-04 14:37:51 -08:00
Benoit Steiner
4416a5dcff
Added missing include
2016-03-04 14:35:43 -08:00
Benoit Steiner
c561eeb7bf
Don't use implicit type conversions in initializer lists since not all compilers support them.
2016-03-04 14:12:45 -08:00
Benoit Steiner
174edf976b
Made the contraction test more portable
2016-03-04 14:11:13 -08:00
Benoit Steiner
2c50fc878e
Fixed a typo
2016-03-04 14:09:38 -08:00
Eugene Brevdo
7ea35bfa1c
Initial implementation of igamma and igammac.
2016-03-03 19:39:41 -08:00
Benoit Steiner
deea866bbd
Added tests to cover the new rounding, flooring and ceiling tensor operations.
2016-03-03 12:38:02 -08:00
Benoit Steiner
5cf4558c0a
Added support for rounding, flooring, and ceiling to the tensor api
2016-03-03 12:36:55 -08:00
Benoit Steiner
dac58d7c35
Added a test to validate the conversion of half floats into floats on Kepler GPUs.
...
Restricted the testing of the random number generation code to GPU architecture greater than or equal to 3.5.
2016-03-03 10:37:25 -08:00
Benoit Steiner
1032441c6f
Enable partial support for half floats on Kepler GPUs.
2016-03-03 10:34:20 -08:00
Benoit Steiner
1da10a7358
Enable the conversion between floats and half floats on older GPUs that support it.
2016-03-03 10:33:20 -08:00
Benoit Steiner
2de8cc9122
Merged in ebrevdo/eigen (pull request PR-167)
...
Add infinity() support to numext::numeric_limits, use it in lgamma.
I tested the code on my gtx-titan-black gpu, and it appears to work as expected.
2016-03-03 09:42:12 -08:00
Eugene Brevdo
ab3dc0b0fe
Small bugfix to numeric_limits for CUDA.
2016-03-02 21:48:46 -08:00
Eugene Brevdo
6afea46838
Add infinity() support to numext::numeric_limits, use it in lgamma.
...
This makes the infinity access a __device__ function, removing
nvcc warnings.
2016-03-02 21:35:48 -08:00
Gael Guennebaud
3fccef6f50
bug #537 : fix compilation with Apples's compiler
2016-03-02 13:22:46 +01:00
Benoit Steiner
fedaf19262
Pulled latest updates from trunk
2016-03-01 06:15:44 -08:00
Gael Guennebaud
dfa80b2060
Compilation fix
2016-03-01 12:48:56 +01:00
Gael Guennebaud
bee9efc203
Compilation fix
2016-03-01 12:47:27 +01:00
Benoit Steiner
68ac5c1738
Improved the performance of large outer reductions on cuda
2016-02-29 18:11:58 -08:00
Benoit Steiner
56a3ada670
Added benchmarks for full reduction
2016-02-29 14:57:52 -08:00
Benoit Steiner
b2075cb7a2
Made the signature of the inner and outer reducers consistent
2016-02-29 10:53:38 -08:00
Benoit Steiner
3284842045
Optimized the performance of narrow reductions on CUDA devices
2016-02-29 10:48:16 -08:00
Gael Guennebaud
e9bea614ec
Fix shortcoming in fixed-value deduction of startRow/startCol
2016-02-29 10:31:27 +01:00
Benoit Steiner
609b3337a7
Print some information to stderr when a CUDA kernel fails
2016-02-27 20:42:57 +00:00
Benoit Steiner
1031b31571
Improved the README
2016-02-27 20:22:04 +00:00
Gael Guennebaud
8e6faab51e
bug #1172 : make valuePtr and innderIndexPtr properly return null for empty matrices.
2016-02-27 14:55:40 +01:00
Benoit Steiner
ac2e6e0d03
Properly vectorized the random number generators
2016-02-26 13:52:24 -08:00
Benoit Steiner
caa54d888f
Made the TensorIndexList usable on GPU without having to use the -relaxed-constexpr compilation flag
2016-02-26 12:38:18 -08:00
Benoit Steiner
93485d86bc
Added benchmarks for type casting of float16
2016-02-26 12:24:58 -08:00
Benoit Steiner
002824e32d
Added benchmarks for fp16
2016-02-26 12:21:25 -08:00
Benoit Steiner
2cd32cad27
Reverted previous commit since it caused more problems than it solved
2016-02-26 13:21:44 +00:00
Benoit Steiner
d9d05dd96e
Fixed handling of long doubles on aarch64
2016-02-26 04:13:58 -08:00
Benoit Steiner
af199b4658
Made the CUDA architecture level a build setting.
2016-02-25 09:06:18 -08:00
Benoit Steiner
c36c09169e
Fixed a typo in the reduction code that could prevent large full reductionsx from running properly on old cuda devices.
2016-02-24 17:07:25 -08:00
Benoit Steiner
7a01cb8e4b
Marked the And and Or reducers as stateless.
2016-02-24 16:43:01 -08:00
Gael Guennebaud
91e1375ba9
merge
2016-02-23 11:09:05 +01:00
Gael Guennebaud
055000a424
Fix startRow()/startCol() for dense Block with direct access:
...
the initial implementation failed for empty rows/columns for which are ambiguous.
2016-02-23 11:07:59 +01:00
Benoit Steiner
1d9256f7db
Updated the padding code to work with half floats
2016-02-23 05:51:22 +00:00
Benoit Steiner
8cb9bfab87
Extended the tensor benchmark suite to support types other than floats
2016-02-23 05:28:02 +00:00
Benoit Steiner
f442a5a5b3
Updated the tensor benchmarking code to work with compilers that don't support cxx11.
2016-02-23 04:15:48 +00:00
Benoit Steiner
72d2cf642e
Deleted the coordinate based evaluation of tensor expressions, since it's hardly ever used and started to cause some issues with some versions of xcode.
2016-02-22 15:29:41 -08:00
Benoit Steiner
6270d851e3
Declare the half float type as arithmetic.
2016-02-22 13:59:33 -08:00
Benoit Steiner
5cd00068c0
include <iostream> in the tensor header since we now use it to better report cuda initialization errors
2016-02-22 13:59:03 -08:00
Benoit Steiner
257b640463
Fixed compilation warning generated by clang
2016-02-21 22:43:37 -08:00
Benoit Steiner
584832cb3c
Implemented the ptranspose function on half floats
2016-02-21 12:44:53 -08:00
Benoit Steiner
e644f60907
Pulled latest updates from trunk
2016-02-21 20:24:59 +00:00
Benoit Steiner
95fceb6452
Added the ability to compute the absolute value of a half float
2016-02-21 20:24:11 +00:00
Benoit Steiner
ed69cbeef0
Added some debugging information to the test to figure out why it fails sometimes
2016-02-21 11:20:20 -08:00
Benoit Steiner
96a24b05cc
Optimized casting of tensors in the case where the casting happens to be a no-op
2016-02-21 11:16:15 -08:00
Benoit Steiner
203490017f
Prevent unecessary Index to int conversions
2016-02-21 08:49:36 -08:00
Benoit Steiner
9ff269a1d3
Moved some of the fp16 operators outside the Eigen namespace to workaround some nvcc limitations.
2016-02-20 07:47:23 +00:00
Benoit Steiner
1e6fe6f046
Fixed the float16 tensor test.
2016-02-20 07:44:17 +00:00
Rasmus Munk Larsen
8eb127022b
Get rid of duplicate code.
2016-02-19 16:33:30 -08:00
Rasmus Munk Larsen
d5e2ec7447
Speed up tensor FFT by up ~25-50%.
...
Benchmark Base (ns) New (ns) Improvement
------------------------------------------------------------------
BM_tensor_fft_single_1D_cpu/8 132 134 -1.5%
BM_tensor_fft_single_1D_cpu/9 1162 1229 -5.8%
BM_tensor_fft_single_1D_cpu/16 199 195 +2.0%
BM_tensor_fft_single_1D_cpu/17 2587 2267 +12.4%
BM_tensor_fft_single_1D_cpu/32 373 341 +8.6%
BM_tensor_fft_single_1D_cpu/33 5922 4879 +17.6%
BM_tensor_fft_single_1D_cpu/64 797 675 +15.3%
BM_tensor_fft_single_1D_cpu/65 13580 10481 +22.8%
BM_tensor_fft_single_1D_cpu/128 1753 1375 +21.6%
BM_tensor_fft_single_1D_cpu/129 31426 22789 +27.5%
BM_tensor_fft_single_1D_cpu/256 4005 3008 +24.9%
BM_tensor_fft_single_1D_cpu/257 70910 49549 +30.1%
BM_tensor_fft_single_1D_cpu/512 8989 6524 +27.4%
BM_tensor_fft_single_1D_cpu/513 165402 107751 +34.9%
BM_tensor_fft_single_1D_cpu/999 198293 115909 +41.5%
BM_tensor_fft_single_1D_cpu/1ki 21289 14143 +33.6%
BM_tensor_fft_single_1D_cpu/1k 361980 233355 +35.5%
BM_tensor_fft_double_1D_cpu/8 138 131 +5.1%
BM_tensor_fft_double_1D_cpu/9 1253 1133 +9.6%
BM_tensor_fft_double_1D_cpu/16 218 200 +8.3%
BM_tensor_fft_double_1D_cpu/17 2770 2392 +13.6%
BM_tensor_fft_double_1D_cpu/32 406 368 +9.4%
BM_tensor_fft_double_1D_cpu/33 6418 5153 +19.7%
BM_tensor_fft_double_1D_cpu/64 856 728 +15.0%
BM_tensor_fft_double_1D_cpu/65 14666 11148 +24.0%
BM_tensor_fft_double_1D_cpu/128 1913 1502 +21.5%
BM_tensor_fft_double_1D_cpu/129 36414 24072 +33.9%
BM_tensor_fft_double_1D_cpu/256 4226 3216 +23.9%
BM_tensor_fft_double_1D_cpu/257 86638 52059 +39.9%
BM_tensor_fft_double_1D_cpu/512 9397 6939 +26.2%
BM_tensor_fft_double_1D_cpu/513 203208 114090 +43.9%
BM_tensor_fft_double_1D_cpu/999 237841 125583 +47.2%
BM_tensor_fft_double_1D_cpu/1ki 20921 15392 +26.4%
BM_tensor_fft_double_1D_cpu/1k 455183 250763 +44.9%
BM_tensor_fft_single_2D_cpu/8 1051 1005 +4.4%
BM_tensor_fft_single_2D_cpu/9 16784 14837 +11.6%
BM_tensor_fft_single_2D_cpu/16 4074 3772 +7.4%
BM_tensor_fft_single_2D_cpu/17 75802 63884 +15.7%
BM_tensor_fft_single_2D_cpu/32 20580 16931 +17.7%
BM_tensor_fft_single_2D_cpu/33 345798 278579 +19.4%
BM_tensor_fft_single_2D_cpu/64 97548 81237 +16.7%
BM_tensor_fft_single_2D_cpu/65 1592701 1227048 +23.0%
BM_tensor_fft_single_2D_cpu/128 472318 384303 +18.6%
BM_tensor_fft_single_2D_cpu/129 7038351 5445308 +22.6%
BM_tensor_fft_single_2D_cpu/256 2309474 1850969 +19.9%
BM_tensor_fft_single_2D_cpu/257 31849182 23797538 +25.3%
BM_tensor_fft_single_2D_cpu/512 10395194 8077499 +22.3%
BM_tensor_fft_single_2D_cpu/513 144053843 104242541 +27.6%
BM_tensor_fft_single_2D_cpu/999 279885833 208389718 +25.5%
BM_tensor_fft_single_2D_cpu/1ki 45967677 36070985 +21.5%
BM_tensor_fft_single_2D_cpu/1k 619727095 456489500 +26.3%
BM_tensor_fft_double_2D_cpu/8 1110 1016 +8.5%
BM_tensor_fft_double_2D_cpu/9 17957 15768 +12.2%
BM_tensor_fft_double_2D_cpu/16 4558 4000 +12.2%
BM_tensor_fft_double_2D_cpu/17 79237 66901 +15.6%
BM_tensor_fft_double_2D_cpu/32 21494 17699 +17.7%
BM_tensor_fft_double_2D_cpu/33 357962 290357 +18.9%
BM_tensor_fft_double_2D_cpu/64 105179 87435 +16.9%
BM_tensor_fft_double_2D_cpu/65 1617143 1288006 +20.4%
BM_tensor_fft_double_2D_cpu/128 512848 419397 +18.2%
BM_tensor_fft_double_2D_cpu/129 7271322 5636884 +22.5%
BM_tensor_fft_double_2D_cpu/256 2415529 1922032 +20.4%
BM_tensor_fft_double_2D_cpu/257 32517952 24462177 +24.8%
BM_tensor_fft_double_2D_cpu/512 10724898 8287617 +22.7%
BM_tensor_fft_double_2D_cpu/513 146007419 108603266 +25.6%
BM_tensor_fft_double_2D_cpu/999 296351330 221885776 +25.1%
BM_tensor_fft_double_2D_cpu/1ki 59334166 48357539 +18.5%
BM_tensor_fft_double_2D_cpu/1k 666660132 483840349 +27.4%
2016-02-19 16:29:23 -08:00
Gael Guennebaud
d90a2dac5e
merge
2016-02-19 23:01:27 +01:00
Gael Guennebaud
485823b5f5
Add COD and BDCSVD in list of benched solvers.
2016-02-19 23:00:33 +01:00
Gael Guennebaud
2af04f1a57
Extend unit test to stress smart_copy with empty input/output.
2016-02-19 22:59:28 +01:00
Gael Guennebaud
6fa35bbd28
bug #1170 : skip calls to memcpy/memmove for empty imput.
2016-02-19 22:58:52 +01:00
Benoit Steiner
46fc23f91c
Print an error message to stderr when the initialization of the CUDA runtime fails. This helps debugging setup issues.
2016-02-19 13:44:22 -08:00
Gael Guennebaud
6f0992c05b
Fix nesting type and complete reflection methods of Block expressions.
2016-02-19 22:21:02 +01:00
Gael Guennebaud
f3643eec57
Add typedefs for the return type of all block methods.
2016-02-19 22:15:01 +01:00
Benoit Steiner
670db7988d
Updated the contraction code to make it compatible with half floats.
2016-02-19 13:03:26 -08:00
Benoit Steiner
180156ba1a
Added support for tensor reductions on half floats
2016-02-19 10:05:59 -08:00
Benoit Steiner
5c4901b83a
Implemented the scalar division of 2 half floats
2016-02-19 10:03:19 -08:00
Benoit Steiner
f268db1c4b
Added the ability to query the minor version of a cuda device
2016-02-19 16:31:04 +00:00
Benoit Steiner
a08d2ff0c9
Started to work on contractions and reductions using half floats
2016-02-19 15:59:59 +00:00
Benoit Steiner
f3352e0fb0
Don't make the array constructors explicit
2016-02-19 15:58:57 +00:00
Benoit Steiner
f7cb755299
Added support for operators +=, -=, *= and /= on CUDA half floats
2016-02-19 15:57:26 +00:00
Benoit Steiner
dc26459b99
Implemented protate() for CUDA
2016-02-19 15:16:54 +00:00
Benoit Steiner
cd042dbbfd
Fixed a bug in the tensor type converter
2016-02-19 15:03:26 +00:00
Benoit Steiner
ac5d706a94
Added support for simple coefficient wise tensor expression using half floats on CUDA devices
2016-02-19 08:19:12 +00:00
Benoit Steiner
0606a0a39b
FP16 on CUDA are only available starting with cuda 7.5. Disable them when using an older version of CUDA
2016-02-18 23:15:23 -08:00
Benoit Steiner
f36c0c2c65
Added regression test for float16
2016-02-19 06:23:28 +00:00
Benoit Steiner
7151bd8768
Reverted unintended changes introduced by a bad merge
2016-02-19 06:20:50 +00:00
Benoit Steiner
1304e1fb5e
Pulled latest updates from trunk
2016-02-19 06:17:02 +00:00
Benoit Steiner
17b9fbed34
Added preliminary support for half floats on CUDA GPU. For now we can simply convert floats into half floats and vice versa
2016-02-19 06:16:07 +00:00
Benoit Steiner
8ce46f9d89
Improved implementation of ptanh for SSE and AVX
2016-02-18 13:24:34 -08:00
Eugene Brevdo
832380c455
Merged eigen/eigen into default
2016-02-17 14:44:06 -08:00
Eugene Brevdo
06a2bc7c9c
Tiny bugfix in SpecialFunctions: some compilers don't like doubles
...
implicitly downcast to floats in an array constructor.
2016-02-17 14:41:59 -08:00
Gael Guennebaud
f6f057bb7d
bug #1166 : fix shortcomming in gemv when the destination is not a vector at compile-time.
2016-02-15 21:43:07 +01:00
Gael Guennebaud
8e1f1ba6a6
Import wiki's paragraph: "I disabled vectorization, but I'm still getting annoyed about alignment issues"
2016-02-12 22:16:59 +01:00
Gael Guennebaud
c8b4c4b48a
bug #795 : mention allocate_shared as a condidate for aligned_allocator.
2016-02-12 22:09:16 +01:00
Gael Guennebaud
6eff3e5185
Fix triangularView versus triangularPart.
2016-02-12 17:09:28 +01:00
Gael Guennebaud
4252af6897
Remove dead code.
2016-02-12 16:13:35 +01:00
Gael Guennebaud
2f5f56a820
Fix usage of evaluator in sparse * permutation products.
2016-02-12 16:13:16 +01:00
Gael Guennebaud
0a537cb2d8
bug #901 : fix triangular-view with unit diagonal of sparse rectangular matrices.
2016-02-12 15:58:31 +01:00
Gael Guennebaud
b35d1a122e
Fix unit test: accessing elements in a deque by offsetting a pointer to another element causes undefined behavior.
2016-02-12 15:31:16 +01:00
Benoit Steiner
9e3f3a2d27
Deleted outdated comment
2016-02-11 17:27:35 -08:00
Benoit Steiner
de345eff2e
Added a method to conjugate the content of a tensor or the result of a tensor expression.
2016-02-11 16:34:07 -08:00
Benoit Steiner
17e93ba148
Pulled latest updates from trunk
2016-02-11 15:05:38 -08:00
Benoit Steiner
3628f7655d
Made it possible to run the scalar_binary_pow_op functor on GPU
2016-02-11 15:05:03 -08:00
Hauke Heibel
eeac46f980
bug #774 : re-added comment referencing equations in the original paper
2016-02-11 19:38:37 +01:00
Benoit Steiner
c569cfe12a
Inline the +=, -=, *= and /= operators consistently between DenseBase.h and SelfCwiseBinaryOp.h
2016-02-11 09:33:32 -08:00
Gael Guennebaud
8cc9232b9a
bug #774 : fix a numerical issue producing unwanted reflections.
2016-02-11 15:32:56 +01:00
Gael Guennebaud
2d35c0cb5f
Merged in rmlarsen/eigen (pull request PR-163)
...
Implement complete orthogonal decomposition in Eigen.
2016-02-11 15:12:34 +01:00
Benoit Steiner
33e2373f01
Merged in nnyby/eigen/nnyby/doc-grammar-fix-linearly-space-linearly-1443742971203 (pull request PR-138)
...
[doc] grammar fix: "linearly space" -> "linearly spaced"
2016-02-10 23:29:59 -08:00
Benoit Steiner
6d8b1dce06
Avoid implicit cast from double to float.
2016-02-10 18:07:11 -08:00
Benoit Steiner
1dfaafe28a
Added a regression test for tanh
2016-02-10 17:41:47 -08:00
Rasmus Munk Larsen
b6fdf7468c
Rename inverse -> pseudoInverse.
2016-02-10 13:03:07 -08:00
Benoit Jacob
9d6f1ad398
I'm told to use __EMSCRIPTEN__ by an Emscripten dev.
2016-02-10 12:48:34 -05:00
Benoit Steiner
bfb3fcd94f
Optimized implementation of the tanh function for SSE
2016-02-10 08:52:30 -08:00
Benoit Steiner
2d523332b3
Optimized implementation of the hyperbolic tangent function for AVX
2016-02-10 08:48:05 -08:00
Benoit Jacob
e6ee18d6b4
Make the GCC workaround for sqrt GCC-only; detect Emscripten as non-GCC
2016-02-10 11:11:49 -05:00
Benoit Steiner
2ac59e5d36
Pulled latest updates from trunk
2016-02-10 08:03:02 -08:00
Benoit Steiner
9a21b38ccc
Worked around a few clang compilation warnings
2016-02-10 08:02:04 -08:00
Benoit Jacob
964a95bf5e
Work around Emscripten bug - https://github.com/kripken/emscripten/issues/4088
2016-02-10 10:37:22 -05:00
Benoit Steiner
72ab7879f7
Fixed clang comilation warnings
2016-02-10 06:48:28 -08:00
Benoit Steiner
e88535634d
Fixed some clang compilation warnings
2016-02-09 23:32:41 -08:00
Benoit Steiner
970751ece3
Disabling the nvcc warnings in addition to the clang warnings when clang is used as a frontend for nvcc
2016-02-09 20:55:50 -08:00
Benoit Steiner
6323851ea9
Fixed compilation warning
2016-02-09 20:43:41 -08:00
Rasmus Munk Larsen
bb8811c655
Enable inverse() method for computing pseudo-inverse.
2016-02-09 20:35:20 -08:00
Benoit Steiner
5cc0dd5f44
Fixed the code that disables the use of variadic templates when compiling with nvcc on ARM devices.
2016-02-09 10:32:01 -08:00
Benoit Steiner
a9cc6a06b9
Fixed compilation warning in the splines test
2016-02-09 05:10:06 +00:00
Benoit Steiner
d69946183d
Updated the TensorIntDivisor code to work properly on LLP64 systems
2016-02-08 21:03:59 -08:00
Benoit Steiner
24d291cf16
Worked around nvcc crash when compiling Eigen on Tegra X1
2016-02-09 02:34:02 +00:00
Rasmus Munk Larsen
53f60e0afc
Make applyZAdjointOnTheLeftInPlace protected.
2016-02-08 09:01:43 -08:00
Rasmus Munk Larsen
414efa47d3
Add missing calls to tests of COD.
...
Fix a few mistakes in 3.2 -> 3.3 port.
2016-02-08 08:50:34 -08:00
Gael Guennebaud
c2bf2f56ef
Remove custom unaligned loads for SSE. They were only useful for core2 CPU.
2016-02-08 14:29:12 +01:00
Gael Guennebaud
a4c76f8d34
Improve inlining
2016-02-08 11:33:02 +01:00
Rasmus Munk Larsen
16ec450ca1
Nevermind.
2016-02-06 17:54:01 -08:00
Rasmus Munk Larsen
019fff9a00
Add my name to copyright notice in ColPivHouseholder.h, mostly for previous work on stable norm downdate formula.
2016-02-06 17:48:42 -08:00
Rasmus Munk Larsen
86d6201d7b
Merge.
2016-02-06 16:36:56 -08:00
Rasmus Munk Larsen
d904c8ac8f
Implement complete orthogonal decomposition in Eigen.
2016-02-06 16:32:00 -08:00
Gael Guennebaud
010afe1619
Add exemples for reshaping/slicing with Map.
2016-02-06 22:49:18 +01:00
Gael Guennebaud
8e599bc098
Fix warning in unit test
2016-02-06 20:26:59 +01:00
Gael Guennebaud
c6a12d1dc6
Fix warning with gcc < 4.8
2016-02-06 18:06:51 +01:00
Benoit Steiner
4d4211c04e
Avoid unecessary type conversions
2016-02-05 18:19:41 -08:00
Benoit Steiner
d2cba52015
Only enable the cxx11_tensor_uint128 test on 64 bit machines since 32 bit systems don't support the __uin128_t type
2016-02-05 18:14:23 -08:00
Benoit Steiner
fb00a4af2b
Made the tensor fft test compile on tegra x1
2016-02-06 01:42:14 +00:00
Gael Guennebaud
5b2d287878
bug #779 : allow non aligned buffers for buffers smaller than the requested alignment.
2016-02-05 21:46:39 +01:00
Gael Guennebaud
e8e1d504d6
Add an explicit assersion on the alignment of the pointer returned by std::malloc
2016-02-05 21:38:16 +01:00
Gael Guennebaud
62a1c911cd
Remove posix_memalign, _mm_malloc, and _aligned_malloc special paths.
2016-02-05 21:24:35 +01:00
Rasmus Munk Larsen
093f2b3c01
Merge.
2016-02-04 14:32:19 -08:00
Benoit Steiner
3ca1ae2bb7
Commented out the version of pexp<Packet8d> since it fails to compile with gcc 5.3
2016-02-04 13:49:06 -08:00
Rasmus Munk Larsen
2e39cc40a4
Fix condition that made the unit test spam stdout with bogus error messages.
2016-02-04 12:56:14 -08:00
Benoit Steiner
23f69ab936
Added implementations of pexp, plog, psqrt, and prsqrt optimized for AVX512
2016-02-04 10:36:36 -08:00
Benoit Steiner
6c9cf117c1
Fixed indentation
2016-02-04 10:34:10 -08:00
Benoit Steiner
bcdcdace48
Pulled latest updates from trunk
2016-02-04 08:56:49 -08:00
Gael Guennebaud
659fc9c159
Remove dead code
2016-02-04 09:55:09 +01:00
Gael Guennebaud
d5d7798b9d
Improve heuritics for switching between coeff-based and general matrix product implementation.
2016-02-04 09:53:47 +01:00
Benoit Steiner
f535378995
Added support for vectorized type casting of int to char.
2016-02-03 18:58:29 -08:00
Benoit Steiner
4ab63a3f6f
Fixed the initialization of the dummy member of the array class to make it compatible with pairs of element.
2016-02-03 17:23:07 -08:00
Benoit Steiner
727ff26960
Disable 2 more nvcc warning messages
2016-02-03 16:01:37 -08:00
Benoit Steiner
1cbb79cdfd
Made sure the dummy element of size 0 array is always intialized to silence some compiler warnings
2016-02-03 15:58:26 -08:00
Benoit Steiner
bcbde37a11
Made sure the code compiles when EIGEN_HAS_C99_MATH isn't defined
2016-02-03 14:53:08 -08:00
Benoit Steiner
f933f69021
Added a few comments
2016-02-03 14:12:18 -08:00
Benoit Steiner
5d82e47ef6
Properly disable nvcc warning messages in user code.
2016-02-03 14:10:06 -08:00
Benoit Steiner
af8436b196
Silenced the "calling a __host__ function from a __host__ __device__ function is not allowed" messages
2016-02-03 13:48:36 -08:00
Benoit Steiner
d7742d22e4
Revert the nvcc messages to their default severity instead of the forcing them to be warnings
2016-02-03 13:47:28 -08:00
Benoit Steiner
ac26e1aaf3
Pulled latest updates from trunk
2016-02-03 12:52:20 -08:00
Benoit Steiner
492fe7ce02
Silenced some unhelpful warnings generated by nvcc.
2016-02-03 12:51:19 -08:00
Gael Guennebaud
b70db60e4d
Merged in rmlarsen/eigen (pull request PR-161)
...
Change Eigen's ColPivHouseholderQR to use numerically stable norm downdate formula
2016-02-03 21:37:06 +01:00
Rasmus Munk Larsen
5fb04ab2da
Fix bad line break. Don't repeat Kahan matrix test since it is deterministic.
2016-02-03 10:12:10 -08:00
Rasmus Munk Larsen
d9a6f86cc0
Make the array of directly compute column norms a member to avoid allocation in computeInPlace.
2016-02-03 09:55:30 -08:00
Gael Guennebaud
70dc14e4e1
bug #1161 : fix division by zero for huge scalar types
2016-02-03 18:25:41 +01:00
Damien R
c301f99208
bug #1164 : fix list and deque specializations such that our aligned allocator is automatically activatived only when the user did not specified an allocator (or specified the default std::allocator).
2016-02-03 18:07:25 +01:00
Gael Guennebaud
eb6d9aea0e
Clarify error message when writing to a read-only sparse-sub-matrix.
2016-02-03 16:58:23 +01:00
Gael Guennebaud
040cf33e8f
merge
2016-02-03 16:09:51 +01:00
Gael Guennebaud
c85fbfd0b7
Clarify documentation on the restrictions of writable sparse block expressions.
2016-02-03 16:08:43 +01:00
Benoit Steiner
dc413dbe8a
Merged in ville-k/eigen/explicit_long_constructors (pull request PR-158)
...
Add constructor for long types.
2016-02-02 20:58:06 -08:00
Ville Kallioniemi
783018d8f6
Use EIGEN_STATIC_ASSERT for backward compatibility.
2016-02-02 16:45:12 -07:00
Benoit Steiner
99cde88341
Don't try to use direct offsets when computing a tensor product, since the required stride isn't available.
2016-02-02 11:06:53 -08:00
Ville Kallioniemi
ff0a83aaf8
Use single template constructor to avoid overload resolution issues.
2016-02-02 00:33:25 -07:00
Ville Kallioniemi
aedea349aa
Replace separate low word constructors with a single templated constructor.
2016-02-01 20:25:02 -07:00
Ville Kallioniemi
f0fdefa96f
Rebase to latest.
2016-02-01 19:32:31 -07:00
Benoit Steiner
d93b71a301
Updated the packetmath test to call predux_half instead of predux4
2016-02-01 15:18:33 -08:00
Benoit Steiner
ef66f2887b
Updated the matrix multiplication code to make it compile with AVX512 enabled.
2016-02-01 14:38:05 -08:00
Benoit Steiner
85b6d82b49
Generalized predux4 to support AVX512 packets, and renamed it predux_half.
...
Disabled the implementation of pabs for avx512 since the corresponding intrinsics are not shipped with gcc
2016-02-01 14:35:51 -08:00
Benoit Steiner
64ce78c2ec
Cleaned up a tensor contraction test
2016-02-01 13:57:41 -08:00
Benoit Steiner
0ce5d32be5
Sharded the cxx11_tensor_contract_cuda test
2016-02-01 13:33:23 -08:00
Benoit Steiner
922b5f527b
Silenced a few compilation warnings
2016-02-01 13:30:49 -08:00
Benoit Steiner
6b5dff875e
Made it possible to limit the number of blocks that will be used to evaluate a tensor expression on a CUDA device. This makesit possible to set aside streaming multiprocessors for other computations.
2016-02-01 12:46:32 -08:00
Rasmus Munk Larsen
00f9ef6c76
merging.
2016-02-01 11:10:30 -08:00
Benoit Steiner
264f8141f8
Shared the tensor reduction test
2016-02-01 07:44:31 -08:00
Benoit Steiner
11bb71c8fc
Sharded the tensor device test
2016-02-01 07:34:59 -08:00
Gael Guennebaud
ff1157bcbf
bug #694 : document that SparseQR::matrixR is not sorted.
2016-02-01 16:09:34 +01:00
Gael Guennebaud
ec469700dc
bug #557 : make InnerIterator of sparse storage types more versatile by adding default-ctor, copy-ctor/assignment
2016-02-01 15:04:33 +01:00
Gael Guennebaud
6e0a86194c
Fix integer path for num_steps==1
2016-02-01 15:00:04 +01:00
Gael Guennebaud
e1d219e5c9
bug #698 : fix linspaced for integer types.
2016-02-01 14:25:34 +01:00
Gael Guennebaud
2c3224924b
Fix warning and replace min/max macros by calls to mini/maxi
2016-02-01 10:23:45 +01:00
Benoit Steiner
e80ed948e1
Fixed a number of compilation warnings generated by the cuda tests
2016-01-31 20:09:41 -08:00
Benoit Steiner
6720b38fbf
Fixed a few compilation warnings
2016-01-31 16:48:50 -08:00
Benoit Steiner
3f1ee45833
Fixed compilation errors triggered by duplicate inline declaration
2016-01-31 10:48:49 -08:00
Benoit Steiner
70be6f6531
Pulled latest changes from trunk
2016-01-31 10:44:45 -08:00
Benoit Steiner
4a2ddfb81d
Sharded the CUDA argmax tensor test
2016-01-31 10:44:15 -08:00
Gael Guennebaud
d142165942
bug #667 : declare several critical functions as FORECE_INLINE to make ICC happier.
...
<g.gael@free.fr > HG: branch 'default' HG: changed Eigen/src/Core/ArrayBase.h HG: changed Eigen/src/Core/AssignEvaluator.h HG: changed
Eigen/src/Core/CoreEvaluators.h HG: changed Eigen/src/Core/CwiseUnaryOp.h HG: changed Eigen/src/Core/DenseBase.h HG: changed Eigen/src/Core/MatrixBase.h
2016-01-31 16:34:10 +01:00
Gael Guennebaud
a4e4542b89
Avoid overflow in unit test.
2016-01-30 22:26:17 +01:00
Gael Guennebaud
3ba8a3ab1a
Disable underflow unit test on the i387 FPU.
2016-01-30 22:14:04 +01:00
Benoit Steiner
483082ef6e
Fixed a few memory leaks in the cuda tests
2016-01-30 11:59:22 -08:00
Benoit Steiner
bd21aba181
Sharded the cxx11_tensor_cuda test and fixed a memory leak
2016-01-30 11:47:09 -08:00
Benoit Steiner
9de155d153
Added a test to cover threaded tensor shuffling
2016-01-30 10:56:47 -08:00
Benoit Steiner
32088c06a1
Made the comparison between single and multithreaded contraction results more resistant to numerical noise to prevent spurious test failures.
2016-01-30 10:51:14 -08:00
Benoit Steiner
2053478c56
Made sure to use a tensor of rank 0 to store the result of a full reduction in the tensor thread pool test
2016-01-30 10:46:36 -08:00
Benoit Steiner
d0db95f730
Sharded the tensor thread pool test
2016-01-30 10:43:57 -08:00
Benoit Steiner
ba27c8a7de
Made the CUDA contract test more robust to numerical noise.
2016-01-30 10:28:43 -08:00
Benoit Steiner
4281eb1e2c
Added 2 benchmarks to the suite of tensor benchmarks running on GPU
2016-01-30 10:20:43 -08:00
Gael Guennebaud
102fa96a96
Extend doc on dense+sparse
2016-01-30 14:58:21 +01:00
Gael Guennebaud
1bc207c528
backout changeset d4a9e61569
...
: the extended SparseView is not needed anymore
2016-01-30 14:43:21 +01:00
Gael Guennebaud
8ed1553d20
bug #632 : implement general coefficient-wise "dense op sparse" operations through specialized evaluators instead of using SparseView.
...
This permits to deal with arbitrary storage order, and to by-pass the more complex iterator of the sparse-sparse case.
2016-01-30 14:39:50 +01:00
Gael Guennebaud
699634890a
bug #946 : generalize Cholmod::solve to handle any rhs expression
2016-01-29 23:02:22 +01:00
Gael Guennebaud
15084cf1ac
bug #632 : add support for "dense +/- sparse" operations. The current implementation is based on SparseView to make the dense subexpression compatible with the sparse one.
2016-01-29 22:09:45 +01:00
Gael Guennebaud
d4a9e61569
Extend SparseView to allow keeping explicit zeros. This is equivalent to sparseView(1,-1) but faster because the test is removed at compile-time.
2016-01-29 22:07:56 +01:00
Gael Guennebaud
d8d37349c3
bug #696 : enable zero-sized block at compile-time by relaxing the respective assertion
2016-01-29 12:44:49 +01:00
Gael Guennebaud
e8ccc06fe5
merge
2016-01-29 09:40:38 +01:00
Benoit Steiner
963f2d2a8f
Marked several methods EIGEN_DEVICE_FUNC
2016-01-28 23:37:48 -08:00
Benoit Steiner
c5d25bf1d0
Fixed a couple of compilation warnings.
2016-01-28 23:15:45 -08:00
Benoit Steiner
e4f83bae5d
Fixed the tensor benchmarks on apple devices
2016-01-28 21:08:07 -08:00
Benoit Steiner
10bea90c4a
Fixed clang related compilation error
2016-01-28 20:52:08 -08:00
Benoit Steiner
d3f533b395
Fixed compilation warning
2016-01-28 20:09:45 -08:00
Abhijit Kundu
3fde202215
Making ceil() functor generic w.r.t packet type
2016-01-28 21:27:00 -05:00
Benoit Steiner
211d350fc3
Fixed a typo
2016-01-28 17:13:04 -08:00
Benoit Steiner
bd2e5a788a
Made sure the number of floating point operations done by a benchmark is computed using 64 bit integers to avoid overflows.
2016-01-28 17:10:40 -08:00
Benoit Steiner
120e13b1b6
Added a readme to explain how to compile the tensor benchmarks.
2016-01-28 17:06:00 -08:00
Benoit Steiner
a68864b6bc
Updated the benchmarking code to print the number of flops processed instead of the number of bytes.
2016-01-28 16:51:40 -08:00
Benoit Steiner
8217281ae4
Merge latest updates from trunk
2016-01-28 16:20:53 -08:00
Benoit Steiner
c8d5f21941
Added extra tensor benchmarks
2016-01-28 16:20:36 -08:00
Benoit Steiner
7b3044d086
Made sure to call nvcc with the relaxed-constexpr flag.
2016-01-28 15:36:34 -08:00
Rasmus Munk Larsen
acce4dd050
Change Eigen's ColPivHouseholderQR to use the numerically stable norm downdate formula from http://www.netlib.org/lapack/lawnspdf/lawn176.pdf , which has been used in LAPACK's xGEQPF and xGEQP3 since 2006. With the old formula, the code chooses the wrong pivots and fails to correctly determine rank on graded matrices.
...
This change also adds additional checks for non-increasing diagonal in R11 to existing unit tests, and adds a new unit test with the Kahan matrix, which consistently fails for the original code.
Benchmark timings on Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz. Code compiled with AVX & FMA. I just ran on square matrices of 3 difference sizes.
Benchmark Time(ns) CPU(ns) Iterations
-------------------------------------------------------
Before:
BM_EigencolPivQR/64 53677 53627 12890
BM_EigencolPivQR/512 15265408 15250784 46
BM_EigencolPivQR/4k 15403556228 15388788368 2
After (non-vectorized version):
Benchmark Time(ns) CPU(ns) Iterations Degradation
--------------------------------------------------------------------
BM_EigencolPivQR/64 63736 63669 10844 18.5%
BM_EigencolPivQR/512 16052546 16037381 43 5.1%
BM_EigencolPivQR/4k 15149263620 15132025316 2 -2.0%
Performance-wise there seems to be a ~18.5% degradation for small (64x64) matrices, probably due to the cost of more O(min(m,n)^2) sqrt operations that are not needed for the unstable formula.
2016-01-28 15:07:26 -08:00
Gael Guennebaud
b908e071a8
bug #178 : get rid of some const_cast in SparseCore
2016-01-28 22:11:18 +01:00
Gael Guennebaud
c1d900af61
bug #178 : remove additional const on nested expression, and remove several const_cast.
2016-01-28 21:43:20 +01:00
Benoit Steiner
12f8bd12a2
Merged in jiayq/eigen (pull request PR-159)
...
Modifications to the tensor benchmarks to allow compilation in a standalone fashion.
2016-01-28 11:28:55 -08:00
Yangqing Jia
270c4e1ecd
bugfix
2016-01-28 11:11:45 -08:00
Yangqing Jia
c4e47630b1
benchmark modifications to make it compilable in a standalone fashion.
2016-01-28 10:35:14 -08:00
Gael Guennebaud
f50bb1e6f3
Fix compilation with gcc
2016-01-28 13:25:26 +01:00
Gael Guennebaud
ddf64babde
merge
2016-01-28 13:21:48 +01:00
Gael Guennebaud
df15fbc452
bug #1158 : PartialReduxExpr is a vector expression, and it thus must expose the LinearAccessBit flag
2016-01-28 13:16:30 +01:00
Gael Guennebaud
9bcadb7fd1
Disable stupid MSVC warning
2016-01-28 12:14:16 +01:00
Gael Guennebaud
b4d87fff4a
Fix MSVC warning.
2016-01-28 12:12:30 +01:00
Gael Guennebaud
2bad3e78d9
bug #96 , bug #1006 : fix by value argument in result_of.
2016-01-28 12:12:06 +01:00
Gael Guennebaud
7802a6bb1c
Fix unit test filename.
2016-01-28 09:35:37 +01:00
Benoit Steiner
4bf9eaf77a
Deleted an invalid assertion that prevented the assignment of empty tensors.
2016-01-27 17:09:30 -08:00
Benoit Steiner
291069e885
Fixed some compilation problems with nvcc + clang
2016-01-27 15:37:03 -08:00
Benoit Steiner
47ca9dc809
Fixed the tensor_cuda test
2016-01-27 14:58:48 -08:00
Benoit Steiner
55a5204319
Fixed the flags passed to nvcc to compile the tensor code.
2016-01-27 14:46:34 -08:00
Gael Guennebaud
4865e1e732
Update link to suitesparse.
2016-01-27 22:48:40 +01:00
Benoit Steiner
9dfbd4fe8d
Made the cuda tests compile using make check
2016-01-27 12:22:17 -08:00
Benoit Steiner
5973bcf939
Properly specify the namespace when calling cout/endl
2016-01-27 12:04:42 -08:00
Eugene Brevdo
c8d94ae944
digamma special function: merge shared code.
...
Moved type-specific code into a helper class digamma_impl_maybe_poly<Scalar>.
2016-01-27 09:52:29 -08:00
Gael Guennebaud
9c8f7dfe94
bug #1156 : fix several function declarations whose arguments were passed by value instead of being passed by reference
2016-01-27 18:34:42 +01:00
Gael Guennebaud
9aa6fae123
bug #1154 : move to dynamic scheduling for spmv products.
2016-01-27 18:03:51 +01:00
Gael Guennebaud
9ac8e8c6a1
Extend mixing type unit test with trmv, and the following not yet supported products: trmm, symv, symm
2016-01-27 17:29:53 +01:00
Gael Guennebaud
6da5d87f92
add nomalloc unit test for rank2 updates
2016-01-27 17:26:48 +01:00
Gael Guennebaud
9801c959e6
Fix tri = complex * real product, and add respective unit test.
2016-01-27 17:12:25 +01:00
Gael Guennebaud
21b5345782
Add meta_least_common_multiple helper.
2016-01-27 17:11:39 +01:00
Gael Guennebaud
fecea26d93
Extend doc on shifting strategy
2016-01-27 15:55:15 +01:00
Ville Kallioniemi
02db1228ed
Add constructor for long types.
2016-01-26 23:41:01 -07:00
Gael Guennebaud
412bb5a631
Remove redundant test.
2016-01-26 23:35:30 +01:00
Gael Guennebaud
0f8d26c6a9
Doc: add flip* and arrayfun MatLab equivalent.
2016-01-26 23:34:48 +01:00
Gael Guennebaud
cfa21f8123
Remove dead code.
2016-01-26 23:33:15 +01:00
Gael Guennebaud
6850eab33b
Re-enable blocking on rows in non-l3 blocking mode.
2016-01-26 23:32:48 +01:00
Gael Guennebaud
aa8c6a251e
Make sure that micro-panel-size is smaller than blocking sizes (otherwise we might get a buffer overflow)
2016-01-26 23:31:48 +01:00
Gael Guennebaud
5b0a9ee003
Make sure that block sizes are smaller than input matrix sizes.
2016-01-26 23:30:24 +01:00
Benoit Jacob
639b1d864a
bug #1152 : Fix data race in static initialization of blas
2016-01-26 11:44:16 -05:00
Christoph Hertzberg
44d4674955
bug #1153 : Don't rely on __GXX_EXPERIMENTAL_CXX0X__ to detect C++11 support
2016-01-26 16:45:33 +01:00
Hauke Heibel
5eb2790be0
Fixed minor typo in SplineFitting.
2016-01-25 22:17:52 +01:00
Gael Guennebaud
8328caa618
bug #51 : add block preallocation mechanism to selfadjoit*matrix product.
2016-01-25 22:06:42 +01:00
Gael Guennebaud
2f9e6314b1
update BLAS interface to general_matrix_matrix_triangular_product
2016-01-25 21:56:05 +01:00
Gael Guennebaud
e58827d2ed
bug #51 : make general_matrix_matrix_triangular_product use L3-blocking helper so that general symmetric rank-updates and general-matrix-to-triangular products do not trigger dynamic memory allocation for fixed size matrices.
2016-01-25 17:16:33 +01:00
Gael Guennebaud
c10021c00a
bug #1144 : clarify the doc about aliasing in case of resizing and matrix product.
2016-01-25 15:50:55 +01:00
Gael Guennebaud
b114e6fd3b
Improve documentation.
2016-01-25 11:56:25 +01:00
Gael Guennebaud
869b4443ac
Add SparseVector::conservativeResize() method.
2016-01-25 11:55:39 +01:00
Benoit Steiner
e3a15a03a4
Don't explicitely evaluate the subexpression from TensorForcedEval::evalSubExprIfNeeded, as it will be done when executing the EvalTo subexpression
2016-01-24 23:04:50 -08:00
Benoit Steiner
bd207ce11e
Added missing EIGEN_DEVICE_FUNC qualifier
2016-01-24 20:36:05 -08:00
Gael Guennebaud
acf6f7af6b
Merged in larsmans/eigen (pull request PR-156)
...
Documentation fixes
2016-01-24 22:28:49 +01:00
Lars Buitinck
cc482e32f1
Method is called visit, not visitor
2016-01-24 15:50:59 +01:00
Lars Buitinck
19e437daf0
Copyedit documentation: typos, spelling
2016-01-24 15:50:36 +01:00
Gael Guennebaud
1cf85bd875
bug #977 : add stableNormalize[d] methods: they are analogues to normalize[d] but with carefull handling of under/over-flow
2016-01-23 22:40:11 +01:00
Gael Guennebaud
369d6d1ae3
Add link to reference paper.
2016-01-23 22:16:03 +01:00
Gael Guennebaud
0caa4b1531
bug #1150 : make IncompleteCholesky more robust by iteratively increase the shift until the factorization succeed (with at most 10 attempts).
2016-01-23 22:13:54 +01:00
Benoit Steiner
cb4e53ff7f
Merged in ville-k/eigen/tensorflow_fix (pull request PR-153)
...
Add ctor for long
2016-01-22 19:11:31 -08:00
Ville Kallioniemi
9f94e030c1
Re-add executable flags to minimize changeset.
2016-01-22 20:08:45 -07:00
Benoit Steiner
3aeeca32af
Leverage the new blocking code in the tensor contraction code.
2016-01-22 16:36:30 -08:00
Benoit Steiner
4beb447e27
Created a mechanism to enable contraction mappers to determine the best blocking strategy.
2016-01-22 14:37:26 -08:00
Gael Guennebaud
5358c38589
bug #1095 : add Cholmod*::logDeterminant/determinant (from patch of Joshua Pritikin)
2016-01-22 16:05:29 +01:00
Gael Guennebaud
6a44ccb58b
Backout changeset 690bc950f7
2016-01-22 15:03:53 +01:00
Gael Guennebaud
06971223ef
Unify std::numeric_limits and device::numeric_limits within numext namespace
2016-01-22 15:02:21 +01:00
Ville Kallioniemi
9b6c72958a
Update to latest default branch
2016-01-21 23:08:54 -07:00
Ville Kallioniemi
73aec9219b
Make use of 32 bit ints explicit and remove executable bit from headers.
2016-01-21 23:00:32 -07:00
Benoit Steiner
7b68cf2e0f
Pulled latest updates from trunk
2016-01-21 17:17:56 -08:00
Benoit Steiner
c33479324c
Fixed a constness bug
2016-01-21 17:08:11 -08:00
Gael Guennebaud
ee37eb4eed
bug #977 : avoid division by 0 in normalize() and normalized().
2016-01-21 20:43:42 +01:00
Gael Guennebaud
7cae8918c0
Fix compilation on old gcc+AVX
2016-01-21 20:30:32 +01:00
Gael Guennebaud
8dca9f97e3
Add numext::sqrt function to enable custom optimized implementation.
...
This changeset add two specializations for float/double on SSE. Those
are mostly usefull with GCC for which std::sqrt add an extra and costly
check on the result of _mm_sqrt_*. Clang does not add this burden.
In this changeset, only DenseBase::norm() makes use of it.
2016-01-21 20:18:51 +01:00
Gael Guennebaud
34340458cb
bug #1151 : remove useless critical section
2016-01-21 14:29:45 +01:00
Jan Prach
690bc950f7
fix clang warnings
...
"braces around scalar initializer"
2016-01-20 19:35:59 -08:00
Benoit Steiner
f2a842294f
Pulled latest updates from the trunk
2016-01-20 18:12:53 -08:00
Benoit Steiner
7ce932edd3
Small cleanup and small fix to the contraction of row major tensors
2016-01-20 18:12:08 -08:00
Gael Guennebaud
62f7e77711
add upper|lower case in incomplete_cholesky unit test
2016-01-21 00:02:59 +01:00
Benoit Steiner
47076bf00e
Reduce the register pressure exerted by the tensor mappers whenever possible. This improves the performance of the contraction of a matrix with a vector by about 35%.
2016-01-20 14:51:48 -08:00
Benoit Steiner
ebd3388ee6
Pulled latest updates from trunk
2016-01-20 13:56:43 -08:00
Gael Guennebaud
ed8ade9c65
bug #1149 : fix Pastix*::*parm()
2016-01-20 19:01:24 +01:00
Gael Guennebaud
4c5e96aab6
bug #1148 : silent Pastix by default
2016-01-20 18:56:17 +01:00
Gael Guennebaud
db237d0c75
bug #1145 : fix PastixSupport LLT/LDLT wrappers (missing resize prior to calls to selfAdjointView)
2016-01-20 18:49:01 +01:00
Gael Guennebaud
0b7169d1f7
bug #1147 : fix compilation of PastixSupport
2016-01-20 18:15:59 +01:00
Gael Guennebaud
234a1094b7
Add static assertion to y(), z(), w() accessors
2016-01-20 09:18:44 +01:00
Ville Kallioniemi
915e7667cd
Remove executable bit from header files
2016-01-19 21:17:29 -07:00
Ville Kallioniemi
2832175a68
Use explicitly 32 bit integer types in constructors.
2016-01-19 20:12:17 -07:00
Benoit Steiner
df79c00901
Improved the formatting of the code
2016-01-19 17:24:08 -08:00
Benoit Steiner
6d472d8375
Moved the contraction mapping code to its own file to make the code more manageable.
2016-01-19 17:22:05 -08:00
Benoit Steiner
b3b722905f
Improved code indentation
2016-01-19 17:09:47 -08:00
Benoit Steiner
5b7713dd33
Record whether the underlying tensor storage can be accessed directly during the evaluation of an expression.
2016-01-19 17:05:10 -08:00
Ville Kallioniemi
63fb66f53a
Add ctor for long
2016-01-17 21:25:36 -07:00
Eugene Brevdo
6a75e7e0d5
Digamma cleanup
...
* Added permission from cephes author to use his code
* Cleanup in ArrayCwiseUnaryOps
2016-01-15 16:32:21 -08:00
Benoit Steiner
34057cff23
Fixed a race condition that could affect some reductions on CUDA devices.
2016-01-15 15:11:56 -08:00
Benoit Steiner
0461f0153e
Made it possible to compare tensor dimensions inside a CUDA kernel.
2016-01-15 11:22:16 -08:00
Benoit Steiner
aed4cb1269
Use warp shuffles instead of shared memory access to speedup the inner reduction kernel.
2016-01-14 21:45:14 -08:00
Benoit Steiner
c1a42c2d0d
Don't disable the AVX implementations of plset when compiling with AVX512 enabled
2016-01-14 17:21:39 -08:00
Benoit Steiner
0366478df8
Added alignment requirement to the AVX512 packet traits.
2016-01-14 17:02:39 -08:00
Benoit Steiner
3cfd16f3af
Fixed the signature of the plset primitives for AVX512
2016-01-14 16:58:01 -08:00
Benoit Steiner
67f44365ea
Fixed the AVX512 signature of the ptranspose primitives
2016-01-14 16:51:11 -08:00
Benoit Steiner
a282eb1363
pscatter/pgather use Index instead of int to specify the stride
2016-01-14 16:39:39 -08:00
Benoit Steiner
7832485575
Deleted unnecessary commas and semicolons
2016-01-14 16:36:29 -08:00
Benoit Steiner
8fe2532e70
Fixed a boundary condition bug in the outer reduction kernel
2016-01-14 09:29:48 -08:00
Benoit Steiner
9f013a9d86
Properly record the rank of reduced tensors in the tensor traits.
2016-01-13 14:24:37 -08:00
Benoit Steiner
79b69b7444
Trigger the optimized matrix vector path more conservatively.
2016-01-12 15:21:09 -08:00
Benoit Steiner
d920d57f38
Improved the performance of the contraction of a 2d tensor with a 1d tensor by a factor of 3 or more. This helps speedup LSTM neural networks.
2016-01-12 11:32:27 -08:00
Benoit Steiner
bd7d901da9
Reverted a previous change that tripped nvcc when compiling in debug mode.
2016-01-11 17:49:44 -08:00
Benoit Steiner
bbdabbb379
Made the blas utils usable from within a cuda kernel
2016-01-11 17:26:56 -08:00
Benoit Steiner
c5e6900400
Silenced a few compilation warnings.
2016-01-11 17:06:39 -08:00
Benoit Steiner
f894736d61
Updated the tensor traits: the alignment is not part of the Flags enum anymore
2016-01-11 16:42:18 -08:00
Benoit Steiner
4f7714d72c
Enabled the use of fixed dimensions from within a cuda kernel.
2016-01-11 16:01:00 -08:00
Benoit Steiner
01c55d37e6
Deleted unused variable.
2016-01-11 15:53:19 -08:00
Benoit Steiner
0504c56ea7
Silenced a nvcc compilation warning
2016-01-11 15:49:21 -08:00
Benoit Steiner
b523771a24
Silenced several compilation warnings triggered by nvcc.
2016-01-11 14:25:43 -08:00
Benoit Steiner
2c3b13eded
Merged in jeremy_barnes/eigen/shader-model-3.0 (pull request PR-152)
...
Alternative way of forcing instantiation of device kernels without causing warnings or requiring device to device kernel invocations.
2016-01-11 11:43:37 -08:00
Benoit Steiner
2ccb1c8634
Fixed a bug in the dispatch of optimized reduction kernels.
2016-01-11 10:36:37 -08:00
Benoit Steiner
780623261e
Re-enabled the optimized reduction CUDA code.
2016-01-11 09:07:14 -08:00
Jeremy Barnes
91678f489a
Cleaned up double-defined macro from last commit
2016-01-10 22:44:45 -05:00
Jeremy Barnes
403a7cb6c3
Alternative way of forcing instantiation of device kernels without
...
causing warnings or requiring device to device kernel invocations.
This allows Tensorflow to work on SM 3.0 (ie, Amazon EC2) machines.
2016-01-10 22:39:13 -05:00
Gael Guennebaud
b557662e58
merge
2016-01-09 08:37:01 +01:00
Gael Guennebaud
8b9dc9f0df
bug #1144 : fix regression in x=y+A*x (aliasing), and move evaluator_traits::AssumeAliasing to evaluator_assume_aliasing.
2016-01-09 08:30:38 +01:00
Benoit Steiner
e76904af1b
Simplified the dispatch code.
2016-01-08 16:50:57 -08:00
Benoit Steiner
d726e864ac
Made it possible to use array of size 0 on CUDA devices
2016-01-08 16:38:14 -08:00
Benoit Steiner
3358dfd5dd
Reworked the dispatch of optimized cuda reduction kernels to workaround a nvcc bug that prevented the code from compiling in optimized mode in some cases
2016-01-08 16:28:53 -08:00
Benoit Steiner
53749ff415
Prevent nvcc from miscompiling the cuda metakernel. Unfortunately this reintroduces some compulation warnings but it's much better than having to deal with random assertion failures.
2016-01-08 13:53:40 -08:00
Gael Guennebaud
f9d71a1729
extend matlab conversion table
2016-01-08 22:24:45 +01:00
Benoit Steiner
6639b7d6e8
Removed a couple of partial specialization that confuse nvcc and result in errors such as this:
...
error: more than one partial specialization matches the template argument list of class "Eigen::internal::get<3, Eigen::internal::numeric_list<std::size_t, 1UL, 1UL, 1UL, 1UL>>"
"Eigen::internal::get<n, Eigen::internal::numeric_list<T, a, as...>>"
"Eigen::internal::get<n, Eigen::internal::numeric_list<T, as...>>"
2016-01-07 18:45:19 -08:00
Benoit Steiner
0cb2ca5de2
Fixed a typo.
2016-01-06 18:50:28 -08:00
Benoit Steiner
213459d818
Optimized the performance of broadcasting of scalars.
2016-01-06 18:47:45 -08:00
Gael Guennebaud
ee738321aa
rm remaining debug code
2016-01-06 14:49:40 +01:00
Christoph Hertzberg
54bf582303
bug #1143 : Work-around gcc bug
2016-01-06 11:59:24 +01:00
Benoit Steiner
99093c0fe0
Added support for AVX512 to the build files
2016-01-05 10:02:49 -08:00
Benoit Steiner
cfff40b1d4
Improved the performance of reductions on CUDA devices
2016-01-04 17:25:00 -08:00
Benoit Steiner
515dee0baf
Added a 'divup' util to compute the floor of the quotient of two integers
2016-01-04 16:29:26 -08:00
Gael Guennebaud
715f6f049f
Improve inline documentation of SparseCompressedBase and its derived classes
2016-01-03 21:56:30 +01:00
Gael Guennebaud
8b0d1eb0f7
Fix numerous doxygen shortcomings, and workaround some clang -Wdocumentation warnings
2016-01-01 21:45:06 +01:00
Gael Guennebaud
9900782e88
Mark AlignedBit and EvalBeforeNestingBit with deprecated attribute, and remove the remaining usages of EvalBeforeNestingBit.
2015-12-30 16:47:49 +01:00
Gael Guennebaud
70404e07c2
Workaround clang -Wdocumentation warning about "/*<"
2015-12-30 16:46:45 +01:00
Gael Guennebaud
addb7066e8
Workaround "empty paragraph" warning with clang -Wdocumentation
2015-12-30 16:45:44 +01:00
Gael Guennebaud
eadc377b3f
Add missing doc of Derived template parameter
2015-12-30 16:43:19 +01:00
Gael Guennebaud
29bb599e03
Fix numerous doxygen issues in auto-link generation
2015-12-30 16:04:24 +01:00
Gael Guennebaud
162ccb2938
Fix links to Eigen2-to-Eigen3 porting helpers
2015-12-30 16:03:14 +01:00
Gael Guennebaud
5fae3750b5
Recent versions of doxygen miss-parsed Eigen/* headers
2015-12-30 16:02:05 +01:00
Gael Guennebaud
b84cefe61d
Add missing snippets for erf/erfc/lgamma functions.
2015-12-30 15:12:15 +01:00
Gael Guennebaud
16dd82ed51
Add missing snippet for sign/cwiseSign functions.
2015-12-30 15:11:42 +01:00
Gael Guennebaud
978c379ed7
Add missing ctor from uint
2015-12-30 12:52:38 +01:00
Gael Guennebaud
25f2b8d824
bug #1141 : add missing initialization of CholmodBase::m_*IsOk
2015-12-29 15:50:11 +01:00
Eugene Brevdo
f2471f31e0
Modify constants in SpecialFunctions to lowercase (avoid name conflicts).
2015-12-28 17:48:38 -08:00
Eugene Brevdo
afb35385bf
Change PI* to M_PI* in SpecialFunctions to avoid possible breakage
...
with external DEFINEs.
2015-12-28 17:34:06 -08:00
Eugene Brevdo
14897600b7
Protect digamma tests behind a EIGEN_HAS_C99_MATH check.
2015-12-24 21:28:18 -08:00
Eugene Brevdo
cef81c9084
Merged eigen/eigen into default
2015-12-24 21:17:33 -08:00
Eugene Brevdo
f7362772e3
Add digamma for CPU + CUDA. Includes tests.
2015-12-24 21:15:38 -08:00
Gael Guennebaud
d2e288ae50
Workaround compilers that do not even define _mm256_set_m128.
2015-12-24 16:53:43 +01:00
Benoit Steiner
bdcbc66a5c
Don't attempt to vectorize mean reductions of integers since we can't use
...
SSE or AVX instructions to divide 2 integers.
2015-12-22 17:51:55 -08:00
Benoit Steiner
a1e08fb2a5
Optimized the configuration of the outer reduction cuda kernel
2015-12-22 16:30:10 -08:00
Benoit Steiner
9c7d96697b
Added missing define
2015-12-22 16:11:07 -08:00
Benoit Steiner
e7e6d01810
Made sure the optimized gpu reduction code is actually compiled.
2015-12-22 15:07:33 -08:00
Benoit Steiner
b5d2078c4a
Optimized outer reduction on GPUs.
2015-12-22 15:06:17 -08:00
Benoit Steiner
3504ae47ca
Made it possible to run the lgamma, erf, and erfc functors on a CUDA gpu.
2015-12-21 15:20:06 -08:00
Benoit Steiner
1c3e78319d
Added missing const
2015-12-21 15:05:01 -08:00
Benoit Steiner
9f9d8d2f62
Disabled part of the matrix matrix peeling code that's incompatible with 512 bit registers
2015-12-21 13:04:52 -08:00
Benoit Steiner
b74887d5f2
Implemented most of the packet primitives for AVX512
2015-12-21 11:46:36 -08:00
Benoit Steiner
6ffb208c77
Make sure EIGEN_HAS_MM_MALLOC is set to 1 when using the avx512 instruction set.
2015-12-21 11:23:15 -08:00
Benoit Steiner
994d1c60b9
Free memory allocated using posix_memalign() with free() instead of std::free()
2015-12-21 11:21:39 -08:00
Benoit Steiner
b407948a77
Merged in connor-k/eigen (pull request PR-149)
...
[doc] Remove extra ';' in Advanced Initialization sample
2015-12-21 09:44:25 -08:00
Benoit Steiner
a6c243617b
Fixed a typo in previous change.
2015-12-21 09:05:45 -08:00
Benoit Steiner
51be91f15e
Added support for CUDA architectures that don's support for 3.5 capabilities
2015-12-21 08:42:58 -08:00
connor-k
95dd423cca
[doc] Remove extra ';' in Tutorial_AdvancedInitialization_Join.cpp
2015-12-21 01:12:26 +00:00
Tal Hadad
c006ecace1
Fix comments
2015-12-20 20:07:06 +02:00
Tal Hadad
bfed274df3
Use RotationBase, test quaternions and support ranges.
2015-12-20 16:24:53 +02:00
Tal Hadad
b091b7e6ea
Remove unneccesary comment.
2015-12-20 13:00:07 +02:00
Tal Hadad
fabd8474ff
Merged eigen/eigen into default
2015-12-20 12:50:07 +02:00
Tal Hadad
6752a69aa5
Much better tests, and a little bit more functionality.
2015-12-20 12:49:12 +02:00
Benoit Steiner
6d777e1bc7
Fixed a typo.
2015-12-18 19:25:50 -08:00
Benoit Steiner
1b82969559
Add alignment requirement for local buffer used by the slicing op.
2015-12-18 14:36:35 -08:00
Benoit Steiner
75a7fa1919
Doubled the speed of full reductions on GPUs.
2015-12-18 14:07:31 -08:00
Gael Guennebaud
3abd8470ca
bug #1140 : remove custom definition and use of _mm256_setr_m128
2015-12-18 14:18:59 +01:00
Benoit Steiner
8dd17cbe80
Fixed a clang compilation warning triggered by the use of arrays of size 0.
2015-12-17 14:00:33 -08:00
Benoit Steiner
4aac55f684
Silenced some compilation warnings triggered by nvcc
2015-12-17 13:39:01 -08:00
Benoit Steiner
40e6250fc3
Made it possible to run tensor chipping operations on CUDA devices
2015-12-17 13:29:08 -08:00
Benoit Steiner
2ca55a3ae4
Fixed some compilation error triggered by the tensor code with msvc 2008
2015-12-16 20:45:58 -08:00
Gael Guennebaud
55aef139ff
Added tag 3.3-beta1 for changeset 9f9de1aaa9
2015-12-16 21:49:02 +01:00
Benoit Steiner
b8861b0c25
Make sure the data is aligned on a 64 byte boundary when using avx512 instructions.
2015-12-11 09:19:57 -08:00
Benoit Steiner
9a415fb1e2
Preliminary support for AVX512
2015-12-10 15:34:57 -08:00
nnyby
ccc7b0ffea
[doc] grammar fix: "linearly space" -> "linearly spaced"
2015-10-01 23:43:06 +00:00
Tal Hadad
5e0a178df2
Initial fork of unsupported module EulerAngles.
2015-09-27 16:51:24 +03:00
yoco
15f273b63c
fix reshape flag and test case
2014-02-10 22:49:13 +08:00
yoco
b64a09acc1
fix reshape's Max[Row/Col]AtCompileTime
2014-02-04 05:54:50 +08:00
yoco
f8ad87f226
Reshape always non-directly-access
2014-02-04 05:19:56 +08:00
yoco
515bbf8bb2
Improve reshape test case
...
- simplify test code
- add reshape chain
2014-02-04 02:50:23 +08:00
yoco
009047db27
Fix Reshape traits flag calculate bug
2014-02-04 02:21:41 +08:00
yoco
2b89080903
Remove reshape InnerPanel, add test, fix bug
2014-01-20 01:43:28 +08:00
yoco
03723abda0
Remove useless reshape row/col ctor
2014-01-20 00:22:16 +08:00
yoco
342c8e5321
Fix Reshape DirectAccessBit bug
2014-01-20 00:15:19 +08:00
yoco
24e1c0f2a1
add reshape test for const and static-size matrix
2014-01-18 23:27:53 +08:00
yoco
150796337a
Add unit-test for reshape
...
- add unittest for dynamic & fixed-size reshape
- fix fixed-size reshape bugs bugs found while testing
2014-01-18 16:10:46 +08:00
yoco
497a7b0ce1
remove c++11, make c++03 compatible
2014-01-18 13:20:14 +08:00
yoco
9c832fad60
add reshape() snippets
2014-01-18 04:53:46 +08:00
yoco
1e1d0c15b5
add example code for Reshape class
2014-01-18 04:43:29 +08:00
yoco
fe2ad0647a
reshape now supported
...
- add member function to plugin
- add forward declaration
- add documentation
- add include
2014-01-18 04:21:20 +08:00
yoco
7bd58ad0b6
add Eigen/src/Core/Reshape.h
2014-01-18 04:16:44 +08:00