Compare commits


4385 Commits

Author SHA1 Message Date
Rasmus Munk Larsen
6a9405bf7a GPU: Raise CUDA/HIP minimum and remove legacy guards
- Raise CUDA minimum from 9.0 to 11.4 (sm_70/Volta).
- Raise HIP minimum to GFX906 (Vega 20/MI50) / ROCm 5.6.
- Remove EIGEN_HAS_{CUDA,HIP,GPU}_FP16 guards — FP16 is always available
  on sm_70+ and GFX906+.
- Remove obsolete __HIP_ARCH_HAS_* preprocessor branches.
- C++14 cleanup: remove pre-C++14 workarounds in GPU code.
- Fix NVCC warnings (deprecated register keyword, unreachable code,
  tautological comparisons).
- Fix HIP test execution on gfx1151.
- Update CI configuration for new minimum versions.
2026-04-09 15:21:39 -07:00
Rasmus Munk Larsen
e055e4e415 Add plog_core_double with fallback for AVX without AVX2
libeigen/eigen!2407

Co-authored-by: Rasmus Munk Larsen <rlarsen@nvidia.com>
2026-04-08 19:41:07 -07:00
Rasmus Munk Larsen
b1d2ce4c85 Revert "Speed up plog_double ~1.7x with fast integer range reduction"
This reverts merge request !2385
2026-04-08 13:03:48 -07:00
Rasmus Munk Larsen
ab70739c9c Speed up plog_double ~1.7x with fast integer range reduction
libeigen/eigen!2385

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-07 21:48:25 -07:00
Rasmus Munk Larsen
e778b5d22b Switch ASAN/UBSAN smoketest pipelines to large runners
libeigen/eigen!2405

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-07 21:37:58 -07:00
Rasmus Munk Larsen
def45c5e1e Improve psincos_double: faster polynomials + accurate range reduction
libeigen/eigen!2389

Closes #3052

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-07 21:24:24 -07:00
Rasmus Munk Larsen
110530a4d8 Fix bugs and improve robustness of SelfAdjointEigenSolver, improve test coverage
libeigen/eigen!2396

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-07 21:08:29 -07:00
Rasmus Munk Larsen
bde3a68bae Improve dense linear solver docs with practical guidance
libeigen/eigen!2395

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-05 21:40:42 -07:00
Rasmus Munk Larsen
8eabfb5342 Vectorize BLAS level 1/2 routines with Eigen expressions
libeigen/eigen!2404

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-05 18:53:11 -07:00
Rasmus Munk Larsen
4ad90a60f1 Replace blas/f2c with clean C++ implementations
libeigen/eigen!2402

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-05 16:04:41 -07:00
Rasmus Munk Larsen
fe6ada10be Prevent nightly CI pipelines from being auto-cancelled
libeigen/eigen!2390

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-04 11:52:15 -07:00
Alexander Grund
8179474225 CI: Add AVX512-FP16 build tests with GCC 13
libeigen/eigen!1652

Co-authored-by: Alexander Grund <alexander.grund@tu-dresden.de>
2026-04-04 11:32:31 -07:00
Florian Maurin
b57d860f3e Fix GCC maybe-uninitialized warning in InnerProduct
libeigen/eigen!2386

Closes #3015
2026-04-03 19:41:09 -07:00
Rasmus Munk Larsen
a3074053a6 Speed up pexp_double by ~15-17%
libeigen/eigen!2388

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-03 17:09:11 -07:00
Rasmus Munk Larsen
a91913e961 Speed up plog_float by 1.6x with improved accuracy
libeigen/eigen!2382

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-03 13:45:01 -07:00
Rasmus Munk Larsen
ebae0c7c10 ulp_accuracy: use dynamic work queue for thread load balancing
libeigen/eigen!2383

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-02 22:40:03 -07:00
Charles Schlosser
5977635d64 fix signed integer overflow UB in integer_types and other trivial compiler warnings
libeigen/eigen!2380
2026-04-03 03:36:28 +00:00
Rasmus Munk Larsen
60df12437e Fix ulp_accuracy crashes in Release builds
libeigen/eigen!2381

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-02 20:12:13 -07:00
Pavel Guzenfeld
e315a8cdd0 Inline IndexedViewMethods.inc into DenseBase.h
libeigen/eigen!2330

Closes #2766
2026-04-02 15:26:56 -07:00
Rasmus Munk Larsen
8ec68856a6 Fix basicstuff_8 casting test failure on loongarch64
libeigen/eigen!2379

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-02 14:14:54 -07:00
Rasmus Munk Larsen
61a8662876 Improve log1p accuracy and speed with direct range reduction
libeigen/eigen!2378

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-02 11:29:25 -07:00
Rasmus Munk Larsen
d31a73437f Vectorize asinh and acosh for float and double
libeigen/eigen!2376

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-01 21:46:36 -07:00
Rasmus Munk Larsen
9513d3878e Vectorize sinh, cosh, and log10
libeigen/eigen!2368

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-01 20:41:18 -07:00
Rasmus Munk Larsen
30e669cfe1 Tensor module: const-correctness and constexpr improvements
libeigen/eigen!2239

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-01 17:49:56 -07:00
Rasmus Munk Larsen
64885cc6a3 Fix remaining MSVC warnings in Windows CI (C4804, C4244, C4146, C4305)
libeigen/eigen!2374

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-01 17:20:31 -07:00
Rasmus Munk Larsen
6a07970d7d CI: split NVHPC build and make fallback parallelism configurable
libeigen/eigen!2372

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-01 16:43:33 -07:00
Rasmus Munk Larsen
4be66f2830 CI: fail test jobs when no tests are found (--no-tests=error)
libeigen/eigen!2373

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-01 12:50:54 -07:00
Rasmus Munk Larsen
1df89cbc21 Right-size CI runners to reduce waste and shuffle build order to avoid OOM
libeigen/eigen!2367

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-31 19:10:34 -07:00
Rasmus Munk Larsen
b54640df19 Fix NVHPC warnings in Visitor.h and Memory.h
libeigen/eigen!2370

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-31 15:09:37 -07:00
Rasmus Munk Larsen
7fcbed7acb Fill packet math coverage gaps across multiple architectures
libeigen/eigen!2237

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-31 14:52:32 -07:00
Rasmus Munk Larsen
80ab2898e2 CI: install libclang-rt-14-dev for sanitizer smoketest
libeigen/eigen!2369

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-31 00:16:18 -07:00
Rasmus Munk Larsen
798d7f2bec CI: drop Clang-6, bump base image to Ubuntu 24.04 and Clang 12 to 14
libeigen/eigen!2366

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-30 22:00:17 -07:00
Rasmus Munk Larsen
1ade3636b9 Fix BDCSVD bidiagonal hard-case failures on ARM with GCC
libeigen/eigen!2365

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-30 20:17:37 -07:00
Rasmus Munk Larsen
801a9ee690 Fix ~1,460 MSVC warnings from generic code instantiated with bool
libeigen/eigen!2364

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-29 21:05:49 -07:00
Rasmus Munk Larsen
806c7b6590 CI: fix Windows build cache key containing invalid path characters
libeigen/eigen!2362

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-29 19:57:45 -07:00
Rasmus Munk Larsen
2776ba55eb Update slicing tutorial docs to reflect Eigen::placeholders namespace
libeigen/eigen!2360

Closes #3064

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-29 18:51:02 -07:00
Rasmus Munk Larsen
09581fda38 Modernize tensor contraction code: bug fixes, dead code removal, and cleanup
libeigen/eigen!2248

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-29 18:03:06 -07:00
Rasmus Munk Larsen
732ebc8cc2 Modernize evaluator files
libeigen/eigen!2245

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-29 17:40:39 -07:00
Rasmus Munk Larsen
255f522e2e Fix bugs, docs, and structure in unsupported/ public headers
libeigen/eigen!2254

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-29 17:06:40 -07:00
Pavel Guzenfeld
bd276fbb28 Map .inc files to C++ in Doxygen extension mapping
libeigen/eigen!2338
2026-03-29 16:48:13 -07:00
Rasmus Munk Larsen
c8633ceeea Clean up top-level Eigen headers
libeigen/eigen!2252

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-29 16:28:09 -07:00
Rasmus Munk Larsen
409296d91d Add nightly benchmark regression detection pipeline
libeigen/eigen!2349

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-29 16:03:56 -07:00
Pavel Guzenfeld
753a6ac5b3 Fix private shadowing of protected base members in iterative solvers
libeigen/eigen!2357

Closes #1859
2026-03-29 15:40:48 -07:00
Rasmus Munk Larsen
9fe2f03fa4 Revert "Lower BDCSVD crossover threshold from 16 to 8"
This reverts merge request !2358
2026-03-29 15:25:09 -07:00
Rasmus Munk Larsen
12fe90db8b Lower BDCSVD crossover threshold from 16 to 8
libeigen/eigen!2358

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-29 14:33:22 -07:00
Pavel Guzenfeld
b7f6aed1b9 Fix dangling reference in IndexedView with expression indices
libeigen/eigen!2335

Closes #1943
2026-03-29 09:39:13 -07:00
Rasmus Munk Larsen
624ab58e8d Add bidiagonal SVD API to BDCSVD and remove dead debug code
libeigen/eigen!2238

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-28 20:38:31 -07:00
Charles Schlosser
ba9871e46b fix and enable realview unit tests
libeigen/eigen!2356
2026-03-28 20:13:54 -07:00
Rasmus Munk Larsen
b8dab89663 CI: remove broken NVHPC CUDA pipeline
libeigen/eigen!2355

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-28 19:52:01 -07:00
Rasmus Munk Larsen
0fe8cdfa3b Extract RankRevealingBase CRTP mixin to eliminate decomposition code duplication
libeigen/eigen!2272

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-28 19:12:23 -07:00
Rasmus Munk Larsen
5e521f3e45 Revert "add realview test"
This reverts merge request !2352
2026-03-28 17:27:01 -07:00
Charles Schlosser
87ae1dbe7f add realview test
libeigen/eigen!2352
2026-03-28 16:26:51 -07:00
Rasmus Munk Larsen
49a137ca24 CI: limit NVHPC build parallelism to avoid OOM kills
libeigen/eigen!2353

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-28 16:10:04 -07:00
Rasmus Munk Larsen
f928a9f534 Fix static alignment for generic clang vector backend
libeigen/eigen!2351

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-28 15:50:58 -07:00
Rasmus Munk Larsen
9706546a14 Add Householder blocked-right regression test
libeigen/eigen!2348

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-27 20:49:43 -07:00
Pavel Guzenfeld
90ca5bfd9a Strip lapacke.h to only the declarations used by Eigen
libeigen/eigen!2322

Closes #2851
2026-03-27 20:16:46 -07:00
Rasmus Munk Larsen
cf508c096b Add block Householder right-side application for HouseholderSequence
libeigen/eigen!2342

Closes #3057

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-27 19:56:08 -07:00
Rasmus Munk Larsen
79d7d280a5 Fix bugs in evaluator files
libeigen/eigen!2244

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-28 01:25:51 +00:00
Rasmus Munk Larsen
b8baa2c49c Split eigensolver_selfadjoint test to fix NVHPC OOM
libeigen/eigen!2347

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-27 18:09:26 -07:00
Charles Schlosser
eb4b2eeffa UBSAN: use appropriate SSE intrinsics for loading 4 and 8 bytes
libeigen/eigen!2346
2026-03-27 19:54:10 +00:00
Tyler Veness
9939a4c6e3 Fix SparseLU and SparseQR for custom scalar types
libeigen/eigen!2345
2026-03-27 00:13:11 -07:00
Rasmus Munk Larsen
002229ce47 Fix RowMajor gemm_pack_lhs for backends without half/quarter packets
libeigen/eigen!2344

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-23 23:33:42 -07:00
Rasmus Munk Larsen
f574cb9b18 Fix vectorization_logic test for generic clang backend
libeigen/eigen!2333

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-22 22:12:57 -07:00
Rasmus Munk Larsen
843ffcec8b Fix warnings reported by NVHPC 26.1
libeigen/eigen!2324

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-22 11:43:40 -07:00
Florian Maurin
71ef987edb Fixes triangular solves on indexed/sliced dense expressions
libeigen/eigen!2340

Closes #2814
2026-03-22 11:12:21 -07:00
Rasmus Munk Larsen
ac6aedc60a Fix flaky matrix_power test
libeigen/eigen!2325

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-22 09:54:32 -07:00
Rasmus Munk Larsen
6490b17e6f Fix sanitizer regressions in sparse serializer and packet tests
libeigen/eigen!2319

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-22 09:10:16 -07:00
Pavel Guzenfeld
835e5615a9 Prefer SuiteSparse config-mode packages in Find modules
libeigen/eigen!2327
2026-03-22 08:44:01 -07:00
Rasmus Munk Larsen
f5774b014e Fix Doxygen build failure for comparison operator links
libeigen/eigen!2339

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-21 18:48:07 -07:00
Pavel Guzenfeld
a0e30732a7 Remove trailing semicolon from EIGEN_UNUSED_VARIABLE macro
libeigen/eigen!2301

Closes #3007

Co-authored-by: Pavel Guzenfeld <67074795+PavelGuzenfeld@users.noreply.github.com>
2026-03-21 16:54:13 -07:00
Rasmus Munk Larsen
e0b8498eef CI: Add nightly clang C++20 full test pipeline
libeigen/eigen!2328

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
Co-authored-by: Rasmus Munk Larsen <rlarsen@nvidia.com>
2026-03-21 11:10:58 -07:00
Rasmus Munk Larsen
7e8a3040bb Fix Doxygen errors for ArrayBase comparison operators
libeigen/eigen!2334

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-21 10:37:17 -07:00
Rasmus Munk Larsen
54b04fc6b1 Fix mixed-type GEMM packing for backends without half/quarter packets
libeigen/eigen!2297

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-21 09:46:54 -07:00
Pavel Guzenfeld
1d21d62fbc Fix computeInverseAndDetWithCheck for dynamic result matrices
libeigen/eigen!2312

Closes #2917
2026-03-21 08:38:27 -07:00
Rasmus Munk Larsen
cc8c7cf0e6 Fix bugs and clean up SparseCore module
libeigen/eigen!2250

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-21 13:31:12 +00:00
Pavel Guzenfeld
daecd28cd5 Add Array relational operator docs and FetchContent CMake guide
libeigen/eigen!2329

Closes #2801 and #2793
2026-03-20 18:50:58 -07:00
Rasmus Munk Larsen
9d1e5f3915 Remove benchmark::internal::Benchmark* from all benchmarks
libeigen/eigen!2332

Co-authored-by: Rasmus Munk Larsen <rlarsen@nvidia.com>
2026-03-20 17:42:07 -07:00
Rasmus Munk Larsen
8115b45e50 Fix integer sanitizer issues in shifts and test ranges
libeigen/eigen!2320

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-20 17:27:02 -07:00
Rasmus Munk Larsen
89621d1024 CI: Remove GCC 6 pipeline
libeigen/eigen!2323

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-20 17:09:19 -07:00
Rasmus Munk Larsen
6540bf4787 Harden unsupported tensor tests for sanitizers
libeigen/eigen!2321

Co-authored-by: Rasmus Munk Larsen <rlarsen@nvidia.com>
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-20 15:12:41 -07:00
Yu You
9d161e0c87 Fine-tune gebp_kernel for aarch64
libeigen/eigen!2278
2026-03-20 14:29:03 -07:00
Rasmus Munk Larsen
a0b16a7e1b Fix flaky product and eigensolver_selfadjoint tests
libeigen/eigen!2326

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-20 13:44:03 -07:00
Pavel Guzenfeld
a72172e563 Add blocking and vectorization boundary tests for LU and Cholesky
libeigen/eigen!2317
2026-03-20 13:27:49 -07:00
Pavel Guzenfeld
30128de0e3 Guard eigen_fill_helper on trivially copyable scalars
libeigen/eigen!2313

Closes #2956
2026-03-20 19:03:13 +00:00
Rasmus Munk Larsen
8a47aa334b Replace empirical product test tolerances with principled Higham-Mary bounds
libeigen/eigen!2292

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-20 11:03:58 -07:00
Pavel Guzenfeld
821ab7d3e6 Fix TensorUInt128 division infinite loop on overflow
libeigen/eigen!2300

Closes #3012

Co-authored-by: Pavel Guzenfeld <67074795+PavelGuzenfeld@users.noreply.github.com>
2026-03-20 15:41:00 +00:00
Rasmus Munk Larsen
3578883bb3 CI: Split ASAN/UBSAN build into official/unsupported jobs
libeigen/eigen!2315

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-19 17:56:43 -07:00
Pavel Guzenfeld
3e5a2f9245 Fix vectorized erf returning NaN at ±inf instead of ±1
libeigen/eigen!2306

Closes #3053
2026-03-19 14:12:15 -07:00
Pavel Guzenfeld
36ca36d0de Guard redundant constexpr static member redeclarations for C++17+
libeigen/eigen!2299

Closes #3061

Co-authored-by: Pavel Guzenfeld <67074795+PavelGuzenfeld@users.noreply.github.com>
2026-03-18 20:24:09 -07:00
Pavel Guzenfeld
62e23f79dd Fix GCC 13 array-bounds warning in TensorContraction
libeigen/eigen!2311

Closes #3017
2026-03-18 20:08:21 -07:00
Pavel Guzenfeld
05295a818b Fix undefined behavior in matrix_cwise test for signed integers
libeigen/eigen!2310

Closes #2933
2026-03-18 11:51:01 -07:00
Pavel Guzenfeld
0fd8002b11 Fix most vexing parse in SparseSparseProductWithPruning.h
libeigen/eigen!2298

Closes #3060

Co-authored-by: Pavel Guzenfeld <67074795+PavelGuzenfeld@users.noreply.github.com>
2026-03-18 15:13:22 +00:00
Pavel Guzenfeld
c148dc8fad Include Scaling.h in IterativeSolvers module
libeigen/eigen!2309

Closes #3002
2026-03-17 21:59:57 -07:00
Rasmus Munk Larsen
1726a92900 CI: Reduce artifact size, cache clang-tidy, fix test retry, throttle QEMU
libeigen/eigen!2305

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-17 21:41:29 -07:00
Rasmus Munk Larsen
ea13a98dec Fix imag_ref for real scalar types and clean up svd_fill.h
libeigen/eigen!2303

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-15 19:56:01 -07:00
Antonio Sánchez
929785924c Fix more cache size queries.
libeigen/eigen!2296
2026-03-14 16:07:44 +00:00
Antonio Sánchez
b2f95d3733 Fix more cache size queries.
libeigen/eigen!2295
2026-03-14 15:43:24 +00:00
Antonio Sánchez
9ae0e0f195 Remove include from within Eigen namespace.
libeigen/eigen!2294
2026-03-13 21:03:24 +00:00
Rasmus Munk Larsen
c1faa74738 Add boundary test coverage: stableNorm, LinSpaced, complex GEMV, triangular solve
libeigen/eigen!2291

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-12 18:15:30 -07:00
Rasmus Munk Larsen
6b9275d1a8 Add test coverage for transpose, reverse, bool redux, select, diagonal-of-product at boundaries
libeigen/eigen!2290

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-12 17:02:58 -07:00
Rasmus Munk Larsen
356a9ba1da Add test coverage for matrix lpNorm, RowMajor partial reductions, selfadjoint boundaries
libeigen/eigen!2289

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-12 14:45:51 -07:00
Rasmus Munk Larsen
15cae83485 Add test coverage for strided maps, triangular blocking, and mixed storage orders
libeigen/eigen!2288

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-12 14:07:21 -07:00
Rasmus Munk Larsen
93aa959b8a Add vectorization boundary tests for redux and visitor
libeigen/eigen!2287

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-12 13:47:15 -07:00
Rasmus Munk Larsen
c93116b43d Improve test coverage for inner product, fill, reductions, and IO
libeigen/eigen!2286

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-12 12:48:45 -07:00
Rasmus Munk Larsen
5e478d3285 Improve product test coverage at critical code-path boundaries
libeigen/eigen!2285

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-12 12:32:06 -07:00
onalante-ebay
3a2ba7c434 Optimize predux_any<Packet4f>
libeigen/eigen!2277
2026-03-12 09:15:16 -07:00
Rasmus Munk Larsen
8190c82cb4 Add missing SIMD math function benchmarks
libeigen/eigen!2284

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-11 23:20:11 -07:00
Rasmus Munk Larsen
8368a12f0f Add runtime cache size detection for ARM and improve GEMM blocking
libeigen/eigen!2282

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-11 22:36:33 -07:00
Rasmus Munk Larsen
42c1dbd2c3 Add aarch64 smoke test pipeline for MRs
libeigen/eigen!2283

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-11 22:19:16 -07:00
Charles Schlosser
875fb48f0a fix various irksome compiler warnings
libeigen/eigen!2280
2026-03-11 21:01:20 -07:00
Charles Schlosser
2a2456c873 restore Eigen/src/Core/arch/Altivec/MatrixProduct.h to b1e74b1cc
libeigen/eigen!2279
2026-03-12 03:26:03 +00:00
Charles Schlosser
c4eb3c4f4c fix custom visitors
libeigen/eigen!2275

Closes #2920
2026-03-11 10:52:49 +00:00
Antonio Sánchez
4387e32481 Fix row-skipping bug in general_matrix_vector_product::run_small_cols
libeigen/eigen!2276
2026-03-10 15:16:00 -07:00
Juraj Oršulić
81550faea4 Use Web Archive for dead link for the PDF referenced in Geometry/EulerAngles.h
libeigen/eigen!2274
2026-03-09 20:18:43 -07:00
Rasmus Munk Larsen
42b6c43cfe Revert "Remove random retry loops in tests (batch 2: indices and integer types)"
This reverts merge request !2261
2026-03-09 20:01:53 -07:00
Rasmus Munk Larsen
54458cb39d Remove random retry loops in tests (batch 3: geometry, sparse, umeyama)
libeigen/eigen!2262

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-09 00:35:26 -07:00
Rasmus Munk Larsen
a3cb1c6591 cxx11_tensor_random: use retry loop for low-precision RNG collisions
libeigen/eigen!2269

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-08 16:19:48 -07:00
Rasmus Munk Larsen
f80d7b8254 Fix three more flaky tests: igamma, tensor_random, matrix_power
libeigen/eigen!2268

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-08 16:00:04 -07:00
Rasmus Munk Larsen
8eaa7552fe Fix three flaky tests: packetmath, array_cwise, polynomialsolver
libeigen/eigen!2267

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-08 14:59:23 -07:00
Rasmus Munk Larsen
dd81698aed Fix vectorization_logic test for wide SIMD widths
libeigen/eigen!2266

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-08 12:11:33 -07:00
Rasmus Munk Larsen
ab58784268 Remove random retry loops in tests (batch 5: geometry, mixing types, triangular)
libeigen/eigen!2264

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-08 11:51:35 -07:00
Rasmus Munk Larsen
411422f2dc Remove random retry loop in SVD min-norm test
libeigen/eigen!2263

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-08 11:20:27 -07:00
Rasmus Munk Larsen
7c3a344763 Remove random retry loops in tests (batch 2: indices and integer types)
libeigen/eigen!2261

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-08 11:02:45 -07:00
Rasmus Munk Larsen
be7538ed65 Remove random retry loops in tests (batch 1: simple scalar cases)
libeigen/eigen!2260

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-08 10:44:57 -07:00
Rasmus Munk Larsen
5790d716c3 Simplify and optimize pow/cbrt special case handling
libeigen/eigen!2259

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-08 10:19:51 -07:00
Rasmus Munk Larsen
3041ab44af Fix GEBP asm register constraints for custom scalar types
libeigen/eigen!2258

Closes #3059

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-07 07:49:27 -08:00
Antonio Sánchez
20fce70e5a Fix another complex div edge case.
libeigen/eigen!2257
2026-03-06 13:37:26 -08:00
Antonio Sánchez
5bacb5be9a Fix null pointer dereference in Sparse-Dense products for Sparse vectors.
libeigen/eigen!2256
2026-03-06 10:50:28 -08:00
Tyler Veness
d8c8ee6fb2 Fix crash on construction of SparseMatrix with zero-length diagonal
libeigen/eigen!2249
2026-03-06 01:43:21 +00:00
Rasmus Munk Larsen
265496e862 Fix heap overflow in BM_BatchContraction benchmark
libeigen/eigen!2251

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-04 21:01:34 -08:00
Rasmus Munk Larsen
eea4d31f58 Simplify and modernize XprHelper.h
libeigen/eigen!2243

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-04 10:33:05 -08:00
Rasmus Munk Larsen
dd826edb42 Replace typedef with using in tensor contraction files
libeigen/eigen!2247

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-04 08:59:22 -08:00
Antonio Sánchez
abc3d6014d Fix CUDA+Clang build warnings.
libeigen/eigen!2241
2026-03-04 01:41:01 -08:00
Rasmus Munk Larsen
0269c017aa Revise Tensor module README.md: fix bugs, add missing docs
libeigen/eigen!2240

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-03 23:44:49 -08:00
Charles Schlosser
ca94be70da fix uninitialized variable in constexpr function
libeigen/eigen!2236
2026-03-03 21:01:40 -08:00
Rasmus Munk Larsen
b0ebf966a5 Fix default rank-detection threshold in QR and LU decompositions
libeigen/eigen!2232

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-03 18:44:22 -08:00
Antonio Sánchez
d36a7db7b5 Fix Eigen::array constructors.
libeigen/eigen!2235
2026-03-03 22:15:47 +00:00
Antonio Sánchez
661cdb227f Fix relative paths after move.
libeigen/eigen!2234
2026-03-02 19:50:30 +00:00
Rasmus Munk Larsen
57b1de2330 Fix row-major GEMV dropping rows when n8 heuristic disables main loop
libeigen/eigen!2233

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-01 23:47:35 -08:00
Rasmus Munk Larsen
662d5c21ff Optimize SYMV, SYR, SYR2, and TRMV product kernels
libeigen/eigen!2228

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-01 19:40:11 -08:00
Rasmus Munk Larsen
c66fc52868 Add ULP accuracy measurement tool and documentation for vectorized math functions
libeigen/eigen!2153

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-01 13:22:16 -08:00
Rasmus Munk Larsen
c20b6f5c41 Restore EIGEN_EMPTY_STRUCT_CTOR as a no-op macro for backward compatibility
libeigen/eigen!2231

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-01 11:44:38 -08:00
Rasmus Munk Larsen
77d9173596 Add ca-certificates to clang-tidy CI job
libeigen/eigen!2230

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-28 09:12:13 -08:00
Rasmus Munk Larsen
444ae9761d Clamp igamma/igammac output to [0,1] for numerical stability
libeigen/eigen!2229

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-28 08:52:44 -08:00
Rasmus Munk Larsen
eddb470a09 Fix flaky array_cwise and sparse_basic tests
libeigen/eigen!2227

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-27 21:52:46 -08:00
Antonio Sánchez
25dba492e3 Use stack-constructed variable for SVD block sweep.
libeigen/eigen!2225
2026-02-28 05:04:41 +00:00
Rasmus Munk Larsen
f64d1e0acc Improve ConditionEstimator docs and tighten test bounds
libeigen/eigen!2226

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-26 22:08:13 -08:00
Rasmus Munk Larsen
8525491eb1 Add dedicated unit tests and benchmark for ConditionEstimator
libeigen/eigen!2223

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-26 18:26:38 -08:00
Antonio Sánchez
e730b1fe33 Fix mixed products GEMM.
libeigen/eigen!2224
2026-02-26 15:47:39 -08:00
Rasmus Munk Larsen
3adfa9bd37 Add const to non-mutating member functions across remaining modules
libeigen/eigen!2222

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-26 13:58:04 -08:00
Rasmus Munk Larsen
13b61529f4 Add const to non-mutating member functions in products/ and Serializer
libeigen/eigen!2221

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-26 12:35:44 -08:00
Rasmus Munk Larsen
aaca9e5856 Add missing const qualifiers in Eigen/src/Core/
libeigen/eigen!2220

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-26 11:23:53 -08:00
Rasmus Munk Larsen
1b1b7e347d Fix EIGEN_NO_AUTOMATIC_RESIZING not resizing empty destinations
libeigen/eigen!2219

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-26 07:54:27 -08:00
Rasmus Munk Larsen
064d686c57 Remove CXX11/ directory nesting for Tensor modules
libeigen/eigen!2199

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-26 07:03:38 -08:00
Rasmus Munk Larsen
11eb66e1b5 Remove pre-C++14 workarounds from unsupported/ tensor code
libeigen/eigen!2218

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-26 06:17:39 -08:00
Rasmus Munk Larsen
a95440de17 Remove obsolete bench/ and btl/ directories
libeigen/eigen!2217

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-25 20:19:45 -08:00
Rasmus Munk Larsen
6e2aff6b5d Fix ambiguous static_cast in JacobiSVD blocking threshold
libeigen/eigen!2215

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-25 19:48:20 -08:00
Rasmus Munk Larsen
d8ed4f6884 Fix GEBP half/quarter-packet loops for nr>=8 RHS packing on ARM64
libeigen/eigen!2216

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-25 19:26:49 -08:00
Rasmus Munk Larsen
6b6d0d8c8e Revert "Fix ambiguous static_cast in JacobiSVD blocking threshold computation"
This reverts commit e567151ce3.
2026-02-25 19:08:21 -08:00
Rasmus Munk Larsen
ba2fc4e775 Revert "Fix GEBP half/quarter-packet loops for nr>=8 RHS packing on ARM64"
This reverts commit 888d708dcd.
2026-02-25 19:08:21 -08:00
Rasmus Munk Larsen
888d708dcd Fix GEBP half/quarter-packet loops for nr>=8 RHS packing on ARM64
On ARM64 (and LoongArch64), the GEBP kernel uses nr=8, so the RHS is
packed in 8-column blocks. The half-packet and quarter-packet row
processing loops were iterating columns 4 at a time starting from j2=0,
misindexing into the 8-column packed RHS buffer. This produced
completely wrong results for float GEMM when the number of rows was
smaller than the SIMD packet size (e.g. 2x10 * 10x8 float).

Add the missing nr>=8 column iteration blocks to both loops, matching
the pattern already present in the 3x, 2x, 1x, and scalar remainder
sections.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-25 19:03:11 -08:00
Rasmus Munk Larsen
e567151ce3 Fix ambiguous static_cast in JacobiSVD blocking threshold computation
The L2 cache size threshold computation used numext::sqrt with a
static_cast<RealScalar>, which fails to compile when RealScalar is
AnnoyingScalar (a test-only type with multiple conversion constructors).
Since this is a pure cache-size computation unrelated to the matrix
scalar type, use std::sqrt(double) instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-25 19:03:01 -08:00
Rasmus Munk Larsen
a31de4778d Blocked Jacobi SVD sweep with L2-cache-adaptive threshold
libeigen/eigen!2206

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
Co-authored-by: Rasmus Munk Larsen <rmlarsen@google.com>
2026-02-25 10:03:05 -08:00
Rasmus Munk Larsen
647e0009ba Refactor BDCSVD D&C code to reduce compilation time and memory footprint
libeigen/eigen!2211

Closes #3048

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-25 09:11:38 -08:00
Rasmus Munk Larsen
4fab38d798 Make clang generic vector backend support 16, 32, and 64-byte vectors
libeigen/eigen!2213

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-25 08:50:47 -08:00
Rasmus Munk Larsen
ea25ea52bb Revert accidental changes from !2212 squash merge
libeigen/eigen!2214

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-25 08:31:41 -08:00
Rasmus Munk Larsen
38f0f42755 Update rmlarsen email address from @google.com to @gmail.com
libeigen/eigen!2212

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-25 07:45:02 -08:00
Rasmus Munk Larsen
d0d70a9527 Consolidate complex math function boilerplate with shared macros
libeigen/eigen!2201

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-25 07:21:20 -08:00
Rasmus Munk Larsen
c4c704e5dd Install libclang-rt-19-dev for asan-ubsan CI job
libeigen/eigen!2210

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-24 20:19:42 -08:00
Rasmus Munk Larsen
61895c5978 Selectively add constexpr to Core expression template scaffolding
libeigen/eigen!2184

Closes #3041

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-24 19:59:10 -08:00
Rasmus Munk Larsen
34092d2788 Fix flaky tests: add iteration guards, yield in busy-waits, cap thread count
libeigen/eigen!2208

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-24 18:29:07 -08:00
Rasmus Munk Larsen
28d090a49c Refactor GenericPacketMathFunctions.h into smaller focused headers
libeigen/eigen!2200

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-24 17:46:12 -08:00
Rasmus Munk Larsen
16da0279f1 Add benchmarks for unsupported modules and extend supported benchmarks
libeigen/eigen!2179

Closes #3036

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-24 17:12:33 -08:00
Rasmus Munk Larsen
fa567f6bcd Add CUDA CI jobs with NVHPC (nvc++) as host and device compiler
libeigen/eigen!2204

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-24 16:54:08 -08:00
Antonio Sánchez
2cd9bb7380 Fix sparse product with entities that do not have direct access.
libeigen/eigen!2205
2026-02-24 16:27:06 -08:00
Rasmus Munk Larsen
00cc497d32 Add clang-tidy, codespell, and sanitizer checks to CI pipeline
libeigen/eigen!2178

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-23 19:43:45 -08:00
Rasmus Munk Larsen
241af1c0ba Add NVHPC (nvc++) compiler support and CI build/test jobs
libeigen/eigen!2186

Closes #3032

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-23 18:51:15 -08:00
Antonio Sánchez
f3f2c676b5 Fix direct access for sparse blocks.
libeigen/eigen!2202
2026-02-23 12:00:52 -08:00
Rasmus Munk Larsen
d537b51ede Fix ComplexEigenSolver NaN with flush-to-zero arithmetic
libeigen/eigen!2196

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-23 11:15:31 -08:00
Rasmus Munk Larsen
667cabe3aa Clean up comments in unsupported module
libeigen/eigen!2198

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-22 22:04:23 -08:00
Rasmus Munk Larsen
78b76986b7 Comment cleanup v3: trailing ??, informal language, FIXME/TODO colons
libeigen/eigen!2197

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-22 21:20:08 -08:00
Rasmus Munk Larsen
112c2324bd Consolidate BF16/F16 wrapper macros and simplify arch math functions
libeigen/eigen!2195

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-22 20:17:43 -08:00
Rasmus Munk Larsen
d5e67adbe7 Clean up informal language, vague TODOs, and dead code in comments
libeigen/eigen!2191

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-22 18:32:10 -08:00
Rasmus Munk Larsen
7d727d26bc Refactor GenericPacketMathFunctions.h into smaller focused headers
libeigen/eigen!2190

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-22 16:30:57 -08:00
Rasmus Munk Larsen
9810969c0f Suppress false-positive GCC and clang warnings in test builds
libeigen/eigen!2187

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-22 14:54:15 -08:00
Rasmus Munk Larsen
ad7f1fe70e Improve clang vector extension backend
libeigen/eigen!2183

Closes #3042

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-22 13:31:21 -08:00
Rasmus Munk Larsen
1f49bf96cf Add new benchmarks for Core, LU, and QR operations
libeigen/eigen!2177

Closes #3035

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-22 12:19:37 -08:00
Rasmus Munk Larsen
8c35441f18 Fix typos: misspellings, French variable names, and hyphenation
libeigen/eigen!2185

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-22 10:04:40 -08:00
Rasmus Munk Larsen
44c6132163 Fix ~40 typos found by codespell across the codebase
libeigen/eigen!2181

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-21 21:29:50 -08:00
Rasmus Munk Larsen
f52ad04bbb Fix ASAN-detected bugs in Diagonal::data() and array_cwise test
libeigen/eigen!2182

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-21 21:11:36 -08:00
Rasmus Munk Larsen
d4077a6e99 Reorganize benchmarks into subdirectories and clean up Eigen sources
libeigen/eigen!2176

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-21 17:46:55 -08:00
Rasmus Munk Larsen
832b940976 Update COPYING.README to clarify third-party license status
libeigen/eigen!2174

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-21 17:16:07 -08:00
Rasmus Munk Larsen
e6accc73ff Fix comment typos, doubled words, grammar errors, and copy-paste mistakes
libeigen/eigen!2173

Closes #3034

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-21 14:36:21 -08:00
Rasmus Munk Larsen
0e424f4050 Remove dead code, commented-out blocks, and outdated comments
libeigen/eigen!2172

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-21 12:49:56 -08:00
Rasmus Munk Larsen
18791a81b9 Fix MSVC build: disable [[msvc::forceinline]] on generic lambdas
libeigen/eigen!2171

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-21 00:13:58 -08:00
Rasmus Munk Larsen
95e8bc3267 Add EIGEN_LAMBDA_ALWAYS_INLINE macro for MSVC lambda inlining
libeigen/eigen!2170

Closes #3033

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-20 21:28:47 -08:00
Rasmus Munk Larsen
a87ecfb179 Use m_ prefix consistently for private/protected member variables
libeigen/eigen!2168

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-20 20:35:58 -08:00
Rasmus Munk Larsen
270ea539fa Remove redundant EIGEN_STRONG_INLINE from trivial constexpr and = default functions
libeigen/eigen!2161

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-20 19:50:52 -08:00
Antonio Sánchez
e0a8d6c9d8 Fix compile warnings
libeigen/eigen!2167
2026-02-20 23:09:56 +00:00
Rasmus Munk Larsen
1dcea43c49 Fix RowMajor performance for triangular/dense assignment
libeigen/eigen!2165

Closes #3031

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-20 08:02:15 -08:00
Rasmus Munk Larsen
374fe225bf Reduce GEMV and TRSM benchmark sizes for faster routine runs
libeigen/eigen!2163

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-20 00:56:57 -08:00
Rasmus Munk Larsen
2c898e8b95 Remove unused LhsPacketType typedef in gebp_peeled_loop
libeigen/eigen!2162

Closes #3029

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-19 20:15:15 -08:00
Rasmus Munk Larsen
4fdc82d695 Fix mixed-type compilation error in row-major GEMV small-cols path
libeigen/eigen!2160

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-19 19:54:16 -08:00
Rasmus Munk Larsen
4141d1fd2d Fix -Wtautological-overlap-compare warning in row-major GEMV dispatch
libeigen/eigen!2158

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-19 15:56:16 -08:00
Rasmus Munk Larsen
53e3408cb7 Optimize GEMV kernels: row-major small-cols and template deduplication
libeigen/eigen!2151

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-19 15:06:24 -08:00
Rasmus Munk Larsen
9c63d26dec Remove reference to nonexistent spmv.cpp in benchmarks
libeigen/eigen!2157

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-19 04:22:35 -08:00
Rasmus Munk Larsen
5f09b3b63f Fix missing template argument list in trsmKernelR for Clang 20/21
libeigen/eigen!2155

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-18 20:43:17 -08:00
Rasmus Munk Larsen
c9eab40878 Fix unused variable warning for phys_l1 on non-AVX512 builds
libeigen/eigen!2154

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-18 20:26:28 -08:00
Rasmus Munk Larsen
3c86a013b1 Vectorize generic trsmKernelR for non-AVX512 targets
libeigen/eigen!2135

Closes #3027

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-18 17:34:31 -08:00
Steve Bronder
43a01f06ad update AVX and AVX512 to support gcc < 10.1 and clang < 10
libeigen/eigen!2129

Closes #3021
2026-02-18 22:07:24 +00:00
Rasmus Munk Larsen
552ca8f15f Simplify GEBP micro-kernel and improve blocking heuristics
libeigen/eigen!2142

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-18 13:16:14 -08:00
Rasmus Munk Larsen
e953f1e504 Modernize C++14 usage and minor optimizations in Core
libeigen/eigen!2143

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
Co-authored-by: Rasmus Munk Larsen <rmlarsen@google.com>
2026-02-18 12:47:51 -08:00
Rasmus Munk Larsen
f69745b678 Fix real x complex GEMM for backends where half == full packet size
libeigen/eigen!2150

Closes #3028

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-18 12:32:24 -08:00
Rasmus Munk Larsen
073190be04 Fix outdated documentation across multiple .dox files
libeigen/eigen!2148

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-18 12:23:41 -08:00
Rasmus Munk Larsen
bdec88009d Remove const from return-by-value types (issue #1087)
libeigen/eigen!2144

Closes #1087

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-18 03:10:11 -08:00
Rasmus Munk Larsen
3108f6360e Migrate Eigen benchmarks to the Google benchmark framework
libeigen/eigen!2132

Closes #3025

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-17 20:51:36 -08:00
Rasmus Munk Larsen
740cac97b4 Fix AVX double-precision trig and complex exp without AVX2
libeigen/eigen!2147

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-17 19:48:16 -08:00
Rasmus Munk Larsen
50d6d92a70 Optimize sparse-dense product by bypassing InnerIterator for compressed storage
libeigen/eigen!2134

Closes #3026

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-17 19:19:18 -08:00
Rasmus Munk Larsen
b6b2f31ba8 Fix compiler warnings from GCC 13 and Clang 18
libeigen/eigen!2146

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-17 10:09:39 -08:00
Rasmus Munk Larsen
113207a9de Optimize JacobiSVD 2x2 kernel and hoist sweep threshold
libeigen/eigen!2139

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-16 15:39:38 -08:00
Rasmus Munk Larsen
e6e5b5c4c8 Fix pexp_complex for complex<double> (issue #3022)
libeigen/eigen!2140

Closes #3022

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-16 15:30:31 -08:00
Rasmus Munk Larsen
2b561f9284 Revert "Specialized enable_borrowed_ranges for VectorwiseOp class range iteration"
This reverts merge request !2127
2026-02-16 02:12:28 -08:00
Blake
d0654a201b Specialized enable_borrowed_ranges for VectorwiseOp class range iteration
libeigen/eigen!2127

Closes #2882
2026-02-15 07:31:33 -08:00
Antonio Sánchez
1a2b80727c Fix pdiv for complex packets involving infinites.
libeigen/eigen!2131
2026-02-14 17:47:32 -08:00
Blake
9b709e8269 Diagonalview example typo
libeigen/eigen!2130
2026-02-12 21:12:11 -08:00
Blake
23fcc1c6c9 MatrixBase::diagonalView issue 604
libeigen/eigen!2126

Closes #604
2026-02-10 02:12:03 +00:00
Antonio Sánchez
004d81a852 Fix cblat3/zblat3 test program with gfortran.
libeigen/eigen!2122
2026-02-09 17:50:57 -08:00
Chip Kerchner
0ac2a2df9f Prevent predux_half for DoublePacket from accidentally catching complex Packets of size >= 16
libeigen/eigen!2125
2026-02-08 10:19:45 -08:00
Antonio Sánchez
4d05fcf8da Fix packetmath tests on M* macs.
libeigen/eigen!2120
2026-02-08 10:07:24 -08:00
Blake
752911927f betainc edge case checks at start of calculation
libeigen/eigen!2123

Closes #2359
2026-02-08 10:05:06 -08:00
Antonio Sánchez
afb4380534 Fix RunQueue race condition on weak memory architectures (ARM64)
libeigen/eigen!2121
2026-02-06 02:27:08 +00:00
YJ Chang
c648296368 Update HVX floating-point reduction to support V79 architecture.
libeigen/eigen!2124
2026-02-04 16:39:51 +00:00
Sean Talts
ddfc68d399 Fix clang vector backend type compatibility issues.
libeigen/eigen!2116
2026-01-29 17:17:03 +00:00
mehmet alper kuyumcu
93ff388841 Fix relative tolerance scaling by multiplying with RHS norm in BiCGSTAB
libeigen/eigen!2118
2026-01-29 16:59:38 +00:00
Blake
26c242ab58 make EIGEN_BLAS macro names consistent and undef at end of file
libeigen/eigen!2119

Closes #2954
2026-01-29 16:50:27 +00:00
Charles Schlosser
3d6f5fe8fe Tests: skip denorms in ARM ieee tests
libeigen/eigen!2115
2026-01-27 17:43:19 +00:00
Charles Schlosser
fdfdd4c96b test suite: emit the function name when an ieee test fails
libeigen/eigen!2114
2026-01-22 02:32:38 +00:00
Charles Schlosser
e246f9cb68 Use memset if !NumTraits<Scalar>::RequireInitialization
libeigen/eigen!2113

Closes #3019
2026-01-22 01:01:26 +00:00
Antonio Sánchez
f46a2c561e Fix bad static access for TensorDeviceGpu.
libeigen/eigen!2111
2026-01-20 19:12:53 +00:00
Simon Merkle
c09151a2c2 Wrote resizing documentation page
libeigen/eigen!2110

Co-authored-by: Simon Merkle <st172506@stud.uni-stuttgart.de>
2026-01-19 08:29:52 -08:00
Charles Schlosser
f7772e3946 Gcc warnings
libeigen/eigen!2109
2026-01-18 23:41:20 +00:00
Antonio Sánchez
918a5f1a6f Fix warnings related to variable_if_dynamic.
libeigen/eigen!2107

Closes #2807
2026-01-16 19:47:14 +00:00
Antonio Sánchez
9a37aca9fc Fix assignment size assertion for EIGEN_NO_AUTOMATIC_RESIZING.
libeigen/eigen!2108

Closes #3018
2026-01-15 19:04:53 +00:00
Yu You
251bff2885 CUDA 13 compatibility update for unit test gpu_basic
libeigen/eigen!2106
2026-01-09 22:42:33 +00:00
Yu You
0315fb319a Change inline hint for general_matrix_vector_product<>::run() to gain performance
libeigen/eigen!2092
2026-01-09 19:46:37 +00:00
Chip Kerchner
7aea350ba1 Fix more packetmath issues for RVV
libeigen/eigen!2105
2026-01-09 12:16:28 -05:00
Chip Kerchner
5d9beb81ab Initial version of reactivating RVV features like GeneralBlockPanelKernel
libeigen/eigen!2096
2026-01-07 13:41:02 -05:00
Charles Schlosser
d90a0534be fix polynomialsolver test failures
libeigen/eigen!2104
2026-01-05 05:19:49 +00:00
Martin Diehl
711118b747 docs does not exists
libeigen/eigen!2103
2026-01-03 00:57:18 +00:00
Charles Schlosser
c30af8f3db fix UB in random implementation and tests
libeigen/eigen!2102
2025-12-31 03:57:04 +00:00
srpgilles
c5aa40675a Fix check_that_free_is_allowed so that it properly checks is_free_allowed and not is_malloc_allowed
libeigen/eigen!2101

Co-authored-by: Sébastien Gilles <sebastien.gilles@inria.fr>
2025-12-30 15:20:19 +00:00
Antonio Sánchez
5793499a55 Fix AVX512FP16 build.
libeigen/eigen!2100

Closes #3013
2025-12-29 18:31:48 +00:00
Charles Schlosser
2ac496ff8a Revert !1953 and !1954
libeigen/eigen!2099

Closes #3011
2025-12-28 21:28:42 +00:00
Antonio Sánchez
9164d3f16a Fix undefined behavior in packetmath.
libeigen/eigen!2098

Closes #3009
2025-12-18 21:08:52 +00:00
Cédric Hubert
748e0a6517 Add missing semicolon
libeigen/eigen!2097
2025-12-18 08:49:11 -05:00
Nicholas Vinson
fe973ab0c5 Force early evaluation of boost expressions.
libeigen/eigen!2094
2025-12-16 19:55:59 +00:00
Guilhem Saurel
976f15ebca fix doc generation with doxygen 1.14 & 1.15
libeigen/eigen!2095

Closes #2976
2025-12-16 19:54:18 +00:00
Chip Kerchner
4f14da11d9 Add basic support for packetmath for BF16 RVV
libeigen/eigen!2093
2025-12-16 14:25:46 -05:00
Chip Kerchner
21e4582d17 Merge remote-tracking branch 'refs/remotes/origin/master' 2025-12-15 15:35:58 +00:00
Yu You
a7209fad70 GemmKernel: Define static constexpr member variables out-of-class for C++14 compatibility
libeigen/eigen!2091
2025-12-14 01:00:12 +00:00
Chip Kerchner
cdc62b84c7 Merge remote-tracking branch 'origin2/master' 2025-12-12 16:15:56 +00:00
Chip Kerchner
26fe567dd2 Fix FP16 for RVV so that it will compile for gcc
libeigen/eigen!2090
2025-12-10 08:42:26 -05:00
Chip Kerchner
afbf8173dd Merge remote-tracking branch 'origin2/master' 2025-12-10 03:22:29 +00:00
Gregory Meyer
9b00db8cb9 Simplify thread-safe initialization of GpuDeviceProperties.
libeigen/eigen!2089
2025-12-09 18:36:45 +00:00
Chip Kerchner
8cdc0fa67d Fix naming of predux_half for RVV when LMUL > 1
libeigen/eigen!2087
2025-12-05 13:50:16 -05:00
Chip Kerchner
f610edadcc Merge remote-tracking branch 'origin2/master' 2025-12-05 17:19:56 +00:00
Rasmus Munk Larsen
75bcd155c4 Vectorize tan(x)
libeigen/eigen!2086

Co-authored-by: Rasmus Munk Larsen <rmlarsen@google.com>
2025-12-02 21:53:10 +00:00
Antonio Sánchez
01a919d13f Fix AOCL cmake issues.
libeigen/eigen!2084
2025-12-01 03:32:22 +00:00
Ulysses Apokin
a73501cc76 Added versioning for shared libraries.
libeigen/eigen!2080

Co-authored-by: Ulysses Apokin <ulysses@altlinux.org>
2025-11-27 22:18:42 +00:00
Rasmus Munk Larsen
db90c4939c Add a ptanh_float implementation that is accurate to 1 ULP
libeigen/eigen!2082

Co-authored-by: Rasmus Munk Larsen <rmlarsen@google.com>
2025-11-26 00:17:12 +00:00
Antonio Sánchez
48eb5227c8 Add BLAS function axpby.
libeigen/eigen!2083
2025-11-25 23:13:02 +00:00
Antonio Sánchez
a1eeb02204 Expand CMake compatibility range for single-version specifications.
libeigen/eigen!2081

Closes #3004
2025-11-25 02:42:03 +00:00
sharad bhaskar
8a1083e9bf Aocl integration updated
libeigen/eigen!1952
2025-11-24 17:20:42 +00:00
Chip Kerchner
5aefbab777 Merge remote-tracking branch 'origin2/master' 2025-11-21 12:56:47 +00:00
Rasmus Munk Larsen
a6630c53c1 Fix bug introduced in !2030
libeigen/eigen!2079

Co-authored-by: Rasmus Munk Larsen <rmlarsen@google.com>
2025-11-20 19:29:19 -05:00
Chip Kerchner
3ff3d03783 Merge remote-tracking branch 'origin2/master' 2025-11-20 17:43:17 +00:00
Chip Kerchner
49623d0c4e This patch adds support for RISCV's vector extension RVV1.0.
libeigen/eigen!2030
2025-11-20 16:28:07 +00:00
Chip Kerchner
196eed3d62 Merge branch 'master' of https://gitlab.com/libeigen/eigen 2025-11-20 15:45:06 +00:00
Rasmus Munk Larsen
8eb6551a8a Add support for complex numbers in the generic clang backend
libeigen/eigen!2078

Co-authored-by: Rasmus Munk Larsen <rmlarsen@google.com>
2025-11-20 00:26:37 +00:00
Chip Kerchner
1242d948dd Merge remote-tracking branch 'origin2/master' 2025-11-18 12:59:58 +00:00
Joseph Prince Mathew
8401a241cb Add summary to lldb pretty printing of Eigen::Matrix
libeigen/eigen!2016

Co-authored-by: Joseph Prince Mathew <jmathew@dbuoy.com>
2025-11-17 17:24:03 +00:00
Eric A. Borisch
e75c29fd9d EigenTesting.cmake: Quote argument to separate_arguments.
libeigen/eigen!2077

Closes #3005 and #2866
2025-11-17 17:22:44 +00:00
Chip Kerchner
38e2b94367 Merge remote-tracking branch 'origin' 2025-11-14 00:00:35 +00:00
Yu You
dcbaf2d608 Switch the inline hint to EIGEN_ALWAYS_INLINE for a few functions
libeigen/eigen!2076

Closes #2993
2025-11-13 03:58:53 +00:00
Rasmus Munk Larsen
a7674b70d3 Improve packet op test coverage for IEEE special values.
libeigen/eigen!2075

Co-authored-by: Rasmus Munk Larsen <rmlarsen@google.com>
2025-11-12 22:19:50 +00:00
Chip Kerchner
26dc851a72 Merge remote-tracking branch 'origin' 2025-11-12 15:08:17 +00:00
Charles Schlosser
72bfca3d82 cxx11_tensor_expr.cpp: delete extraneous semicolon
libeigen/eigen!2074
2025-11-11 01:39:38 +00:00
Rasmus Munk Larsen
9b511fe4fe Fix cxx11_tensor_expr.cpp 2025-11-10 19:11:35 +00:00
Chaofan Qiu
943fdc71c6 Use more FMA in reciprocal iteration for precision
libeigen/eigen!2073
2025-11-10 18:36:11 +00:00
Charles Schlosser
1133aa82c7 fix various compiler warnings
libeigen/eigen!2072
2025-11-10 17:14:35 +00:00
Charles Schlosser
8ae3b1aaa5 Fix loongarch unsigned pabsdiff
libeigen/eigen!2071
2025-11-09 19:19:43 +00:00
Rasmus Munk Larsen
035cf68498 Fix build of realview.cpp 2025-11-08 23:19:54 +00:00
Rasmus Munk Larsen
23a5482fc0 Misc. packet math cleanups.
libeigen/eigen!2070

Co-authored-by: Rasmus Munk Larsen <rmlarsen@google.com>
2025-11-08 21:57:20 +00:00
Antonio Sánchez
4cb0776f8e Add 5.0.1 release notes and a few unreleased features.
libeigen/eigen!2069
2025-11-08 20:51:44 +00:00
Charles Schlosser
8b85f5933a Fix realview
libeigen/eigen!2062
2025-11-08 13:36:43 +00:00
Rasmus Munk Larsen
ffcd7bdbd6 Avoid breaking the build on older compilers.
See merge request libeigen/eigen!2068

Co-authored-by: Rasmus Munk Larsen <rmlarsen@google.com>
2025-11-07 21:25:09 +00:00
Antonio Sánchez
da867c31c9 Fix defines in AVX512 custom TRSM kernel.

Broken at head by !2063.

See merge request libeigen/eigen!2066
2025-11-07 20:14:24 +00:00
Rasmus Munk Larsen
8a9bfb72d7 Rename preduce_half for HVX. 2025-11-07 16:52:07 +00:00
Chip Kerchner
332bfa95c4 Merge remote-tracking branch 'origin/master' 2025-11-07 14:51:14 +00:00
Antonio Sánchez
ed989c7504 Enable generic clang backend tests.

Added an AVX512 job using the generic clang backend.

Also fixed up some guards in the custom AVX512 gemm/trsm kernels so they don't
define symbols when they aren't used.

See merge request libeigen/eigen!2063
2025-11-07 01:37:12 +00:00
Rasmus Munk Larsen
3368ac6c69 Don't set platform-specific vectorization macros for generic backend.
See merge request libeigen/eigen!2065

Co-authored-by: Rasmus Munk Larsen <rmlarsen@google.com>
2025-11-06 23:12:54 +00:00
Rasmus Munk Larsen
fecfa7f27e Fixes to make generic backend build with AVX512
See merge request libeigen/eigen!2064

Co-authored-by: Rasmus Munk Larsen <rmlarsen@google.com>
2025-11-06 22:50:32 +00:00
Rasmus Munk Larsen
ec93a6d098 Add a generic Eigen backend based on clang vector extensions
The goal of this MR is to implement a generic SIMD backend (packet ops) for Eigen that uses clang vector extensions instead of platform-dependent intrinsics. Ideally, this should make it possible to build Eigen and achieve reasonable speed on any platform that has a recent clang compiler, without having to write any inline assembly or intrinsics.

Caveats:

* The current implementation is a proof of concept and supports vectorization for float, double, int32_t, and int64_t using fixed-size 512-bit vectors (a somewhat arbitrary choice). I have not done much to tune this for speed yet.
* For now, there is no way to enable this other than setting -DEIGEN_VECTORIZE_GENERIC on the command line.
* This only compiles with newer versions of clang. I have tested that it compiles and all tests pass with clang 19.1.7.

https://clang.llvm.org/docs/LanguageExtensions.html#vectors-and-extended-vectors

Closes #2998 and #2997

See merge request libeigen/eigen!2051

Co-authored-by: Rasmus Munk Larsen <rmlarsen@google.com>
Co-authored-by: Antonio Sánchez <cantonios@google.com>
2025-11-06 21:52:19 +00:00
Rasmus Munk Larsen
7c7d84735e Align temporary array in TensorSelectOp packet evaluator. 2025-11-05 19:44:47 +00:00
Antonio Sánchez
142caf889c Fix MKL enum conversion warning.

Fixes #2999.

Closes #2999

See merge request libeigen/eigen!2061
2025-11-05 18:03:22 +00:00
Antonio Sánchez
9e5714b93b Remove deprecated CUDA device properties.

Fixes #3000.

Closes #3000

See merge request libeigen/eigen!2060
2025-11-05 17:12:33 +00:00
Tyler Veness
06f5cb4878 Use wrapper macro for multidimensional subscript feature test
See merge request libeigen/eigen!2059
2025-11-04 22:26:27 +00:00
Rasmus Munk Larsen
63fc0bc8c1 Make TernarySelectOp immune to const differences.
See merge request libeigen/eigen!2058

Co-authored-by: Rasmus Munk Larsen <rmlarsen@google.com>
2025-11-04 21:42:32 +00:00
Rasmus Munk Larsen
71703a9816 Make assume_aligned a no-op on ARM & ARM64 when msan is used, to work around a missing linker symbol. 2025-11-04 20:26:36 +00:00
Tyler Veness
f95b4698fc Add support for C++23 multidimensional subscript operator
I'm not sure where to put tests for this, assuming they're needed. They also wouldn't run in CI anyway since CI only exercises the C++17 codepaths.

See merge request libeigen/eigen!2053
2025-11-04 07:03:04 +00:00
Rasmus Munk Larsen
b6fcddccfc Get rid of pblend packet op.
There was only a single code path left in TensorEvaluator using pblend. We can replace that with a call to the more general TernarySelectOp and get rid of pblend entirely from Core.

Closes #2998

See merge request libeigen/eigen!2056

Co-authored-by: Rasmus Munk Larsen <rmlarsen@google.com>
2025-11-03 23:27:50 +00:00
Rasmus Munk Larsen
ed9a0e59ba Fix more bugs in !2052
Fixes #2998

Closes #2998

See merge request libeigen/eigen!2057

Co-authored-by: Rasmus Munk Larsen <rmlarsen@google.com>
2025-11-03 20:26:17 +00:00
Rasmus Munk Larsen
a20fc40e4e Revert "simplify squaredNorm"
This causes some subtle alignment-related bugs, which @chuckyschluz is currently investigating.

See merge request libeigen/eigen!2055
2025-11-03 18:59:51 +00:00
Antonio Sánchez
04eb06b354 Fix doc references for nullary expressions.

Fixes #2997.

Closes #2997

See merge request libeigen/eigen!2054
2025-11-03 18:47:04 +00:00
Rasmus Munk Larsen
bfdbc031c2 Fixes #2998. 2025-11-02 23:58:16 +00:00
Rasmus Munk Larsen
8716f109e4 Implement assume_aligned using the standard API
This implements `Eigen::internal::assume_aligned` to match the C++20 standard API as closely as possible, using either `std::assume_aligned` or `__builtin_assume_aligned` where available. If neither is available, the function is a no-op.

The override macro `EIGEN_ASSUME_ALIGNED` was changed to `EIGEN_DONT_ASSUME_ALIGNED`, which now forces the function to be a no-op.

See merge request libeigen/eigen!2052
2025-11-01 12:04:19 +00:00
Rasmus Munk Larsen
ce70a507c0 Enable more generic packet ops for double.
See merge request libeigen/eigen!2050
2025-10-30 19:14:43 +00:00
Charles Schlosser
fb5bb3e98f simplify squaredNorm

My review of https://gitlab.com/libeigen/eigen/-/merge_requests/2048 reminded me there was a much easier and better way of doing this for complex arrays.

See merge request libeigen/eigen!2049
2025-10-30 02:51:15 +00:00
Rasmus Munk Larsen
ece9a4c0b6 Always vectorize abs2() for non-complex types.
For several packet types, `abs2` was not vectorized even though it only requires `pmul`. Get rid of the confusing and redundant `HasAbs2` enum and instead check `HasMul`, in addition to making sure the scalar type is not complex.

See merge request libeigen/eigen!2048
2025-10-30 02:42:56 +00:00
Antonio Sánchez
60122df698 Allow user to configure if free is allowed at runtime.
### Description

Allow the user to configure at runtime whether `free` is allowed.

Reverts to the Eigen 3.4 behavior by default, where `free(...)` remains allowed when `EIGEN_RUNTIME_NO_MALLOC` is defined, even if `set_is_malloc_allowed(false)` is in effect. Adds a separate `set_is_free_allowed(...)` to explicitly control use of `std::free`.

### Reference issue

Fixes #2983.

### Additional information

Closes #2983

See merge request libeigen/eigen!2047
2025-10-28 22:29:00 +00:00
Chip Kerchner
add5e76204 Merge remote-tracking branch 'origin2/master' 2025-10-28 20:26:26 +00:00
Tyler Veness
9234883914 Fix SparseVector::insert(Index) assigning int to Scalar
Scalar doesn't necessarily support implicit construction from int or
assignment from int.

Here's the error message I got without this fix:
```
/home/tav/git/Sleipnir/build/_deps/eigen3-src/Eigen/src/SparseCore/SparseVector.h:180:25: error: no match for ‘operator=’ (operand types are ‘Eigen::internal::CompressedStorage<ExplicitDouble, int>::Scalar’ {aka ‘ExplicitDouble’} and ‘int’)
  180 |     m_data.value(p + 1) = 0;
      |     ~~~~~~~~~~~~~~~~~~~~^~~
```

See merge request libeigen/eigen!2046
2025-10-27 22:06:59 +00:00
Tyler Veness
be56fff1ff Fix ambiguous sqrt() overload caused by ADL
Here's the compiler error:
```
/home/tav/git/Sleipnir/build/_deps/eigen3-src/Eigen/src/Householder/Householder.h:82:16: error: call of overloaded ‘sqrt(boost::decimal::decimal64_t)’ is ambiguous
   82 |     beta = sqrt(numext::abs2(c0) + tailSqNorm);
      |            ~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/tav/git/Sleipnir/build/_deps/eigen3-src/Eigen/src/Householder/Householder.h:82:16: note: there are 2 candidates
In file included from /home/tav/git/Sleipnir/build/_deps/eigen3-src/Eigen/Core:198,
                 from /home/tav/git/Sleipnir/test/src/optimization/cart_pole_problem_test.cpp:8:
/home/tav/git/Sleipnir/build/_deps/eigen3-src/Eigen/src/Core/MathFunctions.h:1384:75: note: candidate 1: ‘typename Eigen::internal::sqrt_retval<typename Eigen::internal::global_math_functions_filtering_base<Scalar>::type>::type Eigen::numext::sqrt(const Scalar&) [with Scalar = boost::decimal::decimal64_t; typename Eigen::internal::sqrt_retval<typename Eigen::internal::global_math_functions_filtering_base<Scalar>::type>::type = boost::decimal::decimal64_t; typename Eigen::internal::global_math_functions_filtering_base<Scalar>::type = boost::decimal::decimal64_t]’
 1384 | EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE EIGEN_MATHFUNC_RETVAL(sqrt, Scalar) sqrt(const Scalar& x) {
      |                                                                           ^~~~
In file included from /home/tav/git/Sleipnir/build/_deps/decimal-src/include/boost/decimal/detail/cmath/ellint_1.hpp:16,
                 from /home/tav/git/Sleipnir/build/_deps/decimal-src/include/boost/decimal/cmath.hpp:18,
                 from /home/tav/git/Sleipnir/build/_deps/decimal-src/include/boost/decimal.hpp:33,
                 from /home/tav/git/Sleipnir/test/include/scalar_types_under_test.hpp:6,
                 from /home/tav/git/Sleipnir/test/src/optimization/cart_pole_problem_test.cpp:19:
/home/tav/git/Sleipnir/build/_deps/decimal-src/include/boost/decimal/detail/cmath/sqrt.hpp:167:16: note: candidate 2: ‘constexpr T boost::decimal::sqrt(T) requires  is_decimal_floating_point_v<T> [with T = decimal64_t]’
  167 | constexpr auto sqrt(const T val) noexcept
      |                ^~~~
```

Calling a function by its unqualified name makes the call subject to argument-dependent lookup. In this case, since `using numext::sqrt;` was in effect, both `numext::sqrt()` and `boost::decimal::sqrt()` participated in overload resolution. Since only `numext::sqrt()` was intended, the fix is to call that overload with explicit qualification.

See merge request libeigen/eigen!2044
2025-10-26 02:03:22 +00:00
Rasmus Munk Larsen
2e91853adf Fix a benign bug in ComplexQZ
ComplexQZ would try to apply a Jacobi rotation to an empty block, which triggers a warning in static analyzers, since the corresponding `Eigen::Map` object will contain a `nullptr`.

See merge request libeigen/eigen!2043
2025-10-24 20:54:54 +00:00
Antonio Sánchez
1a5eecd45e Clarify range spanning major versions only works with 3.4.1.
### Description

Clarify that a range spanning major versions only works with 3.4.1.

### Reference issue

Fixes #2994.

Closes #2994

See merge request libeigen/eigen!2042
2025-10-24 19:54:27 +00:00
Antonio Sánchez
b4209fe984 Eliminate use of std::cout in ArpackSelfAdjointEigenSolver.
### Description

Eliminate use of std::cout in ArpackSelfAdjointEigenSolver.

Instead set the appropriate error status on failure.

See merge request libeigen/eigen!2041
2025-10-24 19:46:57 +00:00
Tyler Veness
ac3ef16f30 Fix SparseVector::insertBack() with custom scalar types
This fixes the following compiler error:
```
/home/tav/git/Sleipnir/build/_deps/eigen3-src/Eigen/src/SparseCore/SparseVector.h:143:19: error: cannot convert ‘int’ to ‘const Eigen::internal::CompressedStorage<boost::decimal::decimal64_t, int>::Scalar&’ {aka ‘const boost::decimal::decimal64_t&’}
  143 |     m_data.append(0, i);
      |                   ^
      |                   |
      |                   int
```

This change matches what SparseMatrix does:
https://gitlab.com/libeigen/eigen/-/blob/master/Eigen/src/SparseCore/SparseMatrix.h#L430-L438

See merge request libeigen/eigen!2040
2025-10-23 07:03:37 +00:00
Charles Schlosser
40da5b64ce CI enhancements: visual indication of flaky tests
### What does this implement/fix?

Currently, we run each test 3 times to account for flaky tests. Sometimes a test fails so quickly that the random seed is unchanged for the retry, which then fails in exactly the same way.

This MR uses a nanosecond-resolution seed, which resolves that issue. Now, if a test fails on the first attempt but passes on a retry, the GitLab job status will be yellow but still treated as a pass in the CI/CD pipeline. Hopefully this means we will get more passes and helps us identify room for improvement.


See merge request libeigen/eigen!2025
2025-10-22 04:51:51 +00:00
Antonio Sánchez
8e60d4173c Support AVX for i686.
### Description

Support AVX for i686.

There was an existing workaround for Windows. This adds a more generic architecture comparison so it also applies to Linux.

### Reference issue

Fixes #2991.

Closes #2991

See merge request libeigen/eigen!2037
2025-10-20 21:47:42 +00:00
Antonio Sánchez
2aa2ff2900 More ComplexQZ fixes.
### Description

More ComplexQZ fixes.

Extra semicolons were triggering warnings and errors with `-Werror`.
Moved the `Sparse` include up to the umbrella header to avoid IWYU exports.

See merge request libeigen/eigen!2036
2025-10-20 21:09:53 +00:00
Chip Kerchner
f456635bfa Merge remote-tracking branch 'refs/remotes/origin/master' 2025-10-20 16:20:42 +00:00
Antonio Sánchez
f5907c5930 Fix commit references in changelog. 2025-10-18 23:48:30 +00:00
Antonio Sánchez
cc8748791c Include required headers with relative paths in ComplexQZ 2025-10-18 23:46:12 +00:00
Antonio Sanchez
73da4623b1 Try disabling the cache again for ROCm. 2025-10-17 15:17:33 -07:00
Antonio Sánchez
d739b9dc60 Set the merge request template to be default. 2025-10-17 22:16:57 +00:00
Antonio Sánchez
e77977e231 Set the merge request template to be default. 2025-10-17 22:06:46 +00:00
Chip Kerchner
745e019edc Merge remote-tracking branch 'refs/remotes/origin/master' 2025-10-17 15:29:54 +00:00
Ludwig Striet
4c8744774f Fixes #2987: delete unused variable steps 2025-10-16 13:30:48 +00:00
Charles Schlosser
d426838d01 fix complexqz Wpedantic warnings 2025-10-15 15:12:33 +00:00
Saran Tunyasuvunakool
db02f97850 Add a missing #include <version> to Core. 2025-10-15 15:10:30 +00:00
Rasmus Munk Larsen
da55dd1471 Cleanup: Move 2x2 real SVD into the Jacobi module where it naturally belongs. 2025-10-14 00:58:37 +00:00
Ludwig Striet
99f8512985 ComplexQZ 2025-10-13 17:35:03 +00:00
Antonio Sánchez
e1f1a608be Fix DLL builds and c++ lapack declarations. 2025-10-13 16:25:08 +00:00
Antonio Sánchez
3abaabb999 Streamline merge request and bug templates. 2025-10-12 22:19:03 +00:00
Antonio Sanchez
52358cb93b Grammar fix "must has" --> "must have". 2025-10-12 00:07:41 +00:00
Rasmus Munk Larsen
565db1c603 Optimize ApplyOnTheRight for Jacobi rotations when FMA is available 2025-10-12 00:04:19 +00:00
Antonio Sánchez
3bd0bfe0e0 Disable ROCm job cache. 2025-10-11 23:29:42 +00:00
Charlie Schlosser
cd4f989f8f assume_aligned uses bytes not bits 2025-10-10 14:08:10 -04:00
Antonio Sánchez
ac7c192e1b Add a bunch of useful scripts for planning releases. 2025-10-10 00:49:58 +00:00
Damiano Franzò
5bc944a3ef Fix jacobi svd for TriangularBase 2025-10-10 00:10:50 +00:00
Antonio Sánchez
dbe9e6961e Fix BLAS/LAPACK DLL usage on Windows. 2025-10-10 00:09:45 +00:00
Antonio Sánchez
ef3c5c1d1d Add workaround for using std::fma for scalar multiply-add. 2025-10-09 18:57:46 +00:00
Charles Schlosser
5996176b88 Fix alignment bug in avx pcast<Packet4l, Packet4d> 2025-10-09 02:50:42 +00:00
Laurenz
4bd382df56 Fix SSE PacketMath Compilation Error on QNX 2025-10-08 17:13:16 +00:00
Charles Schlosser
13bd14974d fix errors in windows builds and tests 2025-10-07 22:47:35 +00:00
Chip Kerchner
e9b178bfe2 Merge branch 'master' of https://gitlab.com/libeigen/eigen 2025-10-07 19:32:29 +00:00
Jeremy Nimmer
eea6587b0e Fix scalar_inner_product_op when binary ops return a different type 2025-10-05 22:51:50 +00:00
Rasmus Munk Larsen
7eaf9ae68d Add a method to SelfAdjointEigenSolver for computing the matrix exponential 2025-10-05 15:06:04 +00:00
Sergiu Deitsch
32b0f386bc Eliminate possible -Wstringop-overflow warning in .setZero() 2025-10-04 00:03:03 +02:00
Olav
1edf360e3c Fix line endings 2025-10-03 13:21:05 +02:00
Antonio Sánchez
ccde35bcd5 Update dev version number. 2025-10-01 22:58:44 +00:00
Guilhem Saurel
a67f9dabb0 tests: add missing link 2025-10-01 22:38:52 +00:00
Antonio Sánchez
e6792039fb Update changelog to reflect 3.4.1 and 5.0.0 releases. 2025-10-01 18:43:54 +00:00
Antonio Sánchez
4916887f2c Update geo_homogeneous test, add eval() to PermutationMatrix. 2025-10-01 18:01:11 +00:00
Eugene Zhulenev
5c1029be1a The 'CompressedStorageIterator<>' needs to satisfy the RandomAccessIterator 2025-09-30 16:28:41 +00:00
Charles Schlosser
f9f515fb55 get rid of a bunch of windows jobs 2025-09-30 01:44:48 +00:00
Hans Johnson
2e5447e620 STYLE: Scripts with shebang should be executable 2025-09-28 06:38:59 +00:00
Sergiu Deitsch
8d7ebac6ec Disambiguate multiplication of a permutation matrix and a homogeneous vector 2025-09-27 14:05:28 +02:00
Charles Schlosser
bea7f7c582 SparseMatrixBase: delete redundant/shadowed typedef 2025-09-26 09:32:28 +00:00
Julien Schueller
7292c78e18 blas: Fix parenthesis suggestion warning 2025-09-24 19:14:55 +00:00
Sergiu Deitsch
e524488eb2 Convert Mercurial hgeol to gitattributes 2025-09-24 19:14:40 +00:00
Charles Schlosser
dbd25f632b Fix select: return typed comparisons if vectorized 2025-09-24 05:38:12 +00:00
Antonio Sánchez
027dc5bc8d Extend the range of supported CMake package config versions 2025-09-23 19:52:35 +00:00
Sergiu Deitsch
4df215785b Support matrix multiplication of homogeneous row vectors 2025-09-23 14:56:28 +00:00
Rasmus Munk Larsen
2d170aea11 Define pcmp_le generically in terms of pcmp_eq and pcmp_lt. 2025-09-23 14:34:57 +00:00
Sergiu Deitsch
ea869e183b Add missing bool SSE2 PacketMath comparison 2025-09-22 21:28:45 +02:00
Julien Schueller
6ef18340a1 CMake: Explicit STATIC libs 2025-09-22 18:32:36 +00:00
Sergiu Deitsch
14477c5d43 Replace deprecated std::is_trivial by an internal definition 2025-09-22 16:59:10 +00:00
Antonio Sánchez
b2ec79a23c Move smoketests to small GitLab runners. 2025-09-22 16:45:02 +00:00
Sergiu Deitsch
62fbd276e0 Provide hints for deprecated functionality 2025-09-22 16:00:42 +00:00
Mark Shachkov
d38d669fdb Fix real schur exceptional shift 2025-09-22 15:57:14 +00:00
Sergiu Deitsch
4ac3e71f77 CMake: Require at least C++14 2025-09-22 15:45:39 +00:00
Antonio Sánchez
a627f72cd6 Add "Version" file and update version. 2025-09-20 02:08:59 +00:00
Evan Porter
e0a59e5a66 Fix typo 2025-09-09 06:21:42 +00:00
Evan Porter
6cd6284f7f Make the sparse matrix printer pretty 2025-09-08 20:05:46 +00:00
Antonio Sánchez
e5f3fa2d61 Add gemmtr implementation. 2025-09-05 22:31:30 +00:00
Antonio Sanchez
f426eff949 Add inline/device-function attributes to fma. 2025-09-02 22:51:35 +00:00
Antonio Sánchez
da1a34a6ba Zero-out matrix for empty set of triplets. 2025-09-02 22:51:17 +00:00
Evan Porter
52fc978c6f fixed typo sparcity -> sparsity 2025-09-02 19:34:43 +00:00
Antonio Sánchez
8a8fbc8f5e Don't enable AVX for wasm. 2025-08-29 21:50:25 +00:00
Antonio Sanchez
70d8d99d0d Only build docs on push to master branch, not MRs. 2025-08-29 18:33:09 +00:00
Antonio Sánchez
7f0cb638c5 Specialize numext::madd for half/bfloat16. 2025-08-29 18:11:25 +00:00
Antonio Sánchez
1e9d7ed7d3 Add missing semicolon to has_fma definitions to fix GPU builds. 2025-08-29 17:19:28 +00:00
Antonio Sanchez
5d4485e767 Move more jobs to gitlab runners. 2025-08-29 10:06:35 -07:00
Antonio Sánchez
2e8cc042a1 Replace calls to numext::fma with numext::madd. 2025-08-28 21:40:19 +00:00
Antonio Sánchez
52f570a409 Move GPU ci jobs to gitlab-hosted runners. 2025-08-28 18:24:41 +00:00
Charles Schlosser
38b51d5b7e Mitigate setConstant regression with custom scalars 2025-08-26 20:04:17 +00:00
Antonio Sanchez
d2a70fe4e2 Make permutation products aliasing by default. 2025-08-25 18:39:06 +00:00
Antonio Sánchez
4ae5647355 Fix direct index aliased assignment. 2025-08-25 18:17:18 +00:00
Antonio Sánchez
1a45d2168e Fix use of FMA in triangular solver for boost multiprecision. 2025-08-25 18:05:22 +00:00
anonymouspc
05e74b1a40 Tiny fix in unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h 2025-08-25 10:26:46 +00:00
Tyler Veness
d368998120 Fix MSVC error about missing std::bit_cast 2025-08-23 22:25:52 +00:00
Aleksei Nikiforov
c487a4fe9e Clean up most of testsuite on s390x 2025-08-15 20:04:25 +00:00
Charles Schlosser
4033cfcc1d Fix dangling reference in VectorwiseOp::iterator: Episode II: The Dependent Typedef Strikes Back 2025-08-14 16:30:19 +00:00
Charles Schlosser
e9dfbad618 Fix dangling reference in VectorwiseOp::iterator 2025-08-14 00:04:01 +00:00
Charles Schlosser
43a65a9cbd add RealView api 2025-08-12 16:55:05 +00:00
Rasmus Munk Larsen
954e21152e Include <limits> in test main.h 2025-08-10 21:23:31 +00:00
Artem Bishev
e15cd620a0 Remove select class 2025-08-10 17:44:09 +00:00
Cheng Wang
1c0048a08c Fix inconsistency between ptrue and pcmp_* in HVX 2025-08-09 19:32:30 +00:00
Artem Bishev
ddce1d7d12 Fixes #2952 2025-08-07 16:58:22 +00:00
Tyler Veness
8b9dbcdaaf Fix numext::bit_cast() compilation failure in C++20 2025-08-07 00:03:33 +00:00
Rasmus Munk Larsen
975a5aba4f Fix TODO: Use std::bit_cast or __builtin_bit_cast if available. 2025-08-06 19:00:08 +00:00
Rasmus Munk Larsen
4be7e6b4e0 Fix pcmp_* for HVX to comply with the new definition of true = Scalar(1) 2025-08-04 20:56:24 +00:00
Antonio Sánchez
edcf4c135f Remove fortran dependency for eigenblas. 2025-08-04 19:11:43 +00:00
Antonio Sanchez
e4493233e8 Fix EIGEN_OPTIMIZATION_BARRIER for clang-cl 2025-07-31 17:02:43 +00:00
Charles Schlosser
f5ead2d34c Fix intel packet math header inclusion order 2025-07-29 01:00:37 +00:00
Charles Schlosser
1e65707aa2 Suppress Warray-bounds warning in generic ploaduSegment, fix edge case for vectorized cast 2025-07-23 22:26:40 +00:00
Rasmus Munk Larsen
abeba85356 Use proper float literals in SpecialFunctionsImpl.h. 2025-07-19 01:17:12 +00:00
Rasmus Munk Larsen
b5bef9dcb0 Fix bug in Erfc introduced in !1862. 2025-07-18 17:58:48 -07:00
Rasmus Munk Larsen
97c7cc6200 Explicitly use the packet trait HasPow to control whether Pow is vectorized. 2025-07-18 21:51:42 +00:00
Rasmus Munk Larsen
efe5b6979d Unconditionally include <memory>. Some c++20 builds are currently broken because it is needed for std::assume_aligned. 2025-07-18 18:06:28 +00:00
Rasmus Munk Larsen
2cf66d4b0d Use numext::fma in more places in SparseCore. 2025-07-17 21:20:39 +00:00
jacques FRANC
d7fa5ebe0e Fix API incompatibility for ILU in superLU support 2025-07-17 15:27:26 +00:00
Kuan-Ting
cedf1f4c17 Fix typo: duplicated 'for' in docs 2025-07-16 01:12:48 +00:00
Charles Schlosser
302fc46bc3 arm packet alignment requirements and aligned loads/stores 2025-07-15 23:49:04 +00:00
Sean McBride
430e35fbd1 Fixed -Wshadow warning by renaming variables 2025-07-11 11:30:23 -04:00
Antonio Sánchez
bd0cd1d67b Fix self-adjoint products when multiplying by a compile-time vector. 2025-07-08 21:48:59 +00:00
Charles Schlosser
6854da2ea0 Fix 1x1 selfadjoint matrix-vector product bug 2025-07-07 17:32:54 +00:00
Sean McBride
ac1b29f823 Set CMake POLICY CMP0177 to NEW 2025-07-07 16:37:01 +00:00
Antonio Sánchez
849a336243 Move default builds/tests to GitLab runners. 2025-07-05 04:37:08 +00:00
Rasmus Munk Larsen
8ac2fb077d Use numext::fma for sparse x dense dot product. 2025-07-02 23:19:26 +00:00
Antonio Sánchez
cc0be00435 Fix docs build. 2025-07-02 22:10:33 +00:00
Antonio Sánchez
f169c13d8e Replace PPC g++-10 with g++14. 2025-07-02 17:07:44 +00:00
Henric Ryden
7fa069ef90 tensor documentation 2025-06-29 03:47:42 +00:00
Antonio Sánchez
7c636dd5db Move HIP/CUDA defines to Core. 2025-06-27 16:48:07 +00:00
Antonio Sánchez
26616fe5b8 Fix VSX packetmath psin and pcast tests. 2025-06-27 04:08:20 +00:00
Antonio Sánchez
a395ee162d Fix a collection of random failures encountered when testing with Bazel. 2025-06-26 16:58:24 +00:00
Antonio Sánchez
0bce653efc Use QEMU for arm and ppc tests. 2025-06-25 15:22:46 +00:00
Antonio Sánchez
db8bd5b825 Modify pselect and various masks to use Scalar(1) for true. 2025-06-20 22:40:46 +00:00
Antonio Sánchez
6de0515fa6 Create a changelog file. 2025-06-20 21:54:14 +00:00
Antonio Sánchez
98fbf6ed77 Decommission aarch64 ampere runner. 2025-06-20 20:33:52 +00:00
Charles Schlosser
81044ec13d Provide macro to explicitly disable alloca 2025-06-19 04:23:35 +00:00
Charles Schlosser
bcce88c99e Faster emulated half comparisons 2025-06-17 17:05:58 +00:00
Filippo Basso
ac6955ebc6 Remove MSVC warnings in FindCoeff.h 2025-06-17 00:39:02 +00:00
Antonio Sánchez
67a898a079 Fix unprotected SIZE in macro. 2025-06-16 22:54:25 +00:00
Antonio Sánchez
cdf6a1f5ed Add OpenBLAS sbgemm. 2025-06-16 18:23:03 +00:00
Charles Schlosser
d228bcdf8f Fix neon compilation bug 2025-06-10 21:52:01 +00:00
Charles Schlosser
994f3d107a Fix neon packet math tests, add missing neon intrinsics 2025-06-09 17:13:31 +00:00
AnonymousPC
cda19a6255 Make Eigen::Map<const Vector>::operator[] return correct type 2025-06-06 19:15:18 +00:00
Charles Schlosser
d0b490ee09 Optimize maxCoeff and friends 2025-06-06 14:55:49 +00:00
Antonio Sánchez
c458d68fae Fix compile warning about * with bool. 2025-06-05 22:48:57 +00:00
Adam Cogdell
3f00059beb Fix fuzzer range error for scalar parity check. 2025-06-05 22:27:35 +00:00
Charles Schlosser
21e89b930c Enable default behavior for pmin<PropagateFast>, predux_min, etc 2025-06-02 17:23:37 +00:00
Charles Schlosser
4fdf87bbf5 clean up intel packet reductions 2025-05-30 19:18:07 +00:00
Hs293Go
a7f183cadb Add factory/getters for quat coeffs in both orders 2025-05-28 18:39:55 -04:00
Sergiu Deitsch
d81aa18f4d Explicitly construct the scalar for non-implicitly convertible types 2025-05-15 17:40:29 +02:00
Charles Schlosser
171bd08ca9 fix 2849 2025-05-15 02:04:50 +00:00
Damiano Franzò
db85838ee2 Add DUCC FFT support 2025-05-12 17:56:02 +00:00
Damiano Franzò
6f1a143418 Ensure info() implementation across all SolverBase derived types 2025-05-10 01:25:26 +00:00
Damiano Franzò
f3e7d64f3d Fix: Correct Lapacke bindings for BDCSVD and JacobiSVD to match the updated API 2025-05-09 11:52:53 +00:00
Rasmus Munk Larsen
434a2fc4a4 Fix obsolete comment in InverseImpl.h. We use PartialPivLU for the general case. 2025-05-08 23:02:10 +00:00
Rasmus Munk Larsen
ae3aba99db Fix typo in CoreEvaluators.h 2025-05-08 17:43:12 +00:00
Charles Schlosser
ee4f86f909 Fix MSAN in vectorized casting evaluator 2025-05-08 09:38:35 +00:00
Duy Tran
6dbbf0a843 CMake: only create uninstall target when eigen is top level 2025-05-02 23:17:42 +00:00
Damiano Franzò
fb2fca90be Avoid unnecessary matrix copy in BDCSVD and JacobiSVD 2025-05-01 23:17:21 +00:00
Tyler Veness
d6b23a2256 Fix unused local typedef warning in matrix exponential 2025-04-29 19:54:15 +00:00
Rasmus Munk Larsen
7294434099 Avoid UB in ploaduSegment 2025-04-25 21:13:52 +00:00
Antonio Sánchez
2265a5e025 Fix commainitializer noexcept test. 2025-04-23 00:05:02 +00:00
Tyler Veness
619be0deb6 Replace instances of EIGEN_NOEXCEPT macros 2025-04-22 00:58:47 +00:00
Rasmus Munk Larsen
d2dce37767 Optimize slerp() as proposed by Gopinath Vasalamarri. 2025-04-21 14:11:42 -07:00
Rasmus Munk Larsen
66d8111ac1 Use a more conservative method to detect non-finite inputs to cbrt. 2025-04-21 20:59:46 +00:00
Tyler Veness
d6689a15d7 Replace instances of EIGEN_CONSTEXPR macro 2025-04-18 08:27:52 -07:00
Rasmus Munk Larsen
33f5f59614 Vectorize cbrt for float and double. 2025-04-17 23:31:20 +00:00
Charles Schlosser
5330960900 Enable packet segment in partial redux 2025-04-14 17:44:53 +00:00
Charles Schlosser
6266d430cc packet segment: also check DiagonalWrapper 2025-04-12 19:34:11 +00:00
Charles Schlosser
e39ad8badc fix constexpr in CoreEvaluators.h 2025-04-12 18:54:09 +00:00
Charles Schlosser
7aefb9f4d9 fix memset optimization for std::complex types 2025-04-12 16:20:09 +00:00
Charles Schlosser
73ca849a68 fix packetSegment for ArrayWrapper / MatrixWrapper 2025-04-12 12:12:48 +00:00
Charles Schlosser
28c3b26d53 masked load/store framework 2025-04-12 00:31:10 +00:00
Eugene Zhulenev
cebe09110c Fix a potential deadlock because of Eigen thread pool 2025-04-11 23:43:14 +00:00
William Kong
11fd34cc1c Fix the typing of the Tasks in ForkJoin.h 2025-04-09 17:21:36 +00:00
Hunter Belanger
2cd47d743e Fix Conversion Warning in Parallelizer 2025-04-08 07:39:01 +00:00
Antonio Sánchez
b860042263 Add postream for ostream-ing packets more reliably. 2025-04-01 22:12:00 +00:00
Antonio Sánchez
02d9e1138a Add missing pmadd for Packet16bf. 2025-03-31 04:17:17 +00:00
Antonio Sánchez
9cc9209b9b Fix cmake warning and default to j0. 2025-03-29 16:09:40 +00:00
Rasmus Munk Larsen
e0c99a8dd6 By default, run ctests on all available cores in parallel. 2025-03-28 04:28:10 +00:00
Rasmus Munk Larsen
63a40ffb95 Use fma<float> for fma<half> and fma<bfloat16> if native fma is not available on the platform. 2025-03-28 04:26:04 +00:00
Antonio Sanchez
44fb6422be Allow triggering full CI if MR label contains all-tests 2025-03-27 08:37:24 -07:00
Rasmus Munk Larsen
3866cbfbe8 Fix test for TensorRef of trace. 2025-03-25 23:01:46 +00:00
Antonio Sanchez
6579e36eb4 Allow Tensor trace to be passed to a TensorRef. 2025-03-25 08:26:23 -07:00
Antonio Sanchez
8e32cbf7da Reduce flakiness of test for Eigen::half. 2025-03-23 22:31:25 -07:00
Antonio Sánchez
d935916ac6 Add numext::fma and missing pmadd implementations. 2025-03-23 01:05:53 +00:00
Charles Schlosser
754bd24f5e fix 2828 2025-03-22 17:19:44 +00:00
Charles Schlosser
ac2165c11f fix allFinite 2025-03-20 16:04:46 +00:00
William Kong
3143968195 Generalize the Eigen ForkJoin scheduler to use any ThreadPool interface. 2025-03-19 19:56:21 +00:00
Antonio Sánchez
70f2aead9a Use native _Float16 for AVX512FP16 and update vectorization. 2025-03-19 19:55:26 +00:00
Markus Vieth
0259a52b0e Use more .noalias() 2025-03-17 19:41:00 +01:00
Antonio Sánchez
14f845a1a8 Fix givens rotation. 2025-03-14 17:15:57 +00:00
Guilhem Saurel
33b04fe518 CMake: add install-doc target 2025-03-14 00:35:00 +00:00
Charles Schlosser
10e62ccd22 Fix x86 complex vectorized fma 2025-03-12 17:06:32 +00:00
Rasmus Munk Larsen
464c1d0978 Format TensorDeviceThreadPool.h & use if constexpr for c++20. 2025-03-08 01:09:36 +00:00
Rasmus Munk Larsen
21223f6bb6 Fix addition of different enum types. 2025-03-07 22:18:00 +00:00
Rasmus Munk Larsen
350544eb01 Clean up TensorDeviceThreadPool.h 2025-03-07 18:14:17 +00:00
Kevin
43810fc1be Fix extra semicolon in DeviceWrapper 2025-03-07 01:07:23 +00:00
Charles Schlosser
d28041ed5a refactor AssignmentFunctors.h, unify with existing scalar_op 2025-03-06 01:28:39 +00:00
Gopinath Vasalamarri
9a86214039 Optimize division operations in TensorVolumePatch.h 2025-02-28 22:34:13 +00:00
Antonio Sánchez
be5147b090 Fix STL feature detection for c++20. 2025-02-28 19:52:37 +00:00
Antonio Sanchez
179a49684a Fix CMake BOOST warning 2025-02-28 07:33:26 -08:00
Antonio Sanchez
dd56367554 Fix docs job for nightlies 2025-02-26 16:01:33 +00:00
Antonio Sánchez
d79bac0d3c Fix boolean scatter and random generation for tensors. 2025-02-25 21:37:09 +00:00
Tyler Veness
9935396b15 Specify constructor template arguments for ConstexprTest struct 2025-02-25 19:38:47 +00:00
Rasmus Munk Larsen
72adf891d5 Slightly simplify ForkJoin code, and make sure the test is actually run. 2025-02-25 17:22:43 +00:00
Antonio Sanchez
6aebfa9acc Build docs on push, and don't expire 2025-02-24 08:29:21 -08:00
Markus Vieth
bddaa99e15 Fix bitwise operation error when compiling as C++26 2025-02-23 02:30:55 +00:00
C. Antonio Sanchez
e42dceb3a1 Fix implicit copy-constructor warning in TensorRef. 2025-02-22 08:37:56 -08:00
Antonio Sanchez
5fc6fc9881 Initialize matrix in bicgstab test 2025-02-21 10:27:29 -08:00
Tyler Veness
0ae7b59018 Make assignment constexpr 2025-02-21 18:16:46 +00:00
Charles Schlosser
4dda5b927a fix Warray-bounds in inner product 2025-02-20 22:40:55 +00:00
C. Antonio Sanchez
66f7f51b7e Disable fno-check-new on clang. 2025-02-18 21:24:47 -08:00
Charles Schlosser
151f6127df Fix Warray-bounds warning for fixed-size assignments 2025-02-18 19:23:14 +00:00
C. Antonio Sanchez
1d8b82b074 Fix power builds for no VSX and no POWER8. 2025-02-15 13:56:47 -08:00
Charles Schlosser
eb3f9f443d refactor AssignmentEvaluator 2025-02-15 00:39:41 +00:00
Antonio Sánchez
9c211430b5 Fix TensorRef details 2025-02-14 18:33:26 +00:00
Antonio Sanchez
22cd7307dd Remove assumption of std::complex for complex scalar types. 2025-02-12 15:44:32 -08:00
Antonio Sánchez
6b4881ad48 Eliminate type-punning UB in Eigen::half. 2025-02-12 21:12:33 +00:00
Antonio Sánchez
420d891de7 Add missing mathjax/latex configuration. 2025-02-12 21:11:50 +00:00
Antonio Sánchez
becefd59e2 Returns condition number of zero if matrix is not invertible. 2025-02-12 07:09:20 +00:00
Antonio Sánchez
809d266b49 Fix numerical issues with BiCGSTAB. 2025-02-11 19:41:59 +00:00
Antonio Sánchez
ef475f2770 Add missing graphviz to doc build. 2025-02-11 16:03:41 +00:00
Antonio Sánchez
a0591cbc93 Fix doxygen-generated pages 2025-02-11 01:20:27 +00:00
Antonio Sánchez
715deac188 Add EIGEN_CI_CTEST_ARGS to allow for custom timeout. 2025-02-06 21:32:38 +00:00
Antonio Sánchez
4c38131a16 Fix android hardware_destructive_interference_size issue. 2025-02-05 23:53:55 +00:00
Antonio Sánchez
4c2611d27c Update check for std::hardware_destructive_interference_size 2025-02-05 19:41:07 +00:00
Antonio Sánchez
c079ee5e44 Fix tensor documentation. 2025-02-05 17:36:00 +00:00
Antonio Sanchez
74264c391a Add missing return statements for ppc. 2025-02-05 08:12:27 -08:00
Antonio Sánchez
3ebe898b5f Build and deploy nightly docs. 2025-02-05 00:35:34 +00:00
Antonio Sánchez
b73bb766a5 Increase max alignment to 256. 2025-02-04 20:06:28 +00:00
Antonio Sánchez
b1e74b1ccd Fix all the doxygen warnings. 2025-02-01 00:00:31 +00:00
Antonio Sánchez
9589cc4e7f Fix loongarch64 emulated tests. 2025-01-31 19:30:42 +00:00
Johannes Zipfel
2926b2e0a9 Added functions to fetch L and U factors from IncompleteLUT 2025-01-31 18:32:38 +00:00
William Kong
b6849f675d Change the midpoint chosen in Eigen::ForkJoinScheduler. 2025-01-30 20:21:30 +00:00
William Kong
1b2e84e55a Fix minor typos in ForkJoin.h 2025-01-29 20:12:04 +00:00
Tyler Veness
872c179f58 Fix UTF-8 encoding errors introduced by #1801 2025-01-28 16:52:46 -08:00
Rasmus Munk Larsen
2a35a917be Fix syntax error in NonBlockingThreadPool.h 2025-01-28 18:34:31 +00:00
Charles Schlosser
a056b93114 improve Simplicial Cholesky analyzePattern 2025-01-28 17:53:43 +00:00
William Kong
5d866a7a78 Fix potential data race on spin_count_ NonBlockingThreadPool member variable 2025-01-28 17:22:15 +00:00
William Kong
bc67025ba7 Clean up and fix the documentation of ForkJoin.h 2025-01-27 23:12:17 +00:00
Antonio Sánchez
dc1126e762 Fix threadpool for c++14. 2025-01-27 21:57:23 +00:00
Rasmus Munk Larsen
cd511a09aa Fix initialization order and remove unused variables in NonBlockingThreadPool.h. 2025-01-27 19:35:49 +00:00
Johannes Zipfel
f679843dc2 Block doc non square 2025-01-25 17:14:21 +00:00
William Kong
f9705adabb Fix typo introduced in the refactor of NonBlockingThreadPool 2025-01-25 17:13:24 +00:00
Antonio Sánchez
b75895a8b6 Try to fix loongarch 2025-01-25 16:38:41 +00:00
William Kong
4a6ac97d13 Add a ForkJoin-based ParallelFor algorithm to the ThreadPool module 2025-01-24 22:12:05 +00:00
Pengzhou0810
e986838464 Add LoongArch64 architecture LSX support (build/test) 2025-01-20 18:37:44 +00:00
Markus Vieth
c486af5ad3 Change Eigen::aligned_allocator to not inherit from std::allocator 2025-01-20 16:04:43 +00:00
Antonio Sánchez
abac563f5d Update documentation to clarify cross product for complex numbers. 2025-01-16 00:52:40 +00:00
Antonio Sanchez
2e76277bd0 Zero-initialize test arrays to avoid uninitialized reads. 2025-01-14 09:15:43 -08:00
Antonio Sánchez
ad13df7ea4 Fix std::fill_n reference. 2025-01-14 00:43:00 +00:00
Frédéric Simonis
9836e8d035 Fix read of uninitialized threshold in SparseQR 2025-01-08 23:40:58 +00:00
Charles Schlosser
7bb23b1e36 CI: don't add ToolChain PPA 2024-12-31 14:04:01 +00:00
xsjk
7bb8c58e7c Fix the missing CUDA device qualifier 2024-12-28 15:17:55 +00:00
Joerg Buchwald
24e0c2a125 use omp_get_max_threads if setNbThreads is not set 2024-12-20 21:16:15 +00:00
Jordan Rupprecht
a32db43966 Add missing #include <new> 2024-12-19 11:06:08 +00:00
Charles Schlosser
c01ff45312 Enable fill_n and memset optimizations for construction and assignment 2024-12-14 14:25:04 +00:00
Antonio Sánchez
af59ada0ac Use alpine for deploying nightly tag. 2024-12-10 22:48:29 +00:00
Charles Schlosser
4a9e32ae0b matrix equality operator 2024-12-10 12:40:39 +00:00
Antonio Sanchez
00776d1ba4 Remove branch name from nightly tag job. 2024-12-09 20:18:18 -08:00
Antonio Sanchez
7f23778593 Add tag to commit instead of branch 2024-12-09 07:47:48 -08:00
Antonio Sánchez
c30b35a310 Force tag to update to latest head. 2024-12-08 04:48:21 +00:00
Antonio Sánchez
a26ba67349 Add LICENSE file in correct place so it is picked up by gitlab. 2024-12-08 03:26:43 +00:00
Charles Schlosser
08c31c3ba6 try alpine for formatting 2024-12-08 01:09:33 +00:00
Antonio Sanchez
1ac1af62ef Update deploy job 2024-12-07 09:19:21 -08:00
Antonio Sánchez
7b6623af30 Fix special packetmath erfc flushing for ARM32. 2024-12-07 01:42:30 +00:00
Antonio Sánchez
fd48fbb260 Update rocm docker again again. 2024-12-06 22:13:53 +00:00
Antonio Sánchez
a885340ba5 Update rocm docker again. 2024-12-06 17:19:31 +00:00
Antonio Sanchez
45a8478d09 Update rocm docker image in CI. 2024-12-06 07:14:59 -08:00
Antonio Sánchez
de4afcf414 Add a deploy phase to the CI that tags the latest nightly pipeline if it passes. 2024-12-05 15:28:18 +00:00
Charles Schlosser
5e8916050b move constructor / move assignment doc strings 2024-12-04 17:42:20 +00:00
Charles Schlosser
77a073aaa8 fix checkformat ci stage 2024-12-04 02:45:52 +00:00
Charles Schlosser
41e46ed243 fix IOFormat alignment 2024-12-04 01:13:48 +00:00
Charles Schlosser
a0d32e40d9 fix map fill logic 2024-11-30 13:39:02 +00:00
Charles Schlosser
d34b100c13 Fix UB in setZero 2024-11-27 19:32:14 +00:00
Rasmus Munk Larsen
f19a6803c8 Refactor special case handling in pow(x,y) and revert to repeated squaring for <float,int> 2024-11-27 00:24:21 +00:00
Rasmus Munk Larsen
5064cb7d5e Add test for using pcast on scalars. 2024-11-25 22:27:26 -08:00
Rasmus Munk Larsen
1ea61a5d26 Improve pow(x,y): 25% speedup, increase accuracy for integer exponents. 2024-11-26 06:13:48 +00:00
Charles Schlosser
8ad4344ca7 optimize setConstant, setZero 2024-11-22 03:39:19 +00:00
Rasmus Munk Larsen
5610a13b77 Simplify and speed up pow() by 5-6% 2024-11-20 12:45:00 +00:00
Rasmus Munk Larsen
6c6ce9d06b Enable vectorized erf<double>(x) for SSE and AVX, which was accidentally removed in merge request 1750. 2024-11-19 22:14:29 +00:00
Rasmus Munk Larsen
e7c799b7c9 Prevent premature overflow to infinity in exp(x). The changes also provide a 3-4% speedup. 2024-11-19 13:08:18 -08:00
Rasmus Munk Larsen
00af47102d Revert 040180078d 2024-11-19 10:25:16 -08:00
Rasmus Munk Larsen
8ee6f8475a Speed up exp(x). 2024-11-19 17:50:34 +00:00
Charles Schlosser
93ec5450cb disable fill_n optimization for msvc 2024-11-19 01:38:48 +00:00
Rasmus Munk Larsen
0af6ab4b76 Remove unnecessary check for HasBlend trait. 2024-11-18 21:16:45 +00:00
Rasmus Munk Larsen
d5eec781b7 Get rid of redundant computation for large arguments to erf(x). 2024-11-18 10:51:58 -08:00
Tyler Veness
2fc63808e4 Fix C++20 constexpr test compilation failures 2024-11-18 01:56:55 +00:00
Rasmus Munk Larsen
5133c836c0 Vectorize erf(x) for double. 2024-11-16 19:05:16 +00:00
Conrad Poelman
d6e3b528b2 Update Assign_MKL.h to cast disparate enum type to int, so it can be compared... 2024-11-15 20:00:29 +00:00
breathe1
040180078d Ensure that destructor's needed by lldb make it into binary in non-inlined fashion 2024-11-15 17:15:09 +00:00
Tyler Veness
0fb2ed140d Make element accessors constexpr 2024-11-14 01:05:29 +00:00
Charles Schlosser
8b4efc8ed8 check_size_for_overflow: use numeric limits instead of c99 macro 2024-11-13 00:35:35 +00:00
Charles Schlosser
489dbbc651 make fixed_size matrices conform to std::is_standard_layout 2024-11-12 23:34:26 +00:00
Rasmus Munk Larsen
283d871a3f Add missing EIGEN_DEVICE_FUNCTION decorations. 2024-11-08 14:25:57 -08:00
Rasmus Munk Larsen
0d366f6532 Vectorize erfc(x) for double and improve erfc(x) for float. 2024-11-08 17:21:11 +00:00
Charles Schlosser
8adf43640e more avx predux_any 2024-11-07 19:58:48 +00:00
Charles Schlosser
bc424f617a add missing avx predux_any functions 2024-11-07 19:11:29 +00:00
Charles Schlosser
e52ac76ca3 use EIGEN_CPLUSPLUS instead of checking cpp version 2024-11-06 17:25:22 +00:00
Rasmus Munk Larsen
122be167cd Revert "make fixed-size objects trivially move assignable" 2024-11-06 01:09:38 +00:00
Tobias Wood
d49021212b Tensor Roll / Circular Shift / Rotate 2024-11-05 14:10:19 +00:00
Charles Schlosser
bb73be8a2e make fixed-size objects trivially move assignable 2024-11-04 17:55:27 +00:00
Antonio Sánchez
7fd305ecae Fix GPU builds. 2024-11-01 04:50:03 +00:00
Morris Hafner
c8267654f2 Don't use __builtin_alloca_with_align with nvc++ 2024-10-30 18:02:08 +00:00
Tyler Veness
84c446df2c Fix macro redefinition warning in FFTW test 2024-10-30 17:18:42 +00:00
Antonio Sánchez
a9584d8e3c Fix clang6 failures. 2024-10-30 14:41:50 +00:00
Antonio Sánchez
dd4c2805d9 Fix clang6 failures. 2024-10-29 22:18:30 +00:00
Antonio Sánchez
9e962d9c54 Fix OOB access in triangular matrix multiplication. 2024-10-29 19:07:07 +00:00
Antonio Sánchez
695e49d1bd Fix NVCC builds for CUDA 10+. 2024-10-29 18:38:14 +00:00
Antonio Sánchez
dae09773fc Don't pass matrices by value. 2024-10-29 18:19:02 +00:00
Rasmus Munk Larsen
c23ec3420e Add tests for sizeof() with one dynamic dimension. 2024-10-28 13:48:53 -07:00
Rasmus Munk Larsen
58b252e5b3 Fix typo in PacketMath.h 2024-10-28 18:19:52 +00:00
Rasmus Munk Larsen
6c04d0cd68 Add missing exp2 definition for Altivec. 2024-10-28 18:12:36 +00:00
Peter Gavin
b15ebb1c2d add nextafter for bfloat16 2024-10-26 00:08:25 +00:00
Rasmus Munk Larsen
53b83cddf9 Include <type_traits> in main.h for std::is_trivial* 2024-10-25 20:55:51 +00:00
Charles Schlosser
37563856c9 Fix stack allocation assert 2024-10-25 17:02:43 +00:00
Rasmus Munk Larsen
3f067c4850 Add exp2() as a packet op and array method. 2024-10-22 22:09:34 +00:00
Charles Schlosser
4e5136d239 make fixed size matrices and arrays trivially_default_constructible 2024-10-21 17:10:15 +00:00
Antonio Sánchez
b396a6fbb2 Add free-function swap. 2024-10-14 15:51:40 +00:00
Charles Schlosser
820e8a45fb add compile time info to reverse in place 2024-10-13 17:55:56 +00:00
Charles Schlosser
b55dab7f21 Fix DenseBase::tail for Dynamic template argument 2024-10-12 21:03:30 +00:00
Charles Schlosser
e0cbc55d92 Update README.md 2024-10-10 01:54:30 +00:00
Rasmus Munk Larsen
7eea0a9213 Vectorize erfc() for float 2024-10-09 18:38:05 +00:00
Rasmus Munk Larsen
78f3c654ee Don't use constexpr with half. 2024-10-08 16:44:40 +00:00
Antonio Sánchez
6d7af238fa Adjust array_cwise for 32-bit arm. 2024-10-07 23:15:24 +00:00
Rasmus Munk Larsen
74dcfbbd0f Use ppolevl for polynomial evaluation in more places. 2024-10-07 13:27:28 -07:00
Rasmus Munk Larsen
a097f728fe Avoid producing erf(x) = NaN for large |x|. 2024-10-04 12:15:23 -07:00
Rasmus Munk Larsen
44b16f48cb Improve speed and accuracy of erf() 2024-10-03 01:52:16 +00:00
Antonio Sánchez
12068cbcdb Fix inverse evaluator for running on CUDA device. 2024-10-01 20:59:54 +00:00
Rasmus Munk Larsen
4e8e5e7409 Add max_digits10 in NumTraits for mpreal types. 2024-10-01 18:57:17 +00:00
Rasmus Munk Larsen
8e8c319087 Add missing EIGEN_DEVICE_FUNC annotations. 2024-10-01 11:40:58 -07:00
Charles Schlosser
7ad7c1d5c5 fix implicit conversion warning (again) 2024-09-24 22:07:00 +00:00
Charles Schlosser
d052b7f864 add extra debugging info to float_pow_test_impl, clean up array_cwise tests 2024-09-24 21:08:22 +00:00
Charles Schlosser
ba5183f98c fix warning in EigenSolver::pseudoEigenvalueMatrix() 2024-09-24 17:23:58 +00:00
Charles Schlosser
3ffb4e50df fix implicit conversion in TensorChipping 2024-09-24 16:58:49 +00:00
Sean McBride
b6b8b54e5e Fixed issue #2858: removed unneeded call to _mm_setzero_si128 2024-09-24 16:29:45 +00:00
Frédéric BRIOL
2a3465102a Refactor code to use constexpr for data() functions. 2024-09-23 16:43:53 +00:00
Charles Schlosser
2d4c9b400c make fixed size matrices and arrays trivially_copy_constructible and trivially_move_constructible 2024-09-17 17:43:36 +00:00
Antonio Sánchez
132f281f50 Fix generic ceil for SSE2. 2024-09-14 01:31:21 +00:00
Charles Schlosser
84282c42fc optimize new dot product 2024-09-11 21:40:43 +00:00
Charles Schlosser
fb477b8be1 Better dot products 2024-09-10 21:02:31 +00:00
Sophie Chang
134b526d61 Update NonBlockingThreadPool.h plain asserts to use eigen_plain_assert 2024-09-10 00:18:27 +00:00
qile lin
072ec9d954 Fix a bug for pcmp_lt_or_nan and Add sqrt support for SVE 2024-09-04 21:45:39 +00:00
Rasmus Munk Larsen
9315389795 Fix bug in bug fix for atanh. 2024-09-04 09:37:59 -07:00
Rasmus Munk Larsen
f33af052e0 Fix bug for atanh(-1). 2024-09-03 20:54:01 +00:00
Rasmus Munk Larsen
66927f7807 Fix out-of-range arguments to _mm_permute_pd. 2024-08-30 17:31:52 +00:00
Rasmus Munk Larsen
bbdabebf44 Vectorize atanh<double>. Make atanh(x) standard compliant for |x| >= 1. 2024-08-30 17:27:55 +00:00
Morris Hafner
26e2c4f617 Add nvc++ support 2024-08-30 12:34:48 +00:00
Eugene Zhulenev
c59332d74a Detect "effectively inner/outer" chipping in TensorChipping 2024-08-29 17:49:59 +00:00
Charles Schlosser
648bce6cae SSE/AVX Complex FMA 2024-08-29 17:37:57 +00:00
Charles Schlosser
c21a80be3d BDCSVD: Suppress Wmaybe-uninitialized 2024-08-29 02:45:38 +00:00
Charles Schlosser
9d3d37c5b7 Complex Numtraits::HasSign and nmsub test 2024-08-28 03:02:47 +00:00
Valentin Sarthou
c5189ac656 Fix GeneralizedEigenSolver::eigenvectors() not appearing in documentation 2024-08-24 00:30:06 +00:00
qile lin
3b5a1b4157 SVE intrinsics with "_x" suffix will be faster than "_z" suffix 2024-08-23 12:52:22 +00:00
Rasmus Munk Larsen
98f1ac5e65 Fix breakage in GPU build. 2024-08-23 06:08:37 +00:00
Charles Schlosser
231308f690 TensorVolumePatchOp: Suppress Wmaybe-uninitialized caused by unreachable code 2024-08-23 01:55:12 +00:00
Tobias Wood
2bf8fe1489 NEON Complex Intrinsics 2024-08-22 22:46:16 +00:00
Rasmus Munk Larsen
f91f8e9ab9 Consolidate float and double implementations of patan(). 2024-08-21 20:44:18 +00:00
Charles Schlosser
87239e058a vectorize squaredNorm() for complex types 2024-08-21 10:54:17 +00:00
Rasmus Munk Larsen
32d95bb097 Add vectorized implementation of tanh<double> 2024-08-21 02:29:45 +00:00
Rasmus Munk Larsen
cc240eea2f Speed up and improve accuracy of tanh. 2024-08-16 23:46:28 +00:00
Rasmus Munk Larsen
92e373e6f5 Speed up StableNorm for non-trivial sizes and improve consistency between aligned and unaligned inputs. 2024-08-14 21:42:04 +00:00
Rasmus Munk Larsen
1dbc7581ec Include <thread> for std::this_thread::yield(). 2024-08-14 17:44:14 +00:00
Rasmus Munk Larsen
ab310943d6 Add a yield instruction in the two spinloops of the threaded matmul implementation. 2024-08-09 10:48:24 -07:00
Rasmus Munk Larsen
99ffad1971 A few cleanups to threaded product code and test. 2024-08-09 09:35:23 -07:00
Charles Schlosser
59498c96fe SSE/AVX use fmaddsub for complex products 2024-08-05 21:26:05 +00:00
Rasmus Munk Larsen
1dcae7cefc Revert "BDCSVD fix -Wmaybe-uninitialized"
This reverts merge request !1649
2024-08-05 18:17:01 +00:00
Tyler Veness
d14b0a4e53 Remove C++23 check around has_denorm deprecation suppression 2024-08-03 21:34:27 +00:00
Jatin Chaudhary
24db460503 hlog symbol lookup should not be restricted to global namespace 2024-08-03 03:59:13 +00:00
Alexander Grund
767e60e290 Fix Woverflow warnings in PacketMathFP16 2024-08-03 03:57:18 +00:00
Alexander Grund
8025683226 Fix conversion of Eigen::half to _Float16 in AVX512 code 2024-08-03 03:49:51 +00:00
Alexey Korepanov
ec18dd09c8 fix pi in kissfft 2024-08-02 22:57:47 +00:00
Rasmus Munk Larsen
2b7b7aac57 Speed up complex * complex matrix multiplication. 2024-08-02 20:40:53 +00:00
Devon Loehr
b3e3b7b0ec Remove implicit this capture in lambdas 2024-08-02 20:06:35 +00:00
Eugene Zhulenev
e44db21092 Optimize ThreadPool spinning 2024-08-02 19:18:34 +00:00
Mike Taves
c593e9e948 Fix typos 2024-08-02 00:06:24 +00:00
Eugene Zhulenev
fd98cc49f1 Avoid atomic false sharing in RunQueue 2024-08-01 17:41:16 +00:00
Charles Schlosser
0b646f3f36 Update file .clang-format 2024-08-01 03:18:50 +00:00
Charles Schlosser
1dcb07bb2a Update file eigen_navtree_hacks.js 2024-08-01 02:51:04 +00:00
Charles Schlosser
ddb163ffb1 Update file .clang-format 2024-08-01 00:29:36 +00:00
Charles Schlosser
3f06651fd6 BDCSVD fix -Wmaybe-uninitialized 2024-07-30 22:53:06 +00:00
Frédéric Chapoton
6331da95eb fixing a lot of typos 2024-07-30 22:15:49 +00:00
Alexander Hans
c29c800126 Fix formatting in README.md 2024-07-03 19:16:56 +00:00
adambanas
33d0937c6b Add async support for 'chip' and 'extract_volume_patches' 2024-06-27 09:56:06 +02:00
Rasmus Munk Larsen
d791d48859 Fix AVX512FP16 build failure 2024-06-18 22:34:32 +00:00
Charles Schlosser
2fae4d7a77 Revert "fix scalar pselect" 2024-06-15 20:02:28 +00:00
Charles Schlosser
b430eb31e2 AVX512F double->int64_t cast 2024-06-15 17:45:02 +00:00
Charles Schlosser
02bcf9b591 fix scalar pselect 2024-06-10 17:30:22 +00:00
Louis David
392b95bdf1 allow pointer_based_stl_iterator to conform to the contiguous_iterator concept if we are in c++20 2024-06-06 21:38:09 +00:00
Victor Ceballos
27f8176254 fixing warning C5054: operator '==': deprecated between enumerations of different types 2024-06-04 16:44:13 +03:00
Charles Schlosser
eac6355df2 Fix warnings created by other warnings fix 2024-06-01 03:37:04 +00:00
Rasmus Munk Larsen
7029a2e971 Vectorize allFinite() 2024-06-01 03:24:26 +00:00
Charles Schlosser
e605227030 Fix warnings 2024-05-31 14:33:37 +00:00
Rasmus Munk Larsen
38b9cc263b Fix warnings about repeated deinitions of macros. 2024-05-29 13:38:00 -07:00
Rasmus Munk Larsen
f02f89bf2c Don't redefine EIGEN_DEFAULT_IO_FORMAT in main.h. 2024-05-29 18:14:32 +00:00
Rasmus Munk Larsen
9148c47d67 Vectorize isfinite and isinf. 2024-05-29 00:20:12 +00:00
Tobias Wood
5a9f66fb35 Fix Thread tests 2024-05-24 16:50:14 +00:00
Tyler Veness
c4d84dfddc Fix compilation failures on constexpr matrices with GCC 14 2024-05-22 12:29:01 +00:00
Charles Schlosser
99adca8b34 Incorporate Threadpool in Eigen Core 2024-05-20 23:42:51 +00:00
Tyler Veness
d165c7377f Format EIGEN_STATIC_ASSERT() as a statement macro 2024-05-20 23:02:42 +00:00
Charles Schlosser
f78dfe36b0 use built in alloca with align if available 2024-05-19 19:32:49 +00:00
Tyler Veness
b9b1c8661e Suppress C++23 deprecation warnings for std::has_denorm and std::has_denorm_loss 2024-05-17 15:55:22 +00:00
Charlie Schlosser
3d2e738f29 fix performance-no-int-to-ptr 2024-05-16 23:25:42 -04:00
Antonio Sánchez
de8013fa67 Fix ubsan failure in array_for_matrix 2024-05-16 18:47:36 +00:00
Rasmus Munk Larsen
5e4f3475b5 Remove call to deprecated method initParallel() in SparseDenseProduct.h 2024-05-15 23:12:32 +00:00
Charles Schlosser
59cf0df1d6 SparseMatrix::insert add checks for valid indices 2024-05-15 16:14:32 +00:00
Anabasis
c0fe6ce223 Fixed a clerical error in the documentation of class Matrix. 2024-05-13 02:51:40 +00:00
Antonio Sánchez
afb17288cb Fix gcc6 compile error. 2024-05-10 19:13:21 +00:00
Chip Kerchner
4d1d14e069 Change predux on PowerPC for Packet4i to NOT saturate the sum of the elements (like other architectures). 2024-05-08 22:39:27 +00:00
daizhirui
ff174f7926 fix issue: cmake package does not set include path correctly 2024-05-07 21:21:08 +00:00
Antonio Sánchez
e16d70bd4e Fix FFT when destination does not have unit stride. 2024-05-07 17:18:29 +00:00
Charles Schlosser
99c18bce6e Msvc muluh 2024-05-07 16:30:58 +00:00
Charles Schlosser
8e47971789 Bit shifting functions 2024-05-03 18:55:02 +00:00
Antonio Sánchez
9700fc847a Reorganize CMake and minimize configuration for non-top-level builds. 2024-05-01 17:42:53 +00:00
Antonio Sánchez
c1d637433e Judge unitary-ness relative to scaling. 2024-04-30 22:28:46 +00:00
Rasmus Munk Larsen
9000b37677 Fix new generic nearest integer ops on GPU. 2024-04-30 22:18:25 +00:00
Charles Schlosser
0ee5c90aa9 Eigen transpose product 2024-04-30 13:32:52 +00:00
Charles Schlosser
fb95e90f7f Add truncation op 2024-04-29 23:45:49 +00:00
Jonathan Freed
d5524fc57b Remove unnecessary semicolons. 2024-04-29 21:31:26 +00:00
Antonio Sánchez
ae5280aa8d Fix more hard-coded magic bounds. 2024-04-29 21:21:11 +00:00
Antonio Sánchez
a5e147305b Fix undefined behavior for generating inputs to the predux_mul test. 2024-04-29 20:32:09 +00:00
Antonio Sánchez
dcceb9afec Unbork avx512 predux_mul on MSVC. 2024-04-26 15:28:03 +00:00
Antonio Sánchez
42aa3d17cd Slightly adjust error bound for nonlinear tests. 2024-04-25 18:04:49 +00:00
Antonio Sanchez
1c8c734c8b Fix sin/cos on PPC. 2024-04-24 15:58:03 -07:00
Charles Schlosser
34967b0b5b Revert "fix transposed matrix product bug"
This reverts merge request !1598
2024-04-23 14:07:11 +00:00
Antonio Sánchez
9cec679ef1 Don't let the PPC runner try to cross-compile. 2024-04-23 03:40:40 +00:00
Charles Schlosser
574bc8820d fix transposed matrix product bug 2024-04-23 03:25:57 +00:00
Rasmus Munk Larsen
112ad8b846 Revert part of !1583, which may cause underflow on ARM. 2024-04-22 21:14:38 +00:00
Charles Schlosser
8cafbc4736 Fix unused variable warnings in TensorIO 2024-04-22 18:14:54 +00:00
Charles Schlosser
4de870b6eb fix autodiff enum comparison warnings 2024-04-22 18:14:20 +00:00
Antonio Sánchez
2265242aa1 Update CI scripts. 2024-04-20 01:08:19 +00:00
ahmed
ee9d57347b Fix tridiagonalization_inplace_selector::run() when called from CUDA 2024-04-19 21:06:59 +00:00
Charles Schlosser
1550c99541 Eigen select 2024-04-19 17:52:34 +00:00
Charles Schlosser
5635d37f46 more pblend optimizations 2024-04-19 02:02:27 +00:00
Antonio Sánchez
f0795d35e3 Fix new psincos for ppc and arm32. 2024-04-19 00:31:09 +00:00
Chip Kerchner
ad452e575d Fix compilation problems with PacketI on PowerPC. 2024-04-18 14:55:15 +00:00
Charles Schlosser
fcaf03ef7c fix pedantic compiler warnings 2024-04-17 16:55:45 +00:00
Rasmus Munk Larsen
b5feca5d03 Fix build for pblend and psin_double, pcos_double when AVX but not AVX2 is supported. 2024-04-16 16:12:41 +00:00
Damiano Franzò
888fca0e2b Simd sincos double 2024-04-15 21:12:32 +00:00
Charles Schlosser
6ad2ccea4e Eigen pblend 2024-04-15 16:19:53 +00:00
Charles Schlosser
9099c5eac7 Handle missing AVX512 intrinsic 2024-04-14 16:41:23 +00:00
Charles Schlosser
122befe54c Fix "unary minus operator applied to unsigned type, result still unsigned" on MSVC and other stupid warnings 2024-04-12 19:35:04 +00:00
Antonio Sánchez
dcdb0233c1 Refactor indexed view to appease MSVC 14.16. 2024-04-12 17:05:20 +00:00
Rasmus Munk Larsen
5226566a14 Speed up pldexp_generic. 2024-04-12 01:32:17 +00:00
Stéphane T.
3c6521ed90 Add constexpr to accessors in DenseBase, Quaternions and Translations 2024-04-11 14:46:48 +00:00
Rasmus Munk Larsen
3c9109238f Add support for Packet8l to AVX512. 2024-04-09 22:58:44 +00:00
Charles Schlosser
2620cb930b Update file Geometry_SIMD.h 2024-04-05 18:30:39 +00:00
Dieter Dobbelaere
b2c9ba2bee Fix preprocessor condition on when to use fast float logistic implementation. 2024-04-03 21:51:39 +00:00
Rasmus Munk Larsen
283d69294b Guard AVX2 implementation of psignbit in PacketMath.h 2024-04-03 21:03:26 +00:00
Chip Kerchner
be54cc8ded Fix preverse for PowerPC. 2024-04-03 20:09:06 +00:00
Rasmus Munk Larsen
c5b234196a Fix unused variable warning in TensorIO.h 2024-04-03 19:49:31 +00:00
Charles Schlosser
86aee3d9c5 Fix long double random 2024-04-02 12:05:40 +00:00
Charles Schlosser
776d86d8df AVX: guard Packet4l definition 2024-04-01 00:31:46 +00:00
Charles Schlosser
e63d9f6ccb Fix random again 2024-03-29 21:49:27 +00:00
Charles Schlosser
f75e2297db AVX2 - double->int64_t casting 2024-03-29 21:35:09 +00:00
Antonio Sánchez
13092b5d04 Fix usages of Eigen::array to be compatible with std::array. 2024-03-29 15:51:15 +00:00
Antonio Sánchez
77833f9320 Allow symbols to be used in compile-time expressions. 2024-03-28 18:43:50 +00:00
Antonio Sánchez
d26e19714f Add missing cwiseSquare, tests for cwise matrix ops. 2024-03-28 04:26:55 +00:00
Maarten Baert
35bf6c8edc Add SimplicialNonHermitianLLT and SimplicialNonHermitianLDLT 2024-03-28 00:22:27 +00:00
Rasmus Munk Larsen
4dccaa587e Use truncation rather than rounding when casting Packet2d to Packet2l. 2024-03-27 21:39:58 +00:00
Charles Schlosser
7b5d32b7c9 Sparse move 2024-03-27 17:44:50 +00:00
Antonio Sánchez
c8d368bdaf More fixes for 32-bit. 2024-03-26 22:53:38 +00:00
Antonio Sanchez
de304ab960 Fix using ScalarPrinter redefinition for gcc. 2024-03-26 15:32:38 -07:00
Rasmus Munk Larsen
c54303848a Undef macro in TensorContractionGpu.h that causes buildbreakages. 2024-03-26 18:01:48 +00:00
Antonio Sánchez
d8aa4d6ba5 Fix another instance of Packet2l on win32. 2024-03-26 15:48:44 +00:00
Antonio Sánchez
9f77ce4f19 Add custom formatting of complex numbers for Numpy/Native. 2024-03-25 17:41:44 +00:00
Charles Schlosser
5570a27869 cross3_product vectorization 2024-03-25 00:06:33 +00:00
Antonio Sánchez
0b3df4a6e6 Remove "extern C" in CholmodSupport. 2024-03-25 00:03:28 +00:00
Antonio Sanchez
a39ade4ccf Protect use of alloca. 2024-03-23 16:35:37 +00:00
Rasmus Munk Larsen
b86641a4c2 Add support for casting between double and int64_t for SSE and AVX2. 2024-03-22 22:32:29 +00:00
Antonio Sánchez
d883932586 Fix Packet*l for 32-bit builds. 2024-03-22 17:16:42 +00:00
Tyler Veness
d792f13a61 Make more Matrix functions constexpr 2024-03-19 22:02:21 +00:00
Rasmus Munk Larsen
d3cd312652 Remove slow index check in Tensor::resize from release mode. 2024-03-18 23:43:25 +00:00
Antonio Sánchez
386e2079e4 Fix Jacobi module doc. 2024-03-17 23:08:04 +00:00
Antonio Sánchez
8b101ade2b Fix CwiseUnaryView for MSVC. 2024-03-17 16:28:17 +00:00
Antonio Sánchez
0951ad2a8e Don't hide rbegin/rend for GPU. 2024-03-14 21:11:43 +00:00
Antonio Sánchez
24f8fdeb46 Fix CwiseUnaryView const access (Attempt 2). 2024-03-14 21:04:49 +00:00
Antonio Sánchez
285da30ec3 Fix const input and c++20 compatibility in unary view. 2024-03-13 16:59:44 +00:00
Rasmus Munk Larsen
126ba1a166 Add Packet2l for SSE. 2024-03-11 19:54:55 +00:00
Antonio Sánchez
1d4369c2ff Fix CwiseUnaryView. 2024-03-11 19:08:30 +00:00
Antonio Sánchez
352ede96e4 Fix incomplete cholesky. 2024-03-08 19:18:10 +00:00
Antonio Sánchez
f1adb0ccc2 Split up cxx11_tensor_gpu to reduce timeouts. 2024-03-07 17:21:37 +00:00
Antonio Sánchez
17f3bf8985 Fix pexp test for ARM. 2024-03-07 00:19:57 +00:00
Antonio Sanchez
6da34d9d9e Allow aligned assignment in TRMV. 2024-03-06 23:53:01 +00:00
Antonio Sánchez
3e8e63eb46 Fix packetmath plog test on Windows. 2024-03-06 23:51:47 +00:00
Tyler Veness
5ffb307afa Fix deprecated anonymous enum-enum conversion warnings 2024-03-06 21:22:02 +00:00
Antonio Sánchez
55dd487478 Revert "fix unaligned access in trmv"
This reverts merge request !1536
2024-03-06 16:42:59 +00:00
Antonio Sánchez
38fcedaf8e Fix pexp complex test edge-cases. 2024-03-04 17:44:38 +00:00
Sotiris Papatheodorou
251ec42087 Return 0 volume for empty AlignedBox 2024-03-04 17:32:44 +00:00
Antonio Sanchez
64edfbed04 Fix static_assert for c++14. 2024-03-02 20:39:34 -08:00
Charles Schlosser
3f3144f538 fix unaligned access in trmv 2024-03-03 04:20:09 +00:00
Antonio Sánchez
23f6c26857 Rip out make_coherent, add CoherentPadOp. 2024-02-29 23:15:02 +00:00
Antonio Sánchez
edaf9e16bc Fix triangular matrix-vector multiply uninitialized warning. 2024-02-29 21:00:58 +00:00
Antonio Sánchez
98620b58c3 Eliminate FindCUDA cmake warning. 2024-02-29 20:49:41 +00:00
Antonio Sánchez
cc941d69a5 Update error about c++14 requirement. 2024-02-29 20:45:13 +00:00
Antonio Sánchez
6893287c99 Add degenerate checks before calling BLAS routines. 2024-02-29 18:56:36 +00:00
Antonio Sánchez
fa201f1bb3 Fix QR colpivoting warnings and test failure. 2024-02-28 15:00:13 +00:00
Charles Schlosser
b334910700 delete shadowed typedefs 2024-02-28 02:40:45 +00:00
Antonio Sánchez
a962a27594 Fix MSVC GPU build. 2024-02-27 23:26:06 +00:00
Rasmus Munk Larsen
a2f8eba026 Speed up sparse x dense dot product. 2024-02-24 19:13:33 +00:00
Antonio Sánchez
7a88cdd6ad Fix signed integer UB in random. 2024-02-24 13:16:23 +00:00
Rasmus Munk Larsen
a6dc930d16 Speed up SparseQR. 2024-02-22 16:49:10 -08:00
Antonio Sánchez
feaafda30a Change array_size result from enum to constexpr. 2024-02-22 22:52:25 +00:00
Antonio Sánchez
8a73c6490f Remove "using namespace Eigen" from blas/common.h. 2024-02-22 22:51:42 +00:00
Rasmus Munk Larsen
6ed4d80cc8 Fix crash in IncompleteCholesky when the input has zeros on the diagonal. 2024-02-22 22:22:21 +00:00
Rasmus Munk Larsen
3859e8d5b2 Add method signDeterminant() to QR and related decompositions. 2024-02-20 23:44:28 +00:00
Rasmus Munk Larsen
db6b9db33b Make header guards in GeneralMatrixMatrix.h and Parallelizer.h consistent:... 2024-02-20 20:03:18 +00:00
Antonio Sánchez
b56e30841c Enable direct access for IndexedView. 2024-02-20 18:21:45 +00:00
Antonio Sanchez
90087b990a Fix use of uninitialized memory in kronecker_product test. 2024-02-20 08:44:34 -08:00
Antonio Sánchez
6b365e74d6 Fix GPU build for ptanh_float. 2024-02-20 16:08:50 +00:00
Antonio Sánchez
b14c5d0fa1 Fix real schur and polynomial solver. 2024-02-17 15:22:11 +00:00
Charles Schlosser
8a4118746e fix exp complex test: use int instead of index 2024-02-17 03:55:32 +00:00
Charles Schlosser
960892ca13 JacobiSVD: get rid of m_scaledMatrix, m_adjoint, hopefully fix some compiler warnings 2024-02-17 03:41:55 +00:00
Charles Schlosser
18a161bf17 fix pexp_complex_test 2024-02-17 03:08:23 +00:00
Damiano Franzò
be06c9ad51 Implement float pexp_complex 2024-02-17 00:26:57 +00:00
Rasmus Munk Larsen
4d419e2209 Rename generic_fast_tanh_float to ptanh_float and move it to... 2024-02-16 21:27:22 +00:00
Antonio Sánchez
2a9055b50e Fix random for custom scalars that don't have constexpr digits(). 2024-02-16 02:30:54 +00:00
Antonio Sánchez
500a3602f0 Use traits<Matrix>::Options instead of Matrix::Options. 2024-02-16 00:11:57 +00:00
Antonio Sánchez
0b9ca1159b Fix deflation in BDCSVD. 2024-02-15 23:53:59 +00:00
Antonio Sánchez
f40ad38fda Fix failure on ARM with latest compilers. 2024-02-14 23:00:56 +00:00
Antonio Sánchez
a24bf2e9a2 Disable float16 packet casting if native AVX512 f16 is available. 2024-02-14 20:05:00 +00:00
Antonio Sánchez
5361dea833 Remove return int types from BLAS/LAPACK functions. 2024-02-14 19:51:36 +00:00
Alec Jacobson
7e655c9a5d Fixes 2780 2024-02-13 02:57:43 +00:00
Antonio Sánchez
6ea33f95df Eliminate warning about writing bytes directly to non-trivial type. 2024-02-12 23:27:48 +00:00
Antonio Sánchez
06b45905e7 Remove r_cnjg due to conflicts with f2c. 2024-02-12 23:16:03 +00:00
Antonio Sánchez
9229cfa822 Fix division by zero UB in packet size logic. 2024-02-12 21:01:19 +00:00
Antonio Sanchez
186f8205db Apply clang-format to lapack/blas directories 2024-02-12 19:36:07 +00:00
Gautam Jha
4eac211e96 Fix C++20 error, Arithmetic between different enumeration types 2024-02-12 04:25:04 +00:00
Tyler Veness
d1d87973f4 Fix segfault in CholmodBase::factorize() for zero matrix 2024-02-12 03:27:56 +00:00
Antonio Sánchez
7b87b21910 Fix UB in bool packetmath test. 2024-02-09 19:46:45 +00:00
Charles Schlosser
431e4a913b Fix the fuzz 2024-02-07 04:52:19 +00:00
Charles Schlosser
3ab8f48256 fix tests when scalar is bfloat16, half 2024-02-07 04:50:11 +00:00
Antonio Sánchez
3ebaab8a63 Fix PPC rand and other failures. 2024-02-05 20:07:15 +00:00
Charlie Schlosser
ebd13c3b14 fix skew symmetric test 2024-02-04 21:13:06 -05:00
Antonio Sanchez
128c8abf44 Fix gcc-6 bug in the rand test. 2024-02-02 15:36:23 -08:00
Charles Schlosser
d626762e3f improve random 2024-01-31 08:16:29 +00:00
Antonio Sánchez
a9ddab3e06 Fix a bunch of ODR violations. 2024-01-30 22:38:43 +00:00
Damiano Franzò
7fd7a3f946 Implement plog_complex 2024-01-30 19:06:05 +00:00
Antonio Sánchez
043442e21b Fix preshear transformation. 2024-01-30 06:37:33 +00:00
Antonio Sánchez
69ee52ed13 Remove Skyline. 2024-01-30 00:13:17 +00:00
Antonio Sánchez
0f0c76dc29 Use stableNorm in ComplexEigenSolver. 2024-01-29 23:46:23 +00:00
Antonio Sánchez
dd71c23e23 Remove MoreVectorization. 2024-01-29 18:48:28 +00:00
Pascal Getreuer
62b5474032 Fix busted formatting in Eigen::Tensor README.md. 2024-01-29 08:12:43 +00:00
Antonio Sánchez
d8d0b60b59 Fix CI for clang-6 when cross-compiled. 2024-01-27 05:13:21 +00:00
Antonio Sánchez
f391289150 Fix bug in checking subnormals. 2024-01-25 17:52:07 +00:00
Antonio Sanchez
5a90fbceaa Update documentation of lapack second/dsecnd. 2024-01-25 17:51:26 +00:00
Antonio Sánchez
9079820241 Remove simple relicense script. 2024-01-25 05:50:36 +00:00
Antonio Sánchez
851b40afd8 LAPACK CPU time functions. 2024-01-23 23:58:58 +00:00
Antonio Sánchez
a73970a864 Fix arm32 issues. 2024-01-23 22:04:55 +00:00
Antonio Sánchez
5808122017 Formatting. 2024-01-23 16:56:27 +00:00
Antonio Sánchez
92f9544f6d Remove explicit specialization of member function. 2024-01-23 16:40:43 +00:00
Antonio Sanchez
2692fb2b71 Fix compile-time error caused by chip static asserts 2024-01-22 21:40:48 +00:00
Cheng Wang
2c6b61c006 Add half and quarter vector support to HVX architecture 2024-01-22 21:23:21 +00:00
Tobias Wood
05a457534f Chipping Asserts v2 2024-01-22 18:08:23 +00:00
Martin Heistermann
fc92fe3125 SPQR: Fix build error, Index/StorageIndex mismatch. 2024-01-22 17:37:36 +00:00
Andreas Forster
eede526b7c [Compressed Storage] Use smaller type of Index & StorageIndex for determining maximum size during resize. 2024-01-22 00:35:31 +00:00
Antonio Sánchez
772057c558 Revert "Add asserts for .chip" 2024-01-20 05:14:42 +00:00
Antonio Sánchez
6163dbe2bc Allow specifying a temporary directory for fileio outputs. 2024-01-20 00:55:14 +00:00
Antonio Sanchez
6b6bb9d34e Fix unused warnings in failtest. 2024-01-19 15:30:18 -08:00
Antonio Sánchez
f6e41e6433 Revert "Clean up stableNorm" 2024-01-19 20:22:47 +00:00
Tobias Wood
b1ae206ea6 Add asserts for .chip 2024-01-19 19:18:19 +00:00
Nuno Gonçalves
b0f906419e add missing constexpr qualifier 2024-01-19 18:49:53 +00:00
Antonio Sánchez
34fd46a9b4 Update CI with testing framework from eigen_ci_cross_testing. 2024-01-19 17:55:09 +00:00
Antonio Sanchez
b2814d53a7 Fix stableNorm when input is zero-sized. 2024-01-16 10:14:51 -08:00
Charles Schlosser
c29a410116 check pointers before freeing 2024-01-12 06:09:46 +00:00
Chao Chen
48e0c827da [ROCm] MI300 related test support 2024-01-11 23:46:25 +00:00
Antonio Sánchez
5385773015 Fix TensorForcedEval in the case of the evaluator being copied. 2024-01-10 00:45:39 +00:00
Nicolas Cornu
3f3bc6d862 Improve documentation of SparseLU 2024-01-09 18:18:05 +00:00
Arnaud Billon
d33174d5ae Doc: Fix Basic slicing examples minor issues 2024-01-09 18:17:00 +00:00
Tyler Veness
19b1192886 Add factor getters to Cholmod LLT/LDLT 2024-01-09 02:14:04 +00:00
Charles Schlosser
a1a96fafde Clean up stableNorm 2024-01-08 23:28:41 +00:00
Antonio Sánchez
3026f1f296 Fix various asan errors. 2024-01-08 00:13:17 +00:00
Antonio Sánchez
a2cf99ec6f Fix GPU+clang+asan. 2024-01-04 17:29:37 +00:00
Antonio Sánchez
fee5d60b50 Fix MSAN failures. 2023-12-22 03:18:46 +00:00
Antonio Sánchez
9697d481c8 Fix up clang-format CI. 2023-12-14 00:15:11 +00:00
Tobias Wood
85efa83292 Set up clang-format in CI 2023-12-13 21:08:07 +00:00
Charles Schlosser
2c4541f735 fix msvc clz 2023-12-13 03:33:49 +00:00
Antonio Sánchez
75e273afcc Add internal ctz/clz implementation. 2023-12-11 21:03:09 +00:00
Antonio Sanchez
454f89af9d Protect kernel launch syntax from clang-format 2023-12-05 14:26:44 -08:00
Rasmus Munk Larsen
383506fcb2 Fix CUDA syntax error introduced by clang-format. 2023-12-05 14:06:27 -08:00
Antonio Sánchez
61b7155d77 Add formatting change to .git-blame-ignore-revs 2023-12-05 21:56:31 +00:00
Antonio Sánchez
46e9cdb7fe Clang-format tests, examples, libraries, benchmarks, etc. 2023-12-05 21:22:55 +00:00
Antonio Sánchez
3252ecc7a4 Fix scalar_logistic_function overflow for complex inputs. 2023-12-05 18:21:04 +00:00
Antonio Sánchez
9688081029 Add .git-blame-ignore-revs file. 2023-12-01 01:05:50 +00:00
Tobias Wood
f38e16c193 Apply clang-format 2023-11-29 11:12:48 +00:00
Drew Lewis
9ea520fc45 Ensure that mc is not smaller than Traits::nr 2023-11-28 22:48:53 +00:00
Antonio Sánchez
dd8c71e628 Fix typecasting for arm32 2023-11-23 00:47:50 +00:00
Tobias Wood
b2cb49e280 Static asserts to check for matching NumDimensions 2023-11-22 23:29:22 +00:00
Charles Schlosser
283dec7f25 Update file GeneralMatrixVector.h 2023-11-21 19:50:35 +00:00
Pavel Labath
66b9f4ed5c Fix (u)int64_t->float conversion on arm 2023-11-21 16:09:12 +00:00
Charles Schlosser
d1b03fb5c9 Gemv microoptimization 2023-11-20 17:26:39 +00:00
Rasmus Munk Larsen
3cf6bb6f1c Fix a bug in commit 76e8c04553: 2023-11-15 21:45:37 +00:00
Charles Schlosser
32165c6f0c Fix Wshorten-64-to-32 warning in gemm parallelizer 2023-11-14 13:51:27 +00:00
Rasmus Munk Larsen
b33dbb5765 Fix implicit narrowing warning in Parallelizer.h. 2023-11-13 21:30:39 +00:00
Antonio Sanchez
3a9635b20c Link pthread for product_threaded test 2023-11-13 11:34:23 -08:00
wk
f78c37f0af traits<Ref>::match: use correct strides 2023-11-11 14:10:56 +00:00
Rasmus Munk Larsen
516d08a490 Fix typo in Parallelizer.h 2023-11-10 20:29:29 +00:00
Rasmus Munk Larsen
76e8c04553 Generalize parallel GEMM implementation in Core to work with ThreadPool in addition to OpenMP. 2023-11-10 17:42:30 +00:00
Antonio Sánchez
4d54c43d6c Fix typo to allow nomalloc test to pass on AVX512. 2023-11-06 18:58:43 +00:00
Antonio Sánchez
a25f02d73e Fix int overflow causing cxx11_tensor_gpu_1 to fail. 2023-11-06 17:10:16 +00:00
Charles Schlosser
6f9ad7da61 fix Wshorten-64-to-32 warnings in div_ceil 2023-10-27 15:52:00 +00:00
Charles Schlosser
1aac9332ce TensorReduction: replace divup with div_ceil 2023-10-25 16:44:34 +00:00
Kyle Macfarlan
5de0f2f89e Fixes #2735: Component-wise cbrt 2023-10-25 03:06:13 +00:00
Antonio Sánchez
48b254a4bc Disable denorm deprecation warnings in MSVC C++23. 2023-10-23 17:56:04 +00:00
Antonio Sánchez
176821f2f7 Avoid building docs if cross-compiling or not top level. 2023-10-23 17:55:01 +00:00
Antonio Sánchez
aa6964bf3a Work around MSVC issue in Block XprType. 2023-10-19 22:02:03 +00:00
Anatoly Borisov
877c2d1e9b fix typo in comment 2023-10-18 12:58:49 +00:00
Antonio Sánchez
0c9526912c Pass div_ceil arguments by value. 2023-10-17 18:46:19 +00:00
Ioannis Assiouras
d9839718aa [ROCm] Replace HIP_PATH with ROCM_PATH for rocm 6.0 2023-10-16 20:56:35 +00:00
Antonio Sánchez
5bdf58b8df Eliminate use of _res. 2023-10-16 19:56:53 +00:00
Rasmus Munk Larsen
a96545777b Consolidate multiple implementations of divup/div_up/div_ceil. 2023-10-10 17:16:59 +00:00
Charles Schlosser
e8515f78ac Fix sparse triangular view iterator 2023-10-05 17:13:37 +00:00
Kevin
6d829e766f Fix extra semicolon in XprHelper 2023-09-14 08:18:28 +00:00
Alejandro Acosta
ba47341a14 [SYCL-2020] Enabling half precision support for SYCL. 2023-09-13 09:12:40 +00:00
François Girinon
92a77a596b Fix call to static functions from device by adding EIGEN_DEVICE_FUNC attribute to run methods 2023-09-13 04:16:52 +00:00
Antonio Sánchez
8f858a4ea8 Export ThreadPool symbols from legacy header. 2023-09-10 20:56:20 +00:00
Chip Kerchner
4e598ad259 New panel modes for GEMM MMA (real & complex). 2023-09-06 20:03:45 +00:00
Daniel Benedí
2c64a655fe Stage will not be ok if pardiso returned error 2023-09-06 14:40:06 +02:00
Charles Schlosser
18018ed013 Unwind Block of Blocks 2023-08-29 17:21:41 +00:00
Charles Schlosser
81b48065ea Fix arm32 float division and related bugs 2023-08-29 00:36:07 +00:00
Antonio Sánchez
2873916f1c Rename plugin headers to .inc. 2023-08-21 16:26:11 +00:00
Antonio Sánchez
6e4d5d4832 Add IWYU private pragmas to internal headers. 2023-08-21 16:25:22 +00:00
Antonio Sánchez
328b5f9085 Add temporary macro to allow unaligned scalar UB. 2023-08-15 15:58:41 +00:00
Charles Schlosser
a798d07659 Fix tensor stridedlinearbuffercopy 2023-08-03 20:36:42 +00:00
Charles Schlosser
8d9f467036 fix boost mp test to refer to new svd tests 2023-08-02 13:38:12 +00:00
Antonio Sánchez
0ae7d7a361 Fix unaligned scalar alignment UB. 2023-08-01 19:39:08 +00:00
cheng wang
66e8f38891 Add architecture definition files for Qualcomm Hexagon Vector Extension (HVX) 2023-08-01 17:47:57 +00:00
Antonio Sánchez
2af700aa40 Fix nullptr dereference in SVD. 2023-08-01 16:33:16 +00:00
Rasmus Munk Larsen
86a43d8c04 Fix clang-tidy warning 2023-07-31 21:26:28 +00:00
Antonio Sánchez
0cef325b07 Fix another UB access. 2023-07-31 19:18:45 +00:00
Charles Schlosser
5527e78a64 Add missing x86 pcasts 2023-07-28 23:41:38 +00:00
Alejandro Acosta
24d15e086f [SYCL-2020] Add test to validate SYCL in Eigen core. 2023-07-28 15:45:08 +00:00
Antonio Sánchez
d4ae542ed1 Fix nullptr dereference issue in triangular product. 2023-07-27 22:10:21 +00:00
Chip Kerchner
7769eb1b2e Fix problems with recent changes and Tensorflow in Power 2023-07-26 16:24:58 +00:00
Yingnan Wu
ba1cb6e45e Fixes #2703 by adding max_digits10 function 2023-07-26 16:02:52 +00:00
Charles Schlosser
9995c3da6f Fix -Wmaybe-uninitialized in SVD 2023-07-25 22:22:17 +00:00
Charles Schlosser
4e9e493b4a Fix -Waggressive-loop-optimizations 2023-07-21 03:47:40 +00:00
Charles Schlosser
6e7abeae69 fix arm build warnings 2023-07-17 20:37:27 +00:00
Charles Schlosser
81fe2d424f Fix more gcc compiler warnings / sort-of bugs 2023-07-14 21:12:45 +00:00
Charles Schlosser
21cd3fe209 Optimize check_rows_cols_for_overflow 2023-07-10 17:40:17 +00:00
Antonio Sánchez
9297aae66f Fix AVX512 nomalloc issues in trsm. 2023-07-10 16:42:13 +00:00
Charles Schlosser
1a2bfca8f0 Fix annoying warnings 2023-07-07 20:19:58 +00:00
Antonio Sánchez
63dcb429cd Fix use of arg function in CUDA. 2023-07-07 18:37:14 +00:00
Marcus Comstedt
8f927fb52e Altivec: fix compilation with C++20 and higher 2023-07-05 13:14:02 +00:00
Kevin Leonardic
d4b05454a7 Fix argument for _mm256_cvtps_ph imm parameter 2023-07-03 13:44:20 +02:00
Charles Schlosser
15ac3765c4 Fix ivcSize return type in IndexedViewMethods.h 2023-07-03 03:49:37 +00:00
Chip Kerchner
3791ac8a1a Fix supportsMMA to obey EIGEN_ALTIVEC_MMA_DYNAMIC_DISPATCH compilation flag and compiler support. 2023-06-28 17:57:21 +00:00
H S Helson Go
bc57b926a0 Add Quaternion constructor from real scalar and imaginary vector 2023-06-27 05:38:17 +00:00
Antonio Sánchez
31cd2ad371 Ensure EIGEN_HAS_ARM64_FP16_VECTOR_ARITHMETIC is always defined on arm. 2023-06-26 19:21:54 +00:00
Antonio Sánchez
7465b7651e Disable FP16 arithmetic for arm32. 2023-06-26 18:39:42 +00:00
Rasmus Munk Larsen
b3267f6936 Remove unused variable in test/svd_common.h. 2023-06-23 23:12:19 +00:00
Chip Kerchner
211c5dfc67 Add optional offset parameter to ploadu_partial and pstoreu_partial 2023-06-23 19:53:05 +00:00
Charles Schlosser
44c20bbbe3 rint round floor ceil 2023-06-23 16:29:16 +00:00
Charles Schlosser
6ee86fd473 delete deprecated function call in svd test 2023-06-23 14:17:27 +00:00
Charles Schlosser
387175c258 Fix safe_abs in int_pow 2023-06-23 04:12:41 +00:00
Charles Schlosser
c6db610bc7 Fix svd test 2023-06-22 17:37:24 +00:00
Charles Schlosser
969c31eefc Fix AVX pstore 2023-06-15 01:47:38 +00:00
wilfried.karel
6c1411e521 define a move constructor for Ref<const...> 2023-06-14 20:10:51 +00:00
wilfried.karel
d8f3eb87bf Compile- and run-time assertions for the construction of Ref<const>. 2023-06-14 15:49:58 +00:00
Charles Schlosser
59b3ef5409 Partially Vectorize Cast 2023-06-09 16:54:31 +00:00
Rasmus Munk Larsen
7d7576f326 Avoid underflow in prsqrt. 2023-06-06 14:06:19 -07:00
Charles Schlosser
b7151ffaab Fix unary pow error handling and test 2023-06-06 18:46:55 +00:00
Rasmus Munk Larsen
7ac8897431 Reduce max relative error of prsqrt from 3 to 2 ulps. 2023-06-04 22:25:33 +00:00
Charles Schlosser
1d80e23186 Optimize scalar_unary_pow_op error handling 2023-06-02 18:53:06 +00:00
Alexander Shaposhnikov
316eab8deb Do not set EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC for cuda compilation 2023-05-31 15:15:06 +00:00
Alejandro Acosta
07e4604b19 Replace usage of CudaStreamDevice with GpuStreamDevice in tensor benchmarks GPU 2023-05-30 15:44:07 +00:00
Rasmus Munk Larsen
8c43bf2b5b Clean up Redux.h and fix vectorization_logic test after changes to traversal order in Redux. 2023-05-24 20:26:52 +00:00
Charles Schlosser
da6a71faf0 Add linear redux evaluators 2023-05-24 17:07:25 +00:00
Charles Schlosser
67a1e881d9 Sparse matrix column/row removal 2023-05-24 17:04:45 +00:00
Rasmus Munk Larsen
de1c884687 Add reference to writeup of approach used in canonicalEulerAngles. 2023-05-24 15:52:26 +00:00
Charles Schlosser
307a417e1c Fix unrolled assignment evaluator 2023-05-22 16:39:24 +00:00
Juraj Oršulić
c18f94e3b0 Geometry/EulerAngles: introduce canonicalEulerAngles 2023-05-19 15:42:22 +00:00
Charles Schlosser
7d9bb90f15 SVD: fix numerous compiler warnings / failures 2023-05-15 16:56:47 +00:00
Rasmus Munk Larsen
2709f4c8fb Use relative path to include EmulateArray.h in CXX11Meta.h, and get rid of redundant meta-programming code, which was moved to Core. 2023-05-09 23:21:35 +00:00
Rasmus Munk Larsen
9a02c977ec Use relative paths to include Meta.h and MaxSizeVector.h in Tensor 2023-05-09 22:07:55 +00:00
Rasmus Munk Larsen
96c42771d6 Make it possible to override the synchonization primitives used by the threadpool using macros. 2023-05-09 19:36:17 +00:00
Rasmus Munk Larsen
1321821e86 Add missing braces in Umeyama.h 2023-05-09 19:10:50 +00:00
Rasmus Munk Larsen
524c329ab2 Work around compiler bug in Umeyama.h. 2023-05-09 18:53:56 +00:00
Charles Schlosser
fbf7189bd5 Fix cuda compilation 2023-05-08 16:15:47 +00:00
Mehdi Goli
0623791930 [SYCL-2020] Enabling USM support for SYCL. SYCL-1.2.1 did not have support for USM. 2023-05-05 17:30:36 +00:00
Andrzej Ciarkowski
1698c367a0 Use std::shared_ptr for FFTW/IMKL FFT plan implementation; Fixes #2651 2023-05-05 16:58:23 +00:00
Antonio Sánchez
1f79a6078f Return NaN in ndtri for values outside valid input range. 2023-05-05 16:27:26 +00:00
Tobias Wood
94f57867fe Thread pool 2023-05-05 16:23:34 +00:00
Charles Schlosser
9eb8e2afba Change array_cwise test name 2023-05-05 03:08:43 +00:00
Charles Schlosser
725c11719b Visitor: fix modulo by zero compiler warning 2023-05-04 18:21:09 +00:00
Chip Kerchner
b8208b363c Specialized loadColData correctly - fix previous BF16 GEMV MR 2023-05-04 16:38:17 +00:00
Charles Schlosser
2af03fb685 clean up array_cwise test 2023-05-04 16:02:08 +00:00
Chip Kerchner
fda1373a15 Fix ColMajor BF16 GEMV for when vector is RowMajor 2023-05-03 20:12:50 +00:00
Charles Schlosser
fdc749de2a JacobiSVD: set m_nonzeroSingularValues to zero if not finite 2023-05-02 17:48:21 +00:00
Chip Kerchner
6418ac0285 Unroll F32 to BF16 loop - 1.8X faster conversions for LLVM. Use vector pairs for GCC. 2023-05-01 16:54:16 +00:00
Pedro Gonnet
874f5947f4 Add half-Packet operations to StridedLinearBufferCopy. 2023-05-01 16:09:31 +00:00
Charles Schlosser
c9a14f48d9 SSE Packet4ui has pcmp, pmin, pmax 2023-04-28 20:36:08 +00:00
Rasmus Munk Larsen
0b51f763cb Revert "Geometry/EulerAngles: make sure that returned solution has canonical ranges"
This reverts commit 7f06bcae2c
2023-04-27 00:06:23 +00:00
Antonio Sánchez
2d0c6ad873 Revert "Vectorize cast"
This reverts commit eb5ff1861a
2023-04-26 18:03:36 +00:00
Charles Schlosser
8999525c29 AVX2: Packet4ul has pmul, abs2 2023-04-26 16:22:16 +00:00
Charles Schlosser
eb5ff1861a Vectorize cast 2023-04-26 02:50:13 +00:00
Antonio Sánchez
3918768be1 Fix sparse iterator and tests. 2023-04-25 19:05:49 +00:00
Antonio Sanchez
70410310a4 Fix boolean bitwise and warning. 2023-04-25 15:24:49 +00:00
Charles Schlosser
f6cf5dca80 Packet4ul does not have Abs2 2023-04-21 19:48:01 +00:00
Chip Kerchner
03f646b7e3 New VSX version of BF16 GEMV (Power) - up to 6.7X faster 2023-04-21 17:06:59 +00:00
Charles Schlosser
29c8e3c754 fix pow for uint32_t, disable pmul<Packet4ul> 2023-04-21 05:47:56 +00:00
Juraj Oršulić
7f06bcae2c Geometry/EulerAngles: make sure that returned solution has canonical ranges 2023-04-19 19:12:24 +00:00
Rasmus Munk Larsen
a347dbbab2 Delete last few occurences of HasHalfPacket. 2023-04-19 10:36:59 -07:00
Rasmus Munk Larsen
b378014fef Make sure we return +/-1 above the clamping point for Erf(). 2023-04-18 20:53:01 +00:00
Charles Schlosser
e2bbf496f6 Use select ternary op in tensor select evaulator 2023-04-18 20:52:16 +00:00
Charles Schlosser
2b954be663 fix typo in sse packetmath 2023-04-18 18:17:41 +00:00
Rasmus Munk Larsen
25685c90ad Fix incorrect packet type for unsigned int version of pfirst() in MSVC workaround in PacketMath.h. 2023-04-18 17:46:23 +00:00
Rasmus Munk Larsen
1e223a956c Add missing 'f' in float literal in SpecialFunctionsImpl.h that triggers implicit conversion warning. 2023-04-18 17:33:29 +00:00
Chip Kerchner
3f3ce214e6 New BF16 pcast functions and move type casting to TypeCasting.h 2023-04-18 02:38:38 +00:00
Pedro Gonnet
17b5b4de58 Add Packet4ui, Packet8ui, and Packet4ul to the SSE/AVX PacketMath.h headers 2023-04-17 23:33:59 +00:00
Charles Schlosser
87300c93ca Refactor IndexedView 2023-04-17 12:32:50 +00:00
Chip Kerchner
1148f0a9ec Add dynamic dispatch to BF16 GEMM (Power) and new VSX version 2023-04-14 22:20:42 +00:00
Rasmus Munk Larsen
3026fc0d3c Improve accuracy of erf(). 2023-04-14 16:57:56 +00:00
Rasmus Munk Larsen
554fe02ae3 Enable new AVX512 GEMM kernel by default. 2023-04-12 13:39:06 -07:00
Charles Schlosser
0d12fcc34e Insert from triplets 2023-04-12 20:01:48 +00:00
Rob Conde
990a282fc4 exclude Eigen/Core and Eigen/src/Core from being ignored due to core ignore rule 2023-04-12 10:42:21 -04:00
Rohit Goswami
b0eded878d DOC: Update documentation for 3.4.x 2023-04-06 19:20:41 +00:00
Rasmus Munk Larsen
b0f877f8e0 Don't crash on empty tensor contraction. 2023-04-05 17:06:14 +00:00
b-shi
15fbddaf9b ASAN fixes for AVX512 GEMM/TRSM 2023-04-04 15:54:24 -07:00
Charles Schlosser
178ef8c97f qualify non-const symbolic indexed view with is_lvalue 2023-04-04 19:06:32 +00:00
Rasmus Munk Larsen
df1049ddf4 Small packet math cleanup. 2023-04-04 16:14:32 +00:00
Antoine Hoarau
9b48d10215 Guard all malloc, realloc and free() fonctions with check_that_malloc_is_allowed() 2023-04-04 04:24:22 +00:00
Rasmus Munk Larsen
c730290fa0 Use the correct truncating intrinsic for double->int casting. 2023-04-03 13:56:41 -07:00
Charles Schlosser
766db02020 disable raw array indexed view access for 1d arrays 2023-03-29 02:39:45 +00:00
Charles Schlosser
bfbc66e078 refactor indexedviewmethods, enable non-const ref access with symbolic indices 2023-03-29 01:35:26 +00:00
Rasmus Munk Larsen
1a5dfd7c0f Fix incorrect casting in AVX512DQ path. 2023-03-27 09:28:06 -07:00
Charles Schlosser
a08649994f Optimize generic_rsqrt_newton_step 2023-03-24 22:42:57 +00:00
Rasmus Munk Larsen
b8b8a26145 Add more missing vectorized casts for int on x86, and remove redundant unit tests 2023-03-24 16:02:00 +00:00
unageek
33e206f714 Remove unused declarations of BLAS/LAPACK routines 2023-03-23 21:54:05 +00:00
Rasmus Munk Larsen
d57a79e512 Optimize float->bool cast for AVX2, based on Charles Schlosser's comments. 2023-03-21 20:59:25 -07:00
Rasmus Munk Larsen
a5ae832773 Fix reversal of arguments to _mm256_set_m128() in pcast<Packet4d, Packet8f>. 2023-03-22 03:21:44 +00:00
Rasmus Munk Larsen
09945f2cc1 Optimize casting for x86_64. 2023-03-21 18:24:16 +00:00
Colin Broderick
8f9b8e3630 Replaced all instances of internal::(U)IntPtr with std::(u)intptr_t. Remove ICC workaround. 2023-03-21 16:50:23 +00:00
Antonio Sánchez
2c8011c2dd Fix arm builds. 2023-03-20 16:59:38 +00:00
Charles Schlosser
fd8f410bbe Fix 2624 2625 2023-03-20 16:30:04 +00:00
Chip Kerchner
e887196d9d Undo cmake pools changes 2023-03-17 16:06:26 +00:00
Jonas Schulze
81cb6a51d0 Fix some typos 2023-03-16 23:11:43 +00:00
Antonio Sánchez
555cec17ed Fix parsing of command-line arguments when already specified as a cmake list. 2023-03-16 22:47:38 +00:00
Chip Kerchner
7db19baabe Remove pools if cmake is less than 3.11 2023-03-16 16:54:45 +00:00
Rasmus Munk Larsen
0488b708b4 Vectorize tensor.isnan() by using typed predicates. 2023-03-16 04:04:22 +00:00
Rasmus Munk Larsen
f02856c640 Use EIGEN_NOT_A_MACRO macro (oh the irony!) to avoid build issue in TensorFlow. 2023-03-15 11:42:57 -07:00
Rasmus Munk Larsen
690ae9502f Use C++11 standard features for detecting presence of Inf and NaN 2023-03-15 16:52:44 +00:00
Chip Kerchner
d71ac6a755 Fix recent PowerPC warnings and clang warning 2023-03-15 16:50:46 +00:00
Chip Kerchner
d54d228b49 Limit the number of build jobs to 8 and link jobs to 4 for PowerPC. This should help reduce the OOM build problems. 2023-03-15 16:29:41 +00:00
Chip Kerchner
23e1541863 Put deadcode checks back in from previous change. 2023-03-14 00:57:16 +00:00
Chip Kerchner
6c58f0fe1f Revert changes that made BF16 GEMM to cause bad register spillage for LLVM (Power) 2023-03-13 23:36:06 +00:00
Rasmus Munk Larsen
8fe6190001 Add numext::isnan for AnnoyingOrange^H^H^H^H^H^HScalar. 2023-03-13 21:19:35 +00:00
Rasmus Munk Larsen
79de101d23 Handle PropagateFast the same way as PropagateNaN in minmax visitor to 2023-03-13 20:47:11 +00:00
Chip Kerchner
9d72412385 Add MMA to BF16 GEMV - 5.0-6.3X faster (for Power) 2023-03-13 19:37:13 +00:00
Rasmus Munk Larsen
2067b54b13 Fix bug in minmax_coeff_visitor for matrix of all NaNs. 2023-03-13 18:25:22 +00:00
Rasmus Munk Larsen
ee0ff0ab3a Fix typo in MathFunctions.h 2023-03-13 15:50:40 +00:00
Rasmus Munk Larsen
21c49e8f8e Delete mystery character from Eigen/src/Core/arch/NEON/MathFunctions.h 2023-03-10 23:27:24 +00:00
Rasmus Munk Larsen
6bb9609bcb Make new Select implementation backwards compatible. 2023-03-10 23:07:47 +00:00
Antonio Sánchez
394aabb0a3 Fix failing MSVC tests due to compiler bugs. 2023-03-10 22:36:57 +00:00
Rasmus Munk Larsen
d6235d76db Clean up generic packetmath specializations for various backends with the help of a macro. 2023-03-10 22:02:23 +00:00
Rasmus Munk Larsen
e8fdf127c6 Work around compiler bug in Tridiagonalization.h 2023-03-10 21:21:07 +00:00
Rasmus Munk Larsen
adf26b6840 Add newline to end of file. 2023-03-10 16:53:22 +00:00
Rasmus Munk Larsen
3492d9e2e5 s/Lesser/Less/ 2023-03-10 00:28:31 +00:00
Rasmus Munk Larsen
2419632cf5 Revert change to allFinite(), since the new version does not work for complex numbers. 2023-03-09 21:50:43 +00:00
Zach Davis
b1beba8a3e Fix LinAlgSVD example code 2023-03-08 17:04:59 +00:00
Charles Schlosser
7bf2968fed Specify Permutation Index for PartialPivLU and FullPivLU 2023-03-07 20:28:05 +00:00
Antonio Sánchez
eb4dbf6135 Modify failing cwise test to get it to pass. 2023-03-07 19:47:42 +00:00
Timofey Pushkin
e577f43ab2 Set CMAKE_* cache variables only when Eigen is a top-level project 2023-03-07 14:39:45 +00:00
Charles Schlosser
1ce8b25825 Vectorize any() / all() 2023-03-06 23:54:02 +00:00
Charles Schlosser
cb8e6d4975 Fix 2240, 2620 2023-03-06 23:11:06 +00:00
Charles Schlosser
d670039309 fix tensor comparison test 2023-03-06 13:11:14 +00:00
Chip Kerchner
2b513ca2a0 Added partial linear access for LHS & Output - 30% faster for bfloat16 GEMM MMA (Power) 2023-03-02 19:22:43 +00:00
Charles Schlosser
0b396c3167 Scalarize comps 2023-03-02 17:06:23 +00:00
Charles Schlosser
3abe12472e fix signed shift test 2023-03-01 14:31:13 +00:00
Antonio Sánchez
ba7417f146 Fix gpu conv3d out-of-resources failure. 2023-02-28 21:25:00 +00:00
Antonio Sánchez
62d5cfe835 Fix ODR issues with Intel's AVX512 TRSM kernels. 2023-02-27 07:54:52 +00:00
Charles Schlosser
826627f653 vectorize comparisons and select by enabling typed comparisons 2023-02-25 20:52:11 +00:00
Rasmus Munk Larsen
2e9b945baf Fix bug that disabled vectorization for coeffMin/coeffMax. 2023-02-25 20:03:54 +00:00
Antonio Sánchez
bc5cdc7a67 Guard use of long double on GPU device. 2023-02-24 21:49:59 +00:00
Chip Kerchner
e4598fedbe Fix compiler versions for certain instructions on Power. 2023-02-23 23:24:41 +00:00
Rasmus Munk Larsen
1c0a6cf228 Get rid of EIGEN_HAS_AVX512_MATH workaround. 2023-02-23 23:16:41 +00:00
Rasmus Munk Larsen
00844e3865 Fix a number of MSAN failures in SVD tests. 2023-02-23 18:44:53 +00:00
Mehdi Goli
c3f67063ed [SYCL-2020]- null placeholder accessor issue in Reduction SYCL test 2023-02-22 17:44:53 +00:00
Rasmus Munk Larsen
6bcd941ee3 Use pmsub in twoprod. This speeds up pow() on Skylake by ~1%. 2023-02-21 20:09:29 +00:00
Rasmus Munk Larsen
ce62177b5b Vectorize atanh & add a missing definition and unit test for atan. 2023-02-21 03:14:05 +00:00
Charles Schlosser
049a144798 Add typed logicals 2023-02-18 01:23:47 +00:00
Chip Kerchner
e797974689 Add and enable Packet int divide for Power10. 2023-02-17 19:04:18 +00:00
Chip Kerchner
54459214a1 Fix epsilon and dummy_precision values in long double for double doubles. Prevented some algorithms from converging on PPC. 2023-02-16 23:35:42 +00:00
Antonio Sánchez
a16fb889dd Guard complex sqrt on old MSVC compilers. 2023-02-16 19:47:00 +00:00
Charles Schlosser
94b19dc5f2 Add CArg 2023-02-15 21:33:06 +00:00
Charles Schlosser
71a8e60a7a Tweak pasin_float, fix psqrt_complex 2023-02-15 01:01:14 +00:00
Antonio Sánchez
384269937f More NEON packetmath fixes. 2023-02-14 21:45:25 +00:00
Antonio Sánchez
c15b386203 Fix MSVC atan2 test. 2023-02-14 18:30:58 +00:00
Antonio Sánchez
2dfbf1b251 Fix NEON make_packet2f. 2023-02-14 16:52:07 +00:00
Rasmus Munk Larsen
07aaa62e6f Fix compiler warnings in tests. 2023-02-14 02:29:03 +00:00
Chip Kerchner
4a03409569 Fix problem with array conversions BF16->F32 in Power. 2023-02-13 21:30:45 +00:00
Rasmus Munk Larsen
77b48c440e Fix compiler warnings. 2023-02-10 20:46:23 +00:00
Chip Kerchner
0ecae61568 Disable array BF16 to F32 conversions in Power 2023-02-10 20:06:58 +00:00
Charles Schlosser
c999284bad Print diagonal matrix 2023-02-10 18:07:29 +00:00
Chip Kerchner
fba12e02b3 Fold extra column calculations into an extra MMA accumulator and other bfloat16 MMA GEMM improvements 2023-02-10 17:32:06 +00:00
Chip Kerchner
79cfc74f4d Revert ODR changes and make gemm_extra_cols and gemm_complex_extra_cols EIGEN_ALWAYS_INLINE to avoid external functions. 2023-02-10 17:05:07 +00:00
Alexander Grund
f9659d91f1 Fix ODR violation with gemm_extra_cols on PPC 2023-02-09 22:16:06 +00:00
Charles Schlosser
325e3063d9 Optimize psign 2023-02-09 22:15:26 +00:00
Charles Schlosser
0e490d452d Update file ColPivHouseholderQR_LAPACKE.h 2023-02-09 13:45:56 +00:00
Antonio Sánchez
0a5392d606 Fix MSVC arm build. 2023-02-08 21:46:37 +00:00
Antonio Sánchez
3f7e775715 Add IWYU export pragmas to top-level headers. 2023-02-08 17:40:31 +00:00
Rasmus Munk Larsen
e4f58816d9 Get rid of custom implementation of equal_to and not_equal_no. No longer needed with c+14. 2023-02-07 21:36:44 -08:00
Antonio Sánchez
e256ad1823 Remove LGPL Code and references. 2023-02-08 01:25:06 +00:00
Chip Kerchner
e71f88abce Change in Power eigen_asserts to eigen_internal_asserts since it is putting unnecessary error checking and assertions without NDEBUG. 2023-02-08 00:57:30 +00:00
Gregory Kramida
232b18fa8a Fixes #2602 2023-02-06 22:52:39 +00:00
Antonio Sánchez
f6cc359e10 More EIGEN_DEVICE_FUNC fixes for CUDA 10/11/12. 2023-02-03 19:18:45 +00:00
Charles Schlosser
2a90653395 fix lapacke config 2023-02-03 16:40:08 +00:00
Rasmus Munk Larsen
3460f3558e Use VERIFY_IS_EQUAL to compare to zeros. 2023-02-01 13:49:56 -08:00
Jeremy Nimmer
13a1f25da9 Revert StlIterators edit from "Fix undefined behavior..." 2023-02-01 20:01:36 +00:00
Charles Schlosser
fd2fd48703 Update file ForwardDeclarations.h 2023-02-01 16:52:20 +00:00
Rasmus Munk Larsen
37b2e97175 Tweak special case handling in atan2. 2023-01-31 17:48:00 -08:00
Jeremy Nimmer
a1cdcdb038 Fix undefined behavior in Block access 2023-02-01 00:40:45 +00:00
Chip Kerchner
4a58f30aa0 Fix pre-POWER8_VECTOR bugs in pcmp_lt and pnegate and reactivate psqrt. 2023-01-31 19:40:24 +00:00
Rasmus Munk Larsen
12ad99ce60 Remove unused variables from GenericPacketMathFunctions.h 2023-01-29 18:10:28 +00:00
Charles Schlosser
6987a200bb Fix stupid sparse bugs with outerSize == 0 2023-01-28 02:03:09 +00:00
Charles Schlosser
0471e61b4c Optimize various mathematical packet ops 2023-01-28 01:34:26 +00:00
Charles Schlosser
1aa6dc2007 Fix sparse warnings 2023-01-27 22:47:42 +00:00
Antonio Sánchez
17ae83a966 Fix bugs exposed by enabling GPU asserts. 2023-01-27 21:43:00 +00:00
Chip Kerchner
ab8725d947 Turn off vectorize version of rsqrt - doesn't match generic version 2023-01-27 18:28:54 +00:00
Charles Schlosser
6d9f662a70 Tweak atan2 2023-01-26 17:38:21 +00:00
Chip Kerchner
6fc9de7d93 Fix slowdown in bfloat16 MMA when rows is not a multiple of 8 or columns is not a multiple of 4. 2023-01-25 18:22:20 +00:00
Charles Schlosser
6d4221af76 Revert qr tests 2023-01-23 22:23:08 +00:00
Charles Schlosser
7f58bc98b1 Refactor sparse 2023-01-23 17:55:50 +00:00
Rasmus Munk Larsen
576448572f More fixes for __GNUC_PATCHLEVEL__. 2023-01-23 17:04:24 +00:00
Rasmus Munk Larsen
164ddf75ab Use __GNUC_PATCHLEVEL__ rather than __GNUC_PATCH__, according to the documentation https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html 2023-01-23 16:56:14 +00:00
Charles Schlosser
5a7ca681d5 Fix sparse insert 2023-01-20 21:32:32 +00:00
Antonio Sánchez
08c961e837 Add custom ODR-safe assert. 2023-01-20 17:38:13 +00:00
Amir Masoud Abdol
3fe8c51104 Replace the Deprecated $<CONFIGURATION> with $<CONFIG> 2023-01-17 19:44:32 +00:00
Sean McBride
d70b4864d9 issue #2581: review and cleanup of compiler version checks 2023-01-17 18:58:34 +00:00
Mehdi Goli
b523120687 [SYCL-2020 Support] Enabling Intel DPCPP Compiler support to Eigen 2023-01-16 07:04:08 +00:00
tttapa
bae119bb7e Support per-thread is_malloc_allowed() state 2023-01-16 01:34:56 +00:00
Charles Schlosser
fa0bd2c34e improve sparse permutations 2023-01-15 03:21:25 +00:00
Antonio Sánchez
2e61c0c6b4 Add missing EIGEN_DEVICE_FUNC in a few places when called by asserts. 2023-01-15 02:06:17 +00:00
Charles Schlosser
4aca06f63a avoid move assignment in ColPivHouseholderQR 2023-01-15 01:34:10 +00:00
Charles Schlosser
68082b8226 Fix QR, again 2023-01-13 03:23:17 +00:00
Sergey Fedorov
4d05765345 Altivec fixes for Darwin: do not use unsupported VSX insns 2023-01-12 16:33:33 +00:00
Rasmus Munk Larsen
6156797016 Revert "Add template to specify QR permutation index type, Fix ColPivHouseholderQR Lapacke bindings"
This reverts commit be7791e097
2023-01-11 18:50:52 +00:00
Charles Schlosser
be7791e097 Add template to specify QR permutation index type, Fix ColPivHouseholderQR Lapacke bindings 2023-01-11 15:57:28 +00:00
Charles Schlosser
9463fc95f4 change insert strategy 2023-01-11 06:24:49 +00:00
Martin Burchell
c54785b071 Fix error: unused parameter 'tmp' [-Werror,-Wunused-parameter] on clang/32-bit arm 2023-01-10 21:15:28 +00:00
Antonio Sanchez
f47472603b Add missing header for GPU tests. 2023-01-09 11:21:13 -08:00
Charles Schlosser
81172cbdcb Overhaul Sparse Core 2023-01-07 22:09:42 +00:00
Robin Miquel
9255181891 Modified spbenchsolver help message because it could be misunderstood 2023-01-07 21:35:46 +00:00
Chip Kerchner
d20fe21ae4 Improve performance for Power10 MMA bfloat16 GEMM 2023-01-06 23:08:37 +00:00
Ryan Senanayake
fe7f527787 Fix guard macros for emulated FP16 operators on GPU 2023-01-06 22:02:51 +00:00
Rasmus Munk Larsen
b8422c99cd Update file jacobisvd.cpp 2023-01-06 21:14:17 +00:00
Antonio Sánchez
262194f12c Fix a bunch of minor build and test issues. 2023-01-06 16:37:26 +00:00
Antonio Sánchez
3564668908 Fix overalign check. 2023-01-05 17:10:48 +00:00
Charles Schlosser
f3929ac7ed Fix EIGEN_HAS_CXX17_OVERALIGN for icc 2023-01-03 17:30:10 +00:00
LAI Bruce
1b33a6374b Fix `git add .` not including scripts/buildtests.in 2023-01-03 17:06:36 +00:00
Charles Schlosser
a8bab0d8ae Patch SparseLU 2022-12-31 04:52:36 +00:00
Antonio Sánchez
910f6f65d0 Adjust thresholds for bfloat16 product tests that are currently failing. 2022-12-28 19:32:25 +00:00
Arthur
311cc0f9cc Enable NEON pcmp, plset, and complex psqrt 2022-12-22 05:38:34 +00:00
Antonio Sánchez
dbf7ae6f9b Fix up C++ version detection macros and cmake tests. 2022-12-20 18:06:03 +00:00
Antonio Sánchez
bb6675caf7 Fix incorrect NEON native fp16 multiplication. 2022-12-19 20:46:44 +00:00
Rasmus Munk Larsen
dd85d26946 Revert "Avoid mixing types in CompressedStorage.h" 2022-12-19 20:09:37 +00:00
Arthur Feeney
c4fb6af24b Enable NEON pabs for unsigned int types 2022-12-19 17:07:36 +00:00
Rasmus Munk Larsen
400bc5cd5b Add sparse_basic_1 to smoke tests. 2022-12-16 22:03:33 +00:00
Rasmus Munk Larsen
04e4f0bb24 Add missing colon in SparseMatrix.h. 2022-12-16 21:50:00 +00:00
Rasmus Munk Larsen
3d8a8def8a Avoid mixing types in CompressedStorage.h 2022-12-16 20:11:02 +00:00
Charles Schlosser
4bb2446796 Add operators to CompressedStorageIterator 2022-12-16 16:48:50 +00:00
Rasmus Munk Larsen
e1aee4ab39 Update test of numext::signbit. 2022-12-15 19:58:59 +00:00
Rasmus Munk Larsen
3717854a21 Use numext::signbit instead of std::signbit, which is not defined for bfloat16. 2022-12-15 18:41:46 +00:00
Alexander Richardson
37de432907 Avoid using std::raise() for divide by zero 2022-12-14 20:06:16 +00:00
Alexander Richardson
62de593c40 Allow std::initializer_list constructors in constexpr expressions 2022-12-14 17:05:37 +00:00
Charles Schlosser
6d3e3678b4 optimize equalspace packetop 2022-12-13 01:22:25 +00:00
Charles Schlosser
2004831941 add EqualSpaced / setEqualSpaced 2022-12-13 00:54:57 +00:00
Melven Roehrig-Zoellner
273f803846 Add BDCSVD_LAPACKE binding 2022-12-09 18:50:12 +00:00
Antonio Sánchez
03c9b4738c Enable direct access for NestByValue. 2022-12-07 18:21:45 +00:00
Chip Kerchner
b59f18b4f7 Increase L2 and L3 cache size for Power10. 2022-12-07 18:20:33 +00:00
Antonio Sánchez
c614b2bbd3 Fix index type for sparse index sorting. 2022-12-06 00:02:31 +00:00
Charles Schlosser
44fe539150 add sparse sort inner vectors function 2022-12-01 19:28:56 +00:00
Lianhuang Li
d194167149 Fix bug when using the NEON fmla instruction for the half data type 2022-12-01 17:28:57 +00:00
Pedro Caldeira
31ab62d347 Add support for Power10 (AltiVec) MMA instructions for bfloat16. 2022-11-30 23:33:37 +00:00
Antonio Sánchez
dcb042a87d Fix serialization for non-compressed matrices. 2022-11-30 18:16:47 +00:00
Antonio Sánchez
2260e11eb0 Fix reshape strides when input has non-zero inner stride. 2022-11-29 19:39:29 +00:00
Alexandre Hoffmann
23524ab6fc Changing BiCGSTAB parameters initialization so that it works with custom types 2022-11-29 19:37:46 +00:00
Antonio Sánchez
ab2b26fbc2 Fix sparseLU solver when destination has a non-unit stride. 2022-11-29 19:37:03 +00:00
Antonio Sánchez
551eebc8ca Add synchronize method to all devices. 2022-11-29 19:35:02 +00:00
Charles Schlosser
b7551bff92 Fix a bunch of annoying compiler warnings in tests 2022-11-21 20:07:19 +00:00
Antonio Sánchez
e7b1ad0315 Add serialization for sparse matrix and sparse vector. 2022-11-21 19:43:07 +00:00
Charles Schlosser
044f3f6234 Fix bug in handmade_aligned_realloc 2022-11-18 22:35:31 +00:00
Chris
6728683938 Small cleanup of IDRS.h 2022-11-16 13:51:23 +00:00
Charles Schlosser
02805bd56c Fix AVX2 psignbit 2022-11-16 13:43:11 +00:00
Chip Kerchner
399ce1ed63 Fix duplicate execution code for Power 8 Altivec in pstore_partial. 2022-11-16 13:41:42 +00:00
Gabriele Buondonno
6431dfdb50 Cross product for vectors of size 2. Fixes #1037 2022-11-15 22:39:42 +00:00
Antonio Sánchez
8588d8c74b Correct pnegate for floating-point zero. 2022-11-15 18:07:23 +00:00
Antonio Sanchez
5eacb9e117 Put brackets around unsigned type names. 2022-11-15 09:09:45 -08:00
Antonio Sánchez
37e40dca85 Fix ambiguity in PPC for vec_splats call. 2022-11-14 18:58:16 +00:00
Antonio Sánchez
7dc6db75d4 Fix typo in CholmodSupport 2022-11-08 23:49:56 +00:00
Charles Schlosser
9b6d624eab fix neon 2022-11-08 20:03:01 +00:00
Rasmus Munk Larsen
7e398e9436 Add missing return keyword in psignbit for NEON. 2022-11-04 16:13:09 +00:00
Charles Schlosser
82b152dbe7 Add signbit function 2022-11-04 00:31:20 +00:00
Antonio Sánchez
8f8e36458f Remove recently added sparse assert in SparseMapBase. 2022-11-03 17:29:05 +00:00
Antonio Sanchez
01a31b81b2 Remove unused parameter name. 2022-11-01 15:51:25 -07:00
Antonio Sánchez
c5b896c5a3 Allow empty matrices to be resized. 2022-10-27 20:33:35 +00:00
Antonio Sánchez
886aad1361 Disable patan for double on PPC. 2022-10-27 17:56:08 +00:00
Antonio Sánchez
ab407b2b6e Fix handmade_aligned_malloc offset computation. 2022-10-27 17:33:47 +00:00
Antonio Sánchez
adb30efb25 Add assert for invalid outerIndexPtr array in SparseMapBase. 2022-10-26 22:51:33 +00:00
Antonio Sánchez
c27d1abe46 Fix pragma check for disabling fastmath. 2022-10-26 22:50:57 +00:00
Charles Schlosser
a226371371 Change handmade_aligned_malloc/realloc/free to store a 1 byte offset instead of absolute address 2022-10-22 22:51:31 +00:00
Antonio Sánchez
bf48d46338 Explicitly state that indices must be sorted. 2022-10-19 18:15:29 +00:00
Rasmus Munk Larsen
3bb6a48d8c Fix bug in atan2 2022-10-12 23:49:32 +00:00
Rasmus Munk Larsen
14c847dc0e Refactor special values test for pow, and add a similar test for atan2 2022-10-12 20:12:08 +00:00
Rasmus Munk Larsen
462758e8a3 Don't use generic sign function for sign(complex) unless it is vectorizable 2022-10-12 16:03:29 +00:00
Rasmus Munk Larsen
c0d6a72611 Use pnegate(pzero(x)) as a generic way to generate -0.0. Some compilers do not handle the literal -0.0 properly in fastmath mode. 2022-10-12 01:57:05 +00:00
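The commit above works around compilers that fold the literal -0.0 into +0.0 under fast-math, since the two compare equal. A minimal sketch of why the distinction matters — `pnegate`/`pzero` are Eigen packet ops; here plain negation of a computed zero stands in for them:

```python
import math

# Negative zero compares equal to positive zero, which is why a
# fast-math compiler may legally replace the literal -0.0 with +0.0.
assert -0.0 == 0.0

# The sign bit still distinguishes the two values:
assert math.copysign(1.0, -0.0) == -1.0
assert math.copysign(1.0, 0.0) == 1.0

# Negating a computed zero (the pnegate(pzero(x)) idea) recovers -0.0
# at runtime without ever writing the literal in the source:
z = 0.0
neg_zero = -z
assert math.copysign(1.0, neg_zero) == -1.0
```

The sign of zero matters for functions such as `atan2` and `1/x`, where `1/-0.0` is `-inf` while `1/0.0` is `+inf`.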
Laurent Rineau
7846c7387c Eigen/Sparse: fix warnings -Wunused-but-set-variable 2022-10-11 17:37:04 +00:00
Rasmus Munk Larsen
3167544873 Handle NaN inputs to atan2. 2022-10-10 19:36:36 -07:00
Rasmus Munk Larsen
72db3f0fa5 Remove references to M_PI_2 and M_PI_4. 2022-10-11 00:27:16 +00:00
Rasmus Munk Larsen
d6bc062591 Remove reference to EIGEN_HAS_CXX11_MATH. 2022-10-10 23:38:28 +00:00
Rasmus Munk Larsen
5ceed0d57f Guard GCC-specific pragmas with "#ifdef EIGEN_COMP_GNUC" 2022-10-10 20:38:53 +00:00
Alexander Richardson
528b68674c [clang-format] Add a few macros to AttributeMacros 2022-10-10 16:44:47 +00:00
Rasmus Munk Larsen
e95c4a837f Simpler range reduction strategy for atan<float>(). 2022-10-04 18:11:00 +00:00
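The commit above simplifies range reduction for `atan<float>()`. The classical reduction it builds on maps large arguments into [0, 1] via atan(x) = π/2 − atan(1/x); the sketch below illustrates that identity (function name `atan_range_reduced` is illustrative, and Eigen's actual kernel evaluates a polynomial on the reduced interval rather than calling `atan`):

```python
import math

def atan_range_reduced(x):
    # For |x| > 1, reduce to an argument in [0, 1] using
    # atan(x) = sign(x) * (pi/2 - atan(1/|x|)).
    ax = abs(x)
    if ax > 1.0:
        r = math.pi / 2 - math.atan(1.0 / ax)
    else:
        r = math.atan(ax)
    return math.copysign(r, x)

# The reduced form agrees with the direct evaluation across the range.
for x in (-100.0, -1.5, -0.3, 0.0, 0.7, 2.0, 1e6):
    assert abs(atan_range_reduced(x) - math.atan(x)) < 1e-12
```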
Antonio Sánchez
80efbfdeda Unconditionally enable CXX11 math. 2022-10-04 17:37:47 +00:00
Antonio Sánchez
e5794873cb Replace assert with eigen_assert. 2022-10-04 17:11:23 +00:00
Antonio Sánchez
7d6a9925cc Fix 4x4 inverse when compiling with -Ofast. 2022-10-04 16:05:49 +00:00
Rasmus Munk Larsen
1414a76fa9 Only vectorize atan<double> for Altivec if VSX is available. 2022-10-03 22:06:58 +00:00
Rasmus Munk Larsen
c475228b28 Vectorize atan() for double. 2022-10-01 01:49:30 +00:00
Rasmus Munk Larsen
1e1848fdb1 Add a vectorized implementation of atan2 to Eigen. 2022-09-28 20:46:49 +00:00
Rasmus Munk Larsen
b3bf8d6a13 Try to reduce size of GEBP kernel for non-ARM targets. 2022-09-28 02:37:18 +00:00
Rasmus Munk Larsen
13b69fc1b0 Try to reduce compilation time/memory for GEBP kernel using EIGEN_IF_CONSTEXPR 2022-09-23 20:09:42 +00:00
Rasmus Munk Larsen
3c4637640b Remove unused typedef. 2022-09-23 19:11:31 +00:00
Rasmus Munk Larsen
ed8cda3ce4 Move EIGEN_NEON_GEBP_NR macro to the right place in GeneralBlockPanelKernel.h 2022-09-23 02:24:27 +00:00
Rasmus Munk Larsen
e2ea866515 Add a macro to set the nr trait in the GEBP kernel for NEON. 2022-09-22 23:56:34 +00:00
Lianhuang Li
23299632c2 Use 3px8/2px8/1px8/1x8 gebp_kernel on arm64-neon 2022-09-21 16:36:40 +00:00
Rasmus Munk Larsen
7b2901e2aa Add vectorized integer division for int32 with AVX512, AVX or SSE. 2022-09-21 00:27:23 +00:00
Chao Chen
5ffe7b92e0 [ROCm] fixed gpuGetDevice unused message 2022-09-20 21:38:20 +00:00
Rasmus Munk Larsen
f913a40678 Revert "Add AVX int32_t pdiv"
This reverts commit ea84e7ad63
2022-09-16 22:48:08 +00:00
Rasmus Munk Larsen
273e0c884e Revert "Add constexpr, test for C++14 constexpr." 2022-09-16 21:14:29 +00:00
Charles Schlosser
ea84e7ad63 Add AVX int32_t pdiv 2022-09-16 17:06:29 +00:00
Rasmus Munk Larsen
dceb779ecd Fix test for pow with mixed integer types. We do not convert the exponent if it is an integer type. 2022-09-12 15:51:27 -07:00
Rasmus Munk Larsen
afc014f1b5 Allow mixed types for pow(), as long as the exponent is exactly representable in the base type. 2022-09-12 21:55:30 +00:00
Antonio Sánchez
b2c82a9347 Remove bad skew_symmetric_matrix3 test. 2022-09-10 07:08:37 +00:00
Rasmus Munk Larsen
e8a2aa24a2 Fix a couple of issues with unary pow(): 2022-09-09 17:21:11 +00:00
Rohit Santhanam
07d0759951 [ROCm] Fix for sparse matrix related breakage on ROCm. 2022-09-09 14:41:00 +00:00
Antonio Sánchez
fb212c745d Fix g++-6 constexpr and c++20 constexpr build errors. 2022-09-09 03:41:45 +00:00
Thomas Gloor
ec9c7163a3 Feature/skew symmetric matrix3 2022-09-08 20:44:40 +00:00
Antonio Sánchez
311ba66f7c Fix realloc for non-trivial types. 2022-09-08 19:39:36 +00:00
Antonio Sánchez
3c37dd2a1d Tweak bound for pow to account for floating-point types. 2022-09-08 17:40:45 +00:00
Rasmus Munk Larsen
f9dfda28ab Add missing comparison operators for GPU packets. 2022-09-07 21:13:45 +00:00
Rasmus Munk Larsen
242325eca7 Remove unused variable. 2022-09-07 20:46:44 +00:00
Tobias Schlüter
133498c329 Add constexpr, test for C++14 constexpr. 2022-09-07 03:42:34 +00:00
Antonio Sánchez
69f50e3a67 Adjust overflow threshold bound for pow tests. 2022-09-06 19:53:29 +00:00
Antonio Sanchez
3e44f960ed Reduce compiler warnings for tests. 2022-09-06 18:20:56 +00:00
Florian Richer
b7e21d4e38 Call check_that_malloc_is_allowed() in aligned_realloc() 2022-09-06 18:00:37 +00:00
Gilles Aouizerate
6e83e906c2 fix typo in doc/TutorialSparse.dox 2022-09-06 17:56:13 +00:00
Michael Palomas
525f066671 fixed msvc compilation error in GeneralizedEigenSolver.h 2022-09-04 17:50:43 +00:00
Antonio Sánchez
f241a2c18a Add asserts for index-out-of-bounds in IndexedView. 2022-09-02 17:28:03 +00:00
Antonio Sánchez
f5364331eb Fix some cmake issues. 2022-09-02 16:43:14 +00:00
Antonio Sánchez
d816044b6e Fix mixingtypes tests. 2022-09-02 15:30:13 +00:00
Gilles Aouizerate
94cc83faa1 Fix 2 typos in the 3rd table. 2022-08-31 19:54:42 +00:00
Antonio Sánchez
30c42222a6 Fix some test build errors in new unary pow. 2022-08-30 17:24:14 +00:00
Rasmus Munk Larsen
bd393e15c3 Vectorize acos, asin, and atan for float. 2022-08-29 19:49:33 +00:00
Charles Schlosser
e5af9f87f2 Vectorize pow for integer base / exponent types 2022-08-29 19:23:54 +00:00
chuckyschluz
8acbf5c11c re-enable pow for complex types 2022-08-26 17:29:02 -04:00
Rasmus Munk Larsen
7064ed1345 Specialize psign<Packet8i> for AVX2, don't vectorize psign<bool>. 2022-08-26 17:02:37 +00:00
Rasmus Munk Larsen
98e51c9e24 Avoid undefined behavior in array_cwise test due to signed integer overflow 2022-08-26 16:19:03 +00:00
Arthur
a7c1cac18b Fix GeneralizedEigenSolver::info() and Asserts 2022-08-25 22:05:04 +00:00
Antonio Sanchez
714678fc6c Add missing ptr in realloc call. 2022-08-24 22:04:04 -07:00
Charles Schlosser
b2a13c9dd1 Sparse Core: Replace malloc/free with conditional_aligned 2022-08-23 21:44:22 +00:00
Rasmus Munk Larsen
6aad0f821b Fix psign for unsigned integer types, such as bool. 2022-08-22 20:19:35 +00:00
Rasmus Munk Larsen
1a09defce7 Protect new pblend implementation with EIGEN_VECTORIZE_AVX2 2022-08-22 18:28:03 +00:00
Rasmus Munk Larsen
7c67dc67ae Use proper double word division algorithm for pow<double>. Gives 11-15% speedup. 2022-08-17 18:36:23 +00:00
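The `pow<double>` speedup above relies on double-word ("double-double") arithmetic. Its basic building block is an error-free transformation such as Knuth's TwoSum, sketched here; the division variant the commit refers to is analogous in spirit (pairing a rounded result with its exact residual), not reproduced here:

```python
def two_sum(a, b):
    # Knuth's TwoSum: s is the rounded sum and e the exact rounding
    # error, so s + e represents a + b with no error at all.
    s = a + b
    bv = s - a
    av = s - bv
    e = (a - av) + (b - bv)
    return s, e

s, e = two_sum(1e16, 1.0)
assert s == 1e16  # the 1.0 is rounded away in the sum...
assert e == 1.0   # ...but captured exactly in the error term
```

Carrying the (hi, lo) pair through a computation gives roughly twice the working precision, which is what tightens the error bound for `pow<double>`.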
Matthew Sterrett
7a3b667c43 Add support for AVX512-FP16 for vectorizing half precision math 2022-08-17 18:15:21 +00:00
Charles Schlosser
76a669fb45 add fixed power unary operation 2022-08-16 21:32:36 +00:00
Matthew Sterrett
39fcc89798 Removed unnecessary checks for FP16C 2022-08-16 18:14:41 +00:00
Romain Biessy
2f7cce2dd5 [SYCL] Fix some SYCL tests 2022-08-16 17:37:54 +00:00
Arthur
27367017bd Disable bad "deprecated warning" edge-case in BDCSVD 2022-08-11 18:43:31 +00:00
Antonio Sánchez
b8e93bf589 Eliminate bool bitwise warnings. 2022-08-09 22:42:30 +00:00
Lexi Bromfield
66ea0c09fd Don't double-define Half functions on aarch64 2022-08-09 20:00:34 +00:00
Rasmus Munk Larsen
97e0784dc6 Vectorize the sign operator in Eigen. 2022-08-09 19:54:57 +00:00
Arthur
be20207d10 Fix vectorized Jacobi Rotation 2022-08-08 19:29:56 +00:00
Rasmus Munk Larsen
7a87ed1b6a Fix code and unit test for a few corner cases in vectorized pow() 2022-08-08 18:48:36 +00:00
Chip Kerchner
9e0afe0f02 Fix non-VSX PowerPC build 2022-08-08 18:18:17 +00:00
Chip Kerchner
84a9d6fac9 Fix use of Packet2d type for non-VSX. 2022-08-03 20:48:13 +00:00
Chip Kerchner
ce60a7be83 Partial Packet support for GEMM real-only (PowerPC). Also fix compilation warnings & errors for some conditions in new API. 2022-08-03 18:15:19 +00:00
Antonio Sánchez
5a1c7807e6 Fix inner iterator for sparse block. 2022-08-03 17:26:12 +00:00
Antonio Sánchez
39d22ef46b Fix flaky packetmath_1 test. 2022-08-02 17:42:45 +00:00
Antonio Sánchez
7896c7dc6b Use numext::sqrt in ConjugateGradient. 2022-07-29 20:17:23 +00:00
Ilya Tokar
e618c4a5e9 Improve pblend AVX implementation 2022-07-29 18:45:33 +00:00
sjusju
ef4654bae7 Add true determinant to QR and its variants 2022-07-29 18:24:14 +00:00
Alexander Richardson
b7668c0371 Avoid including <sstream> with EIGEN_NO_IO 2022-07-29 18:02:51 +00:00
John Mather
7dd3dda3da Updated AccelerateSupport documentation after PR 966. 2022-07-29 17:42:31 +00:00
Julian Kent
69714ff613 Add Sparse Subset of Matrix Inverse 2022-07-28 18:04:35 +00:00
Antonio Sánchez
34780d8bd1 Include immintrin.h header for Emscripten. 2022-07-22 02:27:42 +00:00
Antonio Sánchez
2cf4d18c9c Disable AVX512 GEMM kernels by default. 2022-07-20 21:22:48 +00:00
Charles Schlosser
a678a3e052 Fix aligned_realloc to call check_that_malloc_is_allowed() if ptr == 0 2022-07-19 20:59:07 +00:00
b-shi
4a56359406 Add option to disable avx512 GEBP kernels 2022-07-18 17:59:09 +00:00
Mathieu Westphal
1092574b26 Fix wrong doxygen group usage 2022-07-12 13:22:46 +02:00
Antonio Sánchez
e1165dbf9a AutoDiff depends on Core, so include appropriate header. 2022-07-09 23:57:09 +00:00
Antonio Sánchez
bb51d9f4fa Fix ODR violations. 2022-07-09 04:56:36 +00:00
Rohit Santhanam
06a458a13d Enable subtests which use device side malloc since this has been fixed in ROCm 5.2. 2022-06-29 17:09:43 +00:00
Chip Kerchner
84cf3ff18d Add pload_partial, pstore_partial (and unaligned versions), pgather_partial, pscatter_partial, loadPacketPartial and storePacketPartial. 2022-06-27 19:18:00 +00:00
Chip Kerchner
c603275dc9 Better performance for Power10 using more load and store vector pairs for GEMV 2022-06-27 18:11:55 +00:00
Antonio Sanchez
0e18714167 Fix clang-tidy warnings about function definitions in headers. 2022-06-24 15:10:58 +00:00
Antonio Sánchez
8ed3b9dcd6 Skip f16/bf16 bessel specializations on AVX512 if unavailable. 2022-06-24 15:10:36 +00:00
Antonio Sánchez
bc2ab81634 Eliminate undef warnings when not compiling for AVX512. 2022-06-24 15:10:10 +00:00
Antonio Sánchez
0e083b172e Use numext::sqrt in Householder.h. 2022-06-21 16:29:59 +00:00
b-shi
37673ca1bc AVX512 TRSM kernels use alloca if EIGEN_NO_MALLOC requested 2022-06-17 18:05:26 +00:00
Chip Kerchner
4d1c16eab8 Fix tanh and erf to use vectorized version for EIGEN_FAST_MATH in VSX. 2022-06-15 16:06:43 +00:00
Mehdi Goli
7ea823e824 [SYCL-Spec] According to [SYCL-2020 spec](... 2022-06-13 15:52:29 +00:00
Arthur
ba4d7304e2 Document DiagonalBase 2022-06-08 17:46:32 +00:00
Binhao Qin
95463b59bc Mark index_remap as EIGEN_DEVICE_FUNC in src/Core/Reshaped.h (Fixes #2493) 2022-06-07 20:10:47 +00:00
Shi, Brian
28812d2ebb AVX512 TRSM Kernels respect EIGEN_NO_MALLOC 2022-06-07 11:28:42 -07:00
sfalmo
9960a30422 Fix row vs column vector typo in Matrix class tutorial 2022-06-07 17:28:19 +00:00
Antonio Sánchez
8c2e0e3cb8 Fix ambiguous comparisons for c++20 (again again) 2022-06-07 17:06:17 +00:00
Arthur
14aae29470 Provide DiagonalMatrix Product and Initializers 2022-06-06 21:43:22 +00:00
Antonio Sánchez
76cf6204f3 Revert "Fix c++20 ambiguity of comparisons."
This reverts commit 4f6354128f
2022-06-04 02:32:10 +00:00
aaraujom
8fbb76a043 Fix build issues with MSVC for AVX512 2022-06-03 14:55:40 +00:00
Antonio Sánchez
4f6354128f Fix c++20 ambiguity of comparisons. 2022-06-03 05:11:07 +00:00
Oleg Shirokobrod
f542b0a71f Adding an MKL adapter in FFT module. 2022-06-02 18:10:43 +00:00
aaraujom
d49ede4dc4 Add AVX512 s/dgemm optimizations for compute kernel (2nd try) 2022-05-28 02:00:21 +00:00
Rasmus Munk Larsen
510f6b9f15 Fix integer shortening warnings in visitor tests. 2022-05-27 18:51:37 +00:00
Arthur
705ae70646 Add R-Bidiagonalization step to BDCSVD 2022-05-27 02:00:24 +00:00
Mario Rincon-Nigro
e99163e732 fix: issue 2481: LDLT produce wrong results with AutoDiffScalar 2022-05-25 15:26:10 +00:00
Antonio Sánchez
477eb7f630 Revert "Avoid ambiguous Tensor comparison operators for C++20 compatibility"
This reverts commit 5c2179b6c3
2022-05-24 16:09:59 +00:00
Mehdi Goli
c5a5ac680c [SYCL] SYCL-2020 range does not have default constructor. 2022-05-24 03:11:46 +00:00
Benjamin Kramer
5c2179b6c3 Avoid ambiguous Tensor comparison operators for C++20 compatibility 2022-05-23 17:36:03 +00:00
Chip Kerchner
aa8b7e2c37 Add subMappers to Power GEMM packing - simplifies the address calculations (10% faster) 2022-05-23 15:18:29 +00:00
Antonio Sánchez
32348091ba Avoid signed integer overflow in adjoint test. 2022-05-23 14:46:16 +00:00
Mehdi Goli
cbe03f3531 [SYCL] Extending SYCL queue interface extension. 2022-05-23 14:45:27 +00:00
Guoqiang QI
32a3f9ac33 Improve plogical_shift_* implementations and fix typo in SVE/PacketMath.h 2022-05-23 09:33:49 +00:00
Eisuke Kawashima
ac5c83a3f5 unset executable flag 2022-05-22 22:47:43 +09:00
Antonio Sanchez
481a4a8c31 Fix BDCSVD condition for failing with numerical issue. 2022-05-20 08:18:31 -07:00
Tobias Wood
a9868bd5be Add arg() to tensor 2022-05-20 03:33:01 +00:00
Antonio Sánchez
028ab12586 Prevent BDCSVD crash caused by index out of bounds. 2022-05-19 22:29:48 +00:00
Rohan Ghige
798fc1c577 Fix 'Incorrect reference code in STL_interface.hh for ata_product' eigen/issues/2425 2022-05-18 14:42:57 +00:00
Antonio Sánchez
9b9496ad98 Revert "Add AVX512 optimizations for matrix multiply"
This reverts commit 25db0b4a82
2022-05-13 18:50:33 +00:00
aaraujom
25db0b4a82 Add AVX512 optimizations for matrix multiply 2022-05-12 23:41:19 +00:00
Guoqiang QI
00b75375e7 Adding PocketFFT support in FFT module since kissfft has some flaw in accuracy and performance 2022-05-11 17:44:22 +00:00
Rasmus Munk Larsen
73d65dbc43 Update README.md. Remove obsolete comment about RowMajor not being fully supported. 2022-05-06 18:19:35 +00:00
Francesco Romano
68e03ab240 Add uninstall target only if not already defined. 2022-05-05 17:43:08 +00:00
Alex_M
2c055f8633 make diagonal matrix cols() and rows() methods constexpr 2022-05-03 10:13:37 +02:00
Chip Kerchner
c2f15edc43 Add load vector_pairs for RHS of GEMM MMA. Improved predux GEMV. 2022-04-25 16:23:01 +00:00
John Mather
9e026e5e28 Removed need to supply the Symmetric flag to UpLo argument for Accelerate LLT and LDLT 2022-04-21 20:02:10 +00:00
Chip Kerchner
44ba7a0da3 Fix compiler bugs for GCC 10 & 11 for Power GEMM 2022-04-20 15:59:00 +00:00
Chip Kerchner
b02c384ef4 Add fused multiply functions for PowerPC - pmsub, pnmadd and pnmsub 2022-04-18 16:16:32 +00:00
Rohit Santhanam
3de96caeaa Fix HouseholderSequence.h 2022-04-17 02:46:56 +00:00
Antonio Sánchez
f845a8bb1a Fix cwise NaN propagation for scalar input. 2022-04-16 05:07:44 +00:00
Charles Schlosser
a4bb513b99 Update HouseholderSequence.h 2022-04-15 16:56:17 +00:00
Shi, Brian
fc1d888415 Remove AVX512VL dependency in trsm 2022-04-14 12:44:24 -07:00
Antonio Sánchez
07db964bde Restrict new AVX512 trsm to AVX512VL, rename files for consistency. 2022-04-14 16:58:32 +00:00
Charles Schlosser
67eeba6e72 Avoidable heap allocation in applyHouseholderToTheLeft 2022-04-13 18:45:36 +00:00
Antonio Sánchez
3342fc7e4d Allow all tests to pass with EIGEN_TEST_NO_EXPLICIT_VECTORIZATION 2022-04-12 14:48:22 +00:00
Antonio Sánchez
efb08e0bb5 Revert "Fix ambiguous DiagonalMatrix constructors."
This reverts commit a81bba962a
2022-04-12 03:54:31 +00:00
Chip Kerchner
53eec53d2a Fix Power GEMV order of operations in predux for MMA. 2022-04-11 21:29:05 +00:00
Antonio Sánchez
a81bba962a Fix ambiguous DiagonalMatrix constructors. 2022-04-11 19:13:25 +00:00
Antonio Sánchez
f7b31f864c Revert "Replace call to FixedDimensions() with a singleton instance of"
This reverts commit 19e6496ce0
2022-04-10 15:30:33 +00:00
Tobias Schlüter
f3ba220c5d Remove EIGEN_EMPTY_STRUCT_CTOR 2022-04-08 18:27:26 +00:00
Antonio Sánchez
5ed7a86ae9 Fix MSVC+CUDA issues. 2022-04-08 18:05:32 +00:00
Antonio Sánchez
734ed1efa6 Fix ODR issues in lapacke_helpers. 2022-04-08 15:31:30 +00:00
Antonio Sánchez
2c45a3846e Fix some max size expressions. 2022-04-06 22:19:57 +00:00
Antonio Sánchez
edc822666d Fix navbar scroll with toc. 2022-04-05 20:14:22 +00:00
Erik Schultheis
df87d40e34 constexpr reshape helper 2022-04-05 17:32:17 +00:00
Chip Kerchner
403fa33409 Performance improvements in GEMM for Power 2022-04-05 12:18:53 +00:00
Erik Schultheis
e1df3636b2 More constexpr helpers 2022-04-04 18:38:34 +00:00
Erik Schultheis
64909b82bd static const class members turned into constexpr 2022-04-04 17:33:33 +00:00
William Talbot
2c0ef43b48 Added Scaling function overload for vector rvalue reference 2022-04-04 16:50:09 +00:00
Antonio Sanchez
ba2cb835aa Add back std::remove* aliases - third-party libraries rely on these. 2022-04-01 17:02:52 +00:00
Antonio Sánchez
0c859cf35d Consider inf/nan in scalar test_isApprox. 2022-04-01 17:00:24 +00:00
Erik Schultheis
1ddd3e29cb fixed order of arguments in blas syrk 2022-03-30 21:45:13 +00:00
Antonio Sánchez
2c56442805 Don't include .cpp in lapack. 2022-03-30 21:41:56 +00:00
Antonio Sánchez
73b2c13bf2 Disable f16c scalar conversions for MSVC. 2022-03-30 18:35:32 +00:00
Antonio Sanchez
9bc9992dd3 Eliminate trace unused warning. 2022-03-29 22:04:50 +00:00
Tobias Schlüter
e22d58e816 Add is_constant_evaluated, update alignment checks 2022-03-25 04:00:58 +00:00
Everton Constantino
f0a91838aa Enable Aarch64 CI 2022-03-24 19:50:49 +00:00
Erik Schultheis
b9d2900e8f added a missing typename and fixed a unused typedef warning 2022-03-24 12:07:18 +02:00
b-shi
0611f7fff0 Add missing explicit reinterprets 2022-03-23 21:10:26 +00:00
Essex Edwards
cd3c81c3bc Add a NNLS solver to unsupported - issue #655 2022-03-23 20:20:44 +00:00
Chip Kerchner
0699fa06fe Split general_matrix_vector_product interface for Power into two macros - one for ColMajor and one for RowMajor. 2022-03-23 18:09:33 +00:00
Antonio Sánchez
19a6a827c4 Optimize visitor traversal in case of RowMajor. 2022-03-23 15:27:57 +00:00
Romain Biessy
f2a3e03e9b Fix usages of wrong namespace 2022-03-21 15:07:53 +00:00
Antonio Sánchez
4451823fb4 Fix ODR violation in trsm. 2022-03-20 15:56:53 +00:00
Antonio Sánchez
9a14d91a99 Fix AVX512 builds with MSVC. 2022-03-18 16:04:53 +00:00
Chip Kerchner
7b10795e39 Change EIGEN_ALTIVEC_ENABLE_MMA_DYNAMIC_DISPATCH and EIGEN_ALTIVEC_DISABLE_MMA flags to be like TensorFlow's... 2022-03-17 22:35:27 +00:00
Antonio Sánchez
3ca1228d45 Work around MSVC compiler bug dropping const. 2022-03-17 20:50:26 +00:00
Tobias Schlüter
40eb34bc5d Fix RowMajorBit <-> RowMajor mixup. 2022-03-17 15:28:12 +00:00
Øystein Sørensen
c062983464 Completed a missing parenthesis in tutorial. 2022-03-17 14:52:07 +00:00
Antonio Sánchez
9deaa19121 Work around g++-10 docker issue for geo_orthomethods_4. 2022-03-16 21:46:04 +00:00
Antonio Sanchez
e34db1239d Fix missing pound 2022-03-16 12:26:12 -07:00
Antonio Sánchez
591906477b Fix up PowerPC MMA flags so it builds by default. 2022-03-16 19:16:28 +00:00
b-shi
518fc321cb AVX512 Optimizations for Triangular Solve 2022-03-16 18:04:50 +00:00
Antonio Sánchez
01b5bc48cc Disable schur non-convergence test. 2022-03-16 17:33:53 +00:00
Erik Schultheis
421cbf0866 Replace Eigen type metaprogramming with corresponding std types and make use of alias templates 2022-03-16 16:43:40 +00:00
Arthur
514f90c9ff Remove workarounds for bad GCC-4 warnings 2022-03-16 00:08:16 +00:00
Rasmus Munk Larsen
9ad5661482 Revert "Fix up PowerPC MMA flags so it builds by default." 2022-03-15 20:51:03 +00:00
Antonio Sánchez
65eeedf964 Fix up PowerPC MMA flags so it builds by default. 2022-03-15 20:22:23 +00:00
Tobias Schlüter
cb1e8228e9 Convert bit calculation to constexpr, avoid casts. 2022-03-13 22:38:36 +09:00
Antonio Sánchez
baf9a985ec Fix swap test for size 1 inputs. 2022-03-10 15:05:58 +00:00
Everton Constantino
7882408856 Temporarily disable aarch64 CI. 2022-03-10 09:34:19 -03:00
Rohit Santhanam
2a6be5492f Fix construct_at compilation breakage on ROCm. 2022-03-09 16:47:53 +00:00
Duncan McBain
a3b64625e3 Remove ComputeCpp-specific code from SYCL Vptr 2022-03-08 22:44:18 +00:00
Antonio Sánchez
9296bb4b93 Fix edge-case in zeta for large inputs. 2022-03-08 21:21:20 +00:00
Tobias Schlüter
cd2ba9d03e Add construct_at, destroy_at wrappers. Use throughout. 2022-03-08 20:43:22 +00:00
AlexanderMueller
dfa5176780 make SparseSolverBase and IterativeSolverBase move constructable 2022-03-08 20:03:53 +01:00
Tobias Schlüter
9883108f3a Remove copy_bool workaround for gcc 4.3 2022-03-08 17:43:11 +00:00
John Mather
3a9d404d76 Add support for Apple's Accelerate sparse matrix solvers 2022-03-08 00:09:18 +00:00
Antonio Sánchez
008ff3483a Fix broken tensor executor test, allow tensor packets of size 1. 2022-03-07 20:30:37 +00:00
Antonio Sanchez
28c7c1a629 Log position of first difference for easier debugging. 2022-03-07 19:06:27 +00:00
Antonio Sánchez
cf82186416 Adds new CMake Options for controlling build components. 2022-03-05 05:49:45 +00:00
Antonio Sánchez
b2ee235a4b Split and reduce SVD test sizes. 2022-03-05 00:15:28 +00:00
Antonio Sánchez
0ae94456a0 Remove duplicate IsRowMajor declaration. 2022-03-04 21:22:02 +00:00
Rasmus Munk Larsen
0e6f4e43f1 Fix a few confusing comments in psincos_float. 2022-03-04 20:41:49 +00:00
Sean McBride
f1b9692d63 Removed EIGEN_UNUSED decorations from many functions that are in fact used 2022-03-03 20:19:33 +00:00
Antonio Sánchez
27d8f29be3 Update vectorization_logic tests for all platforms. 2022-03-03 19:54:15 +00:00
Arthur
c9ff739af1 Fix JacobiSVD_LAPACKE bindings 2022-03-03 19:24:07 +00:00
Zhuo Zhang
d0b1aef6f6 Speed lscg by using .noalias 2022-03-03 08:52:09 +00:00
Antonio Sanchez
55c7400db5 Fix enum conversion warnings in BooleanRedux. 2022-03-03 04:44:20 +00:00
Antonio Sánchez
711803c427 Skip denormal test if Cond is false. 2022-03-03 04:32:13 +00:00
Antonio Sánchez
d819a33bf6 Remove poor non-convergence checks in NonLinearOptimization. 2022-03-02 19:31:20 +00:00
Antonio Sánchez
9c07e201ff Modified sqrt/rsqrt for denormal handling. 2022-03-02 17:20:47 +00:00
Antonio Sanchez
1c2690ed24 Adjust tolerance of matrix_power test for MSVC. 2022-03-01 23:33:05 +00:00
Antonio Sánchez
b48922cb5c Fix SVD for MSVC+CUDA. 2022-03-01 21:35:22 +00:00
Yury Gitman
bf6726a0c6 Fix any/all reduction in the case of row-major layout 2022-03-01 05:27:50 +00:00
Antonio Sánchez
f03df0df53 Fix SVD for MSVC. 2022-02-28 19:53:15 +00:00
Antonio Sánchez
19c39bea29 Fix mixingtypes for g++-11. 2022-02-25 19:28:10 +00:00
Antonio Sánchez
2ed4bee78f Fix frexp packetmath tests for MSVC. 2022-02-24 22:16:37 +00:00
Antonio Sánchez
d58e629130 Disable deprecated warnings for SVD tests on MSVC. 2022-02-24 21:20:49 +00:00
Antonio Sánchez
3d7e2d0e3e Fix packetmath compilation error. 2022-02-23 23:27:08 +00:00
Antonio Sánchez
8970719771 Fix gcc-5 packetmath_12 bug. 2022-02-23 21:56:25 +00:00
Antonio Sánchez
f0b81fefb7 Disable deprecated warnings in SVD tests. 2022-02-23 18:32:00 +00:00
Rasmus Munk Larsen
8b875dbef1 Changes to fast SQRT/RSQRT 2022-02-23 17:32:21 +00:00
Ramil Sattarov
f9b7564faa E2K: initial support of LCC MCST compiler for the Elbrus 2000 CPU architecture 2022-02-23 17:07:34 +00:00
Antonio Sánchez
ae86a146b1 Modify test expression to avoid numerical differences (#2402). 2022-02-23 16:37:03 +00:00
Arthur
cd80e04ab7 Add assert for edge case if Thin U Requested at runtime 2022-02-23 05:35:19 +00:00
Lingzhu Xiang
35727928ad Fix test macro conflicts with STL headers in C++20 2022-02-23 07:43:30 +08:00
Romain Biessy
2dd879d4b0 [SYCL] Fix CMake for SYCL support 2022-02-22 16:53:27 +00:00
Martin Heistermann
550af3938c Fix for crash bug in SPQRSupport: Initialize pointers to nullptr to avoid free() calls of invalid pointers. 2022-02-18 16:13:28 +00:00
Antonio Sánchez
58a90c7463 Use fixed-sized U/V for fixed-sized inputs. 2022-02-16 18:31:47 +00:00
Antonio Sánchez
c367ed26a8 Make FixedInt constexpr, fix ODR of fix<N> 2022-02-16 17:47:51 +00:00
Antonio Sánchez
766087329e Re-add svd::compute(Matrix, options) method to avoid breaking external projects. 2022-02-16 00:54:02 +00:00
Antonio Sánchez
a58af20d61 Add descriptions to Matrix typedefs. 2022-02-15 21:53:27 +00:00
Antonio Sánchez
28e008b99a Fix sqrt/rsqrt for NEON. 2022-02-15 21:31:51 +00:00
Antonio Sanchez
23755030c9 Fix MSVC+NVCC 9.2 pragma error. 2022-02-15 10:51:32 -08:00
Erik Schultheis
7197b577fb Remove unused macros in AVX packetmath.
The following macros are removed:

* EIGEN_DECLARE_CONST_Packet8f
* EIGEN_DECLARE_CONST_Packet4d
* EIGEN_DECLARE_CONST_Packet8f_FROM_INT
* EIGEN_DECLARE_CONST_Packet8i
2022-02-14 10:34:23 +00:00
Antonio Sanchez
bded5028a5 Fix ODR failures in TensorRandom. 2022-02-11 23:28:33 -08:00
Rasmus Munk Larsen
18eab8f997 Add convenience method constexpr std::size_t size() const to Eigen::IndexList 2022-02-12 04:23:03 +00:00
Florian Maurin
fbc62f7df9 Complete doc with MatrixXNt and MatrixNXt 2022-02-11 21:55:54 +00:00
Chip Kerchner
cb5ca1c901 Cleanup compiler warnings, etc from recent changes in GEMM & GEMV for PowerPC 2022-02-09 18:47:08 +00:00
Matt Keeter
cec0005c74 Return alphas() and betas() by const reference 2022-02-08 23:16:10 +00:00
Rasmus Munk Larsen
92d0026b7b Provide a definition for numeric_limits static data members 2022-02-08 20:34:53 +00:00
Björn Ingvar Dahlgren
b94bddcde0 Typo in COD's doc: matrixR() -> matrixT() 2022-02-07 18:30:25 +00:00
Antonio Sánchez
94bed2b80c Fix collision with resolve.h. 2022-02-07 18:17:42 +00:00
Antonio Sanchez
b88de3f24f Update MPL2 with https. 2022-02-07 17:30:31 +00:00
Antonio Sánchez
9441d94dcc Revert "Make fixed-size Matrix and Array trivially copyable after C++20"
This reverts commit 47eac21072
2022-02-05 04:40:29 +00:00
Rasmus Munk Larsen
979fdd58a4 Add generic fast psqrt and prsqrt impls and make them correct for 0, +Inf, NaN, and negative arguments. 2022-02-05 00:20:13 +00:00
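The generic fast `prsqrt` above pairs a hardware reciprocal-square-root estimate with Newton-Raphson refinement, plus explicit handling of the special values the commit message names. A hedged sketch (function name `fast_rsqrt` and the fixed seed are illustrative; real kernels get their seed from an instruction like `rsqrtps`):

```python
import math

def fast_rsqrt(x, y0):
    # Refine an initial estimate y0 of 1/sqrt(x) with Newton-Raphson
    # steps: y <- y * (1.5 - 0.5 * x * y * y).
    # Special cases mirror the ones the commit makes correct:
    if x != x:                  # NaN propagates
        return float("nan")
    if x == 0.0:                # rsqrt(0) = +inf
        return math.inf
    if x == math.inf:           # rsqrt(inf) = 0
        return 0.0
    y = y0
    for _ in range(5):
        y = y * (1.5 - 0.5 * x * y * y)
    return y

approx = fast_rsqrt(4.0, 0.6)   # crude seed; convergence is quadratic
assert abs(approx - 0.5) < 1e-12
assert fast_rsqrt(0.0, 1.0) == math.inf
assert fast_rsqrt(math.inf, 1.0) == 0.0
```

Naive estimate-plus-refinement pipelines get exactly these edge cases wrong (e.g. producing NaN for zero input), which is why they need the explicit branches.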
Antonio Sánchez
4bffbe84f9 Restrict GCC<6.3 maxpd workaround to only gcc. 2022-02-04 22:47:34 +00:00
Antonio Sánchez
e7f4a901ee Define EIGEN_HAS_AVX512_MATH in PacketMath. 2022-02-04 22:25:52 +00:00
Antonio Sánchez
6b60bd6754 Fix 32-bit arm int issue. 2022-02-04 21:59:33 +00:00
Antonio Sánchez
96da541cba Fix AVX512 math function consistency, enable for ICC. 2022-02-04 19:35:18 +00:00
Antonio Sánchez
cafeadffef Fix ODR violations. 2022-02-04 19:01:07 +00:00
Arthur
18b50458b6 Update SVD Module with Options template parameter 2022-02-02 00:15:44 +00:00
Erik Schultheis
89c6ab2385 removed some documentation referencing c++98 behaviour 2022-01-30 12:02:18 +00:00
Chip Kerchner
66464bd2a8 Fix number of block columns so GEMV does not abnormally overflow the cache (PowerPC) 2022-01-27 20:35:53 +00:00
Rasmus Munk Larsen
7db0ac977a Remove extraneous ")". 2022-01-27 02:20:03 +00:00
Rasmus Munk Larsen
09c0085a57 Only test pmsub, pnmadd, and pnmsub on signed types. 2022-01-27 02:09:25 +00:00
Rasmus Munk Larsen
8f2c6f0aa6 Make preciprocal IEEE compliant w.r.t. 1/0 and 1/inf. 2022-01-26 20:38:05 +00:00
Erik Schultheis
d271a7d545 reduce float warnings (comparisons and implicit conversions) 2022-01-26 18:16:19 +00:00
Rasmus Munk Larsen
51311ec651 Remove inline assembly for FMA (AVX) and add remaining extensions as packet ops: pmsub, pnmadd, and pnmsub. 2022-01-26 04:25:41 +00:00
Erik Schultheis
4e629b3c1b make casts explicit and fixed the type 2022-01-24 18:19:21 +00:00
Rasmus Munk Larsen
ea2c02060c Add reciprocal packet op and fast specializations for float with SSE, AVX, and AVX512. 2022-01-21 23:49:18 +00:00
Arthur Feeney
4b0926f99b Prevent heap allocation in diagonal product 2022-01-21 21:15:44 +00:00
Ilya Tokar
a0fc640c18 Add support for packets of int64 on x86 2022-01-21 19:55:23 +00:00
Erik Schultheis
970640519b Cleanup 2022-01-21 01:48:59 +00:00
Stephen Pierce
81c928ba55 Silence some MSVC warnings 2022-01-21 00:29:23 +00:00
Sean McBride
c454b8c813 Improve clang warning suppressions by checking if warning is supported 2022-01-21 00:27:43 +00:00
David Gao
fb05198bdd Port EIGEN_OPTIMIZATION_BARRIER to soft float arm 2022-01-20 00:44:17 +00:00
arthurfeeney
937c3d73cb Fix implicit conversion warning in GEBP kernel's packing 2022-01-17 17:00:59 -06:00
Essex Edwards
49a8a1e07a Minor correction/clarification to LSCG solver documentation 2022-01-14 19:48:54 +00:00
Arthur
5fe0115724 Update comment referencing removed macro EIGEN_SIZE_MIN_PREFER_DYNAMIC 2022-01-14 19:29:47 +00:00
Arthur
ff4dffc34d fix implicit conversion warning in vectorwise_reverse_inplace 2022-01-13 20:30:54 +00:00
Chip Kerchner
708fd6d136 Add MMA and performance improvements for VSX in GEMV for PowerPC. 2022-01-13 13:23:18 +00:00
Jörg Buchwald
d1bf056394 fix compilation issue with gcc < 10 and -std=c++2a 2022-01-13 01:24:20 +01:00
Rasmus Munk Larsen
a30ecb7221 Don't use the fast implementation if EIGEN_GPU_CC, since integer_packet is not defined for float4 used by the GPU compiler (even on host). 2022-01-12 20:16:16 +00:00
Erik Schultheis
5a0a165c09 fix broken asserts 2022-01-12 18:31:53 +00:00
Rasmus Munk Larsen
0b58738938 Fix two corner cases in the new implementation of logistic sigmoid. 2022-01-12 00:41:29 +00:00
Fabrice Fontaine
5d7ffe2ca9 drop COPYING.GPL 2022-01-10 23:14:33 +00:00
Matthias Möller
a32e6a4047 Explicit type casting 2022-01-10 22:06:43 +00:00
Kolja Brix
8d81a2339c Reduce usage of reserved names 2022-01-10 20:53:29 +00:00
Essex Edwards
c61b3cb0db Fix IterativeSolverBase referring to itself as ConjugateGradient 2022-01-08 08:25:15 +00:00
Rasmus Munk Larsen
80ccacc717 Fix accuracy of logistic sigmoid 2022-01-08 00:15:14 +00:00
Rasmus Munk Larsen
8b8125c574 Make sure the scalar and vectorized path for array.exp() return consistent values. 2022-01-07 23:31:35 +00:00
Matthias Möller
c9df98b071 Fix Gcc8.5 warning about missing base class initialisation (#2404) 2022-01-07 19:16:53 +00:00
Lingzhu Xiang
47eac21072 Make fixed-size Matrix and Array trivially copyable after C++20
Making them trivially copyable allows using std::memcpy() without undefined
behaviors.

Only Matrix and Array with trivially copyable DenseStorage are marked as
trivially copyable with an additional type trait.

As described in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0848r3.html
it requires extremely verbose SFINAE to make the special member functions of
fixed-size Matrix and Array trivial, unless C++20 concepts are available to
simplify the selection of trivial special member functions given template
parameters. Therefore only make this feature available to compilers that support
C++20 P0848R3.

Fix #1855.
2022-01-07 19:04:35 +00:00
Matthias Möller
c4b1dd2f6b Add support for Cray, Fujitsu, and Intel ICX compilers
The following preprocessor macros are added:

- EIGEN_COMP_CPE and EIGEN_COMP_CLANGCPE version number of the CRAY compiler if
  Eigen is compiled with the Cray C++ compiler, 0 otherwise.

- EIGEN_COMP_FCC and EIGEN_COMP_CLANGFCC version number of the FCC compiler if
  Eigen is compiled with the Fujitsu C++ compiler, 0 otherwise

- EIGEN_COMP_CLANGICC version number of the ICX compiler if Eigen is compiled
  with the Intel oneAPI C++ compiler, 0 otherwise

All three compilers (Cray, Fujitsu, Intel) offer a traditional and a Clang-based
frontend. This is distinguished by the CLANG prefix.
2022-01-07 18:46:16 +00:00
Rasmus Munk Larsen
96dc37a03b Some fixes/cleanups for numeric_limits & fix for related bug in psqrt 2022-01-07 01:10:17 +00:00
Fabian Keßler
ed27d988c1 Fixes #2411 2022-01-06 20:02:37 +00:00
Rasmus Munk Larsen
7b5a8b6bc5 Improve plog: 20% speedup for float + handle denormals 2022-01-05 23:40:31 +00:00
Andrew Johnson
a491c7f898 Allow specifying inner & outer stride for CWiseUnaryView - fixes #2398 2022-01-05 19:24:46 +00:00
Rohit Santhanam
27a78e4f96 Some serialization API changes were made in commit... 2022-01-05 16:18:45 +00:00
Erik Schultheis
9210e71fb3 ensure that eigen::internal::size is not found by ADL, rename to ssize and... 2022-01-05 00:46:09 +00:00
Lingzhu Xiang
7244a74ab0 Add bounds checking to Eigen serializer 2022-01-03 17:00:24 +08:00
David Tellenbach
ba91839d71 Remove user survey from Doxygen header 2021-12-31 12:15:19 +01:00
Shiva Ghose
a4098ac676 Fix duplicate include guard *ALTIVEC_H -> *ZVECTOR_H
Some header guards were repeated between the `AltiVec` and `ZVector`
packages. This could cause a problem if (for whatever reason) someone
attempts to include headers for both architectures.
2021-12-31 08:43:24 +00:00
David Tellenbach
22a347b9d2 Remove unused EIGEN_HAS_STATIC_ARRAY_TEMPLATE
ec2fd0f7 removed the EIGEN_HAS_STATIC_ARRAY_TEMPLATE but forgot to remove this
last occurrence.

This fixes issue #2399.
2021-12-30 15:26:55 +00:00
David Tellenbach
d705eb5f86 Revert "Select AVX2 even if the data size is not a multiple of 8"
Tests are failing for AVX and NEON.

This reverts commit eb85b97339.
2021-12-28 23:57:06 +01:00
Rasmus Munk Larsen
8eab7b6886 Improve exp<float>(): Don't flush denormal results +4% speedup.
1. Speed up exp(x) by reducing the polynomial approximant from degree 7 to
degree 6. With exactly representable coefficients computed by the Sollya tool,
this still gives a maximum relative error of 1 ulp, i.e. faithfully rounded, for
arguments where exp(x) is a normalized float. This change results in a speedup
of about 4% for AVX2.


2. Extend the range where exp(x) returns a non-zero result from ~[-88;88] to
~[-104;88], i.e. return denormalized values for large negative arguments instead
of zero. Compared to exp<double>(x) the denormalized results gradually decrease
in accuracy, down to 0.033 relative error for arguments around x = -104 where
exp(x) is ~std::numeric_limits<float>::denorm_min(). This is expected and acceptable.
2021-12-28 15:00:19 +00:00
David Tellenbach
6e95c0cd9a Add missing internal namespace
The vectorization logic tests miss some namespace internal qualifiers.
2021-12-27 23:50:32 +00:00
David Tellenbach
d3675b2e73 Add vectorization_logic_1 test to list of CI smoketests 2021-12-28 00:32:14 +01:00
David Tellenbach
c06c3e52a0 Include immintrin.h if F16C is available and vectorization is disabled
If EIGEN_DONT_VECTORIZE is defined, immintrin.h is not included even if F16C is available. Trying to use F16C intrinsics thus fails.

This fixes issue #2395.
2021-12-25 19:51:42 +00:00
Erik Schultheis
f7a056bf04 Small fixes
This MR fixes a bunch of smaller issues, making the following changes:

* Template parameters in the documentation are documented with `\tparam` instead
  of `\param`
* Superfluous semicolon warnings fixed
* Fixed the type of literals used to initialize float variables
2021-12-21 16:46:09 +00:00
Kolja Brix
2a6594de29 Small cleanup of GDB pretty printer code 2021-12-18 17:34:38 +00:00
Erik Schultheis
dee6428a71 fixed clang warnings about alignment change and floating point precision 2021-12-18 17:18:16 +00:00
Kolja Brix
d0b4b75fbb Simplify logical_xor() 2021-12-16 20:20:47 +00:00
Rasmus Munk Larsen
e93a071774 Fix a bug introduced in !751. 2021-12-15 22:00:40 +00:00
Erik Schultheis
e939c06b0e Small speed-up in row-major sparse dense product 2021-12-15 18:46:25 +00:00
Erik Schultheis
2d39da8af5 space separated EIGEN_TEST_CUSTOM_CXX_FLAGS 2021-12-13 15:27:33 +00:00
Rohit Santhanam
6b2df80317 Fixes for enabling HIP unit tests. Includes a fix to make this work with the latest cmake. 2021-12-12 21:03:30 +00:00
Erik Schultheis
c20e908ebc turn some macros intro constexpr functions 2021-12-10 19:27:01 +00:00
Erik Schultheis
0f36e42169 Fix 2021-12-10 16:59:48 +00:00
Erik Schultheis
c35679af27 fixed customIndices2Array forgetting first index 2021-12-10 16:41:59 +00:00
Erik Schultheis
0b81185fe3 removed Find*.cmake scripts for which these are available in cmake itself 2021-12-10 02:02:34 +00:00
Erik Schultheis
495ffff945 removed helper cmake macro and don't use deprecated COMPILE_FLAGS anymore. 2021-12-09 23:09:56 +00:00
Rohit Santhanam
8a8122874b Build unit tests for HIP using C++14. 2021-12-09 08:04:19 +00:00
Rasmus Munk Larsen
f04fd8b168 Make sure exp(-Inf) is zero for vectorized expressions. This fixes #2385. 2021-12-08 17:57:23 +00:00
Erik Schultheis
39a6aff16c get rid of using namespace Eigen in sample code 2021-12-07 19:57:38 +00:00
Erik Schultheis
e4c40b092a disambiguate overloads for empty index list 2021-12-07 19:40:09 +00:00
Jens Wehner
c6fa0ca162 Idrsstabl 2021-12-06 20:00:00 +00:00
Erik Schultheis
cc11e240ac Some further cleanup 2021-12-06 18:01:15 +00:00
Erik Schultheis
14c32c60f3 fixed snippets 2021-12-05 17:31:12 +00:00
Erik Schultheis
cd83f34d3a fix typo StableNorm -> stableNorm 2021-12-04 14:52:09 +00:00
Rasmus Munk Larsen
3ffefcb95c Only include <atomic> if needed. 2021-12-02 23:55:25 +00:00
Jens Wehner
4ee2e9b340 Idrs refactoring 2021-12-02 23:32:07 +00:00
Jens Wehner
f63c6dd1f9 Bicgstabl 2021-12-02 22:48:22 +00:00
Erik Schultheis
2f65ec5302 fixed leftover else branch 2021-12-02 18:13:19 +00:00
Erik Schultheis
d60f7fa518 Improved lapacke binding code for HouseholderQR and PartialPivLU 2021-12-02 00:10:58 +00:00
Xinle Liu
7ef5f0641f Remove macro EIGEN_GPU_TEST_C99_MATH
Remove macro EIGEN_GPU_TEST_C99_MATH which is used in a single test file only and always defaults to true.
2021-12-01 14:48:56 +00:00
Antonio Sánchez
f56a5f15c6 Disable GCC-4.8 tests. 2021-12-01 02:12:52 +00:00
Erik Schultheis
ec2fd0f7ed Require recent GCC and MSCV and removed EIGEN_HAS_CXX14 and some other feature test macros 2021-12-01 00:48:34 +00:00
Rasmus Munk Larsen
085c2fc5d5 Revert "Update SVD Module to allow specifying computation options with a... 2021-11-30 18:45:54 +00:00
Erik Schultheis
4dd126c630 fixed cholesky with 0 sized matrix (cf. #785) 2021-11-30 17:17:41 +00:00
Rohit Santhanam
4d3e50036f Fix for HIP compilation breakage in selfAdjoint and triangular view classes. 2021-11-30 14:00:59 +00:00
Erik Schultheis
63abb35dfd SFINAE'ing away non-const overloads if selfAdjoint/triangular view is not referring to an lvalue 2021-11-29 22:51:26 +00:00
Jakub Gałecki
1b8dce564a bugfix: issue #2375 2021-11-29 22:26:15 +00:00
Francesco Mazzoli
eb85b97339 Select AVX2 even if the data size is not a multiple of 8 2021-11-29 21:13:24 +00:00
Arthur
eef33946b7 Update SVD Module to allow specifying computation options with a template parameter. Resolves #2051 2021-11-29 20:50:46 +00:00
Erik Schultheis
4a76880351 Updated CMake
This patch updates the minimum required CMake version to 3.10 and removes the EIGEN_TEST_CXX11 CMake option, including corresponding logic.
2021-11-29 20:24:20 +00:00
Erik Schultheis
f33a31b823 removed EIGEN_HAS_CXX11_* and redundant EIGEN_COMP_CXXVER checks 2021-11-29 19:18:57 +00:00
Rohit Santhanam
9d3ffb3fbf Fix for HIP compilation failure in DenseBase. 2021-11-28 15:59:30 +00:00
David Tellenbach
08da52eb85 Remove DenseBase::nonZeros() which just calls DenseBase::size()
Fixes #2382.
2021-11-27 14:31:00 +00:00
Ali Can Demiralp
96e537d6fd Add EIGEN_DEVICE_FUNC to DenseBase::hasNaN() and DenseBase::allFinite(). 2021-11-27 11:27:52 +00:00
Erik Schultheis
b8b6566f0f Currently, the binding of LLT to Lapacke is done using a large macro. This factors out a large part of the functionality of the macro and implement them explicitly. 2021-11-25 16:11:25 +00:00
Erik Schultheis
ec4efbd696 remove EIGEN_HAS_CXX11 2021-11-24 20:08:49 +00:00
Rasmus Munk Larsen
cfdb3ce3f0 Fix warnings about shadowing definitions. 2021-11-23 14:34:47 -08:00
Rasmus Munk Larsen
5e89573e2a Implement Eigen::array<...>::reverse_iterator if std::reverse_iterator exists. 2021-11-20 00:22:46 +00:00
Rasmus Munk Larsen
5137a5157a Make numeric_limits members constexpr as per the newer C++ standards.
Author: majnemer@google.com
2021-11-19 15:58:36 +00:00
Erik Schultheis
7e586635ba don't use deprecated MappedSparseMatrix 2021-11-19 15:58:04 +00:00
Rasmus Munk Larsen
11cb7b8372 Add basic iterator support for Eigen::array to ease transition to std::array in third-party libraries. 2021-11-19 05:14:30 +00:00
Antonio Sanchez
c107bd6102 Fix errors for windows build. 2021-11-19 04:23:25 +00:00
Erik Schultheis
b0fb5417d3 Fixed Sparse-Sparse Product in case of mixed StorageIndex types 2021-11-18 18:33:31 +00:00
Rasmus Munk Larsen
96aeffb013 Make the new TensorIO implementation work with TensorMap with const elements. 2021-11-17 18:16:04 -08:00
Rasmus Munk Larsen
824d06eb36 Include <numeric> to get std::iota. 2021-11-18 00:47:18 +00:00
Pablo Speciale
d04edff570 Update Umeyama.h: src_var is only used when with_scaling == true. Therefore, the actual computation can be avoided when with_scaling == false. 2021-11-16 17:58:22 +00:00
Antonio Sanchez
ffb78e23a1 Fix tensor broadcast off-by-one error.
Caught by JAX unit tests.  Triggered if broadcast is smaller than packet
size.
2021-11-16 17:37:38 +00:00
cpp977
f73c95c032 Reimplemented the Tensor stream output. 2021-11-16 17:36:58 +00:00
Rasmus Munk Larsen
2b9297196c Update Transform.h to make transform_construct_from_matrix and transform_take_affine_part callable from device code. Fixes #2377. 2021-11-16 00:58:30 +00:00
Erik Schultheis
ca9c848679 use consistent StorageIndex types in SparseMatrix::Map
and `SparseMatrix::TransposedSparseMatrix`
2021-11-15 22:18:26 +00:00
Erik Schultheis
13954c4440 moved pruning code to SparseVector.h 2021-11-15 22:16:01 +00:00
Nathan Luehr
da79095923 Convert diag pragmas to nv_diag. 2021-11-15 03:42:42 +00:00
Erik Schultheis
532cc73f39 fix a typo 2021-11-13 13:11:06 +02:00
jenswehner
675b72e44b added clang format 2021-11-09 23:49:01 +01:00
Ben Barsdell
50df8d3d6d Avoid integer overflow in EigenMetaKernel indexing
- The current implementation computes `size + total_threads`, which can
  overflow and cause CUDA_ERROR_ILLEGAL_ADDRESS when size is close to
  the maximum representable value.
- The num_blocks calculation can also overflow due to the implementation
  of divup().
- This patch prevents these overflows and allows the kernel to work
  correctly for the full representable range of tensor sizes.
- Also adds relevant tests.
2021-11-05 16:39:37 +11:00
Rasmus Munk Larsen
55e3ae02ac Compare summation results against forward error bound. 2021-11-04 18:04:04 -07:00
Gengxin Xie
5c642950a5 Bug Fix: correct the bug that won't define EIGEN_HAS_FP16_C
if the compiler isn't clang
2021-11-04 22:13:01 +00:00
Gilad
0d73440fb2 Documentation of Quaternion constructor from MatrixBase 2021-11-04 16:21:26 +00:00
Minh Quan HO
4284c68fbb nestbyvalue test: fix uninitialized matrix
- The test was doing computation with an uninitialized matrix (zero-initialized
on Linux by luck, but possibly NaN on other non-Linux systems).
- This commit fixes it by initializing to Random().
2021-11-04 14:32:12 +01:00
Xinle Liu
478a1bdda6 Fix total deflation issue in BDCSVD, when & only when M is already diagonal. 2021-11-02 16:53:55 +00:00
Antonio Sanchez
8f8c2ba2fe Remove bad "take" impl that causes g++-11 crash.
For some reason, having `take<n, numeric_list<T>>` for `n > 0` causes
g++-11 to ICE with
```
sorry, unimplemented: unexpected AST of kind nontype_argument_pack
```
It does work with other versions of gcc, and with clang.
I filed a GCC bug
[here](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102999).

Technically we should never actually run into this case, since you
can't take n > 0 elements from an empty list.  Commenting it out
allows our Eigen tests to pass.
2021-11-01 17:04:41 +00:00
Antonio Sanchez
f6c8cc0e99 Fix TensorReduction warnings and error bound for sum accuracy test.
The sum accuracy test currently uses the default test precision for
the given scalar type.  However, scalars are generated via a normal
distribution, and given a large enough count and strong enough random
generator, the expected sum is zero.  This causes the test to
periodically fail.

Here we estimate an upper-bound for the error as `sqrt(N) * prec` for
summing N values, with each having an approximate epsilon of `prec`.

Also fixed a few warnings generated by MSVC when compiling the
reduction test.
2021-10-30 14:59:00 -07:00
Rasmus Munk Larsen
b3bea43a2d Don't use unrolled loops for stateful reducers. The problem is the combination step, e.g.
reducer0.reducePacket(accum1, accum0);
reducer0.reducePacket(accum2, accum0);
reducer0.reducePacket(accum3, accum0);

For the mean reducer this will increment the count as well as adding together the accumulators and result in the wrong count being divided into the sum at the end.
2021-10-28 23:52:54 +00:00
Chip Kerchner
9cf34ee0ae Invert rows and depth in non-vectorized portion of packing (PowerPC). 2021-10-28 21:59:41 +00:00
Ilya Tokar
e1cb6369b0 Add AVX vector path to float2half/half2float
Makes e. g. matrix multiplication 2x faster:
name         old cpu/op  new cpu/op  delta
BM_convers   181ms ± 1%    62ms ± 9%  -65.82%  (p=0.016 n=4+5)

Tested on all possible input values (not adding tests, since they
take a long time).
2021-10-28 13:59:01 -04:00
Antonio Sanchez
03d4cbb307 Fix min/max nan-propagation for scalar "other".
Copied input type from `EIGEN_MAKE_CWISE_BINARY_OP`.

Fixes #2362.
2021-10-28 09:28:29 -07:00
Antonio Sanchez
e559701981 Fix compile issue for gcc 4.8 2021-10-28 08:23:19 -07:00
Fabian Keßler
19cacd3ecb optimize cmake scripts for subproject use 2021-10-28 16:08:02 +02:00
Rohit Santhanam
48e40b22bf Preliminary HIP bfloat16 GPU support. 2021-10-27 18:36:45 +00:00
Antonio Sanchez
40bbe8a4d0 Fix ZVector build.
Cross-compiled via `s390x-linux-gnu-g++`, run via qemu.  This allows the
packetmath tests to pass.
2021-10-27 16:30:15 +00:00
Alex Druinsky
6bb6a6bf53 Vectorize fp16 tanh and logistic functions on Neon
Activates vectorization of the Eigen::half versions of the tanh and
logistic functions when they run on Neon. Both functions convert their
inputs to float before computing the output, and as a result of this
commit, the conversions and the computation in float are vectorized.
2021-10-27 16:09:16 +00:00
Antonio Sánchez
185ad0e610 Revert "Avoid integer overflow in EigenMetaKernel indexing"
This reverts commit 100d7caf92
2021-10-27 14:55:25 +00:00
Rasmus Munk Larsen
68e0d023c0 Remove license column in tables for builtin sparse solvers since all are MPL2 now. 2021-10-26 18:09:22 +00:00
Andreas Krebbel
8faafc3aaa ZVector: Move alignas qualifier to come first
We currently have plenty of type definitions with the alignment
qualifier coming after the type.  The compiler warns about ignoring
them:
int EIGEN_ALIGN16 ai[4];

Turn this into:
EIGEN_ALIGN16 int ai[4];
2021-10-26 15:33:47 +02:00
Ben Barsdell
100d7caf92 Avoid integer overflow in EigenMetaKernel indexing
- The current implementation computes `size + total_threads`, which can
  overflow and cause CUDA_ERROR_ILLEGAL_ADDRESS when size is close to
  the maximum representable value.
- The num_blocks calculation can also overflow due to the implementation
  of divup().
- This patch prevents these overflows and allows the kernel to work
  correctly for the full representable range of tensor sizes.
- Also adds relevant tests.
2021-10-26 00:04:28 +00:00
Alex Druinsky
d0e3791b1a Fix vectorized reductions for Eigen::half
Fixes compiler errors in expressions that look like

  Eigen::Matrix<Eigen::half, 3, 1>::Random().maxCoeff()

The error comes from the code that creates the initial value for
vectorized reductions. The fix is to specify the scalar type of the
reduction's initial value.

The change is necessary for Eigen::half because unlike other types,
Eigen::half scalars cannot be implicitly created from integers.
2021-10-25 14:44:33 -07:00
Maxiwell S. Garcia
99600bd1a6 test: fix boostmultiprec test to compile with older Boost versions
Eigen's boostmultiprec test redefines a symbol that is already defined
inside Boost Math [1]. Boost has fixed this recently [2], but this
patch avoids errors if the Boost version is less than 1.77.

https://github.com/boostorg/math/blob/boost-1.76.0/include/boost/math/policies/policy.hpp#L18
6830712302 (diff-c7a8e5911c2e6be4138e1a966d762200f147792ac16ad96fdcc724313d11f839)
2021-10-25 20:32:33 +00:00
Yann Billeter
6c3206152a fix(CommaInitializer): pass dims at compile-time 2021-10-25 19:53:38 +00:00
Antonio Sanchez
a500da1dc0 Fix broadcasting oob error.
For vectorized 1-dimensional inputs that do not take the special
blocking path (e.g. `std::complex<...>`), there was an
index-out-of-bounds error causing the broadcast size to be
computed incorrectly.  Here we fix this, and make other minor
cleanup changes.

Fixes #2351.
2021-10-25 19:31:12 +00:00
Antonio Sanchez
0578feaabc Remove const from visitor return type.
This seems to interfere with `pload`/`ploadu`, since `pload<const
Packet**>` are not defined.

This should unbreak the arm/ppc builds.
2021-10-25 19:09:50 +00:00
benardp
b63c096fbb Extend EIGEN_QT_SUPPORT to Qt6 2021-10-23 23:43:06 +00:00
Lennart Steffen
163f11e24a Included note on inner stride for compile-time vectors. See https://gitlab.com/libeigen/eigen/-/issues/2355#note_711078126 2021-10-22 09:46:43 +00:00
Nico
b17bcddbca Fix -Wbitwise-instead-of-logical clang warning
& and | short-circuit, && and || don't. When both arguments to those
are boolean, the short-circuiting version is usually the desired one, so
clang warns on this.

Here, it is inconsequential, so switch to && and || to suppress the warning.
2021-10-21 23:32:45 -04:00
Rasmus Munk Larsen
2d3fec8ff6 Add nan-propagation options to matrix and array plugins. 2021-10-21 19:40:11 +00:00
Antonio Sanchez
b86e013321 Revert bit_cast to use memcpy for CUDA.
To elide the memcpy, we need to first load the `src` value into
registers by making a local copy. This avoids the need to resort
to potential UB by using `reinterpret_cast`.

This change doesn't seem to affect CPU (at least not with gcc/clang).
With optimizations on, the copy is also elided.
2021-10-21 08:14:11 -07:00
Antonio Sanchez
45e67a6fda Use reinterpret_cast on GPU for bit_cast.
This seems to be the recommended approach for doing type punning in
CUDA. See for example
- https://stackoverflow.com/questions/47037104/cuda-type-punning-memcpy-vs-ub-union
- https://developer.nvidia.com/blog/faster-parallel-reductions-kepler/
(the latter puns a double to an `int2`).
The issue is that for CUDA, the `memcpy` is not elided, and ends up
being an expensive operation.  We already have similar `reintepret_cast`s across
the Eigen codebase for GPU (as does TensorFlow).
2021-10-20 21:34:40 +00:00
Antonio Sanchez
24ebb37f38 Disable Tree reduction for GPU.
For moderately sized inputs, running the Tree reduction quickly
fills/overflows the GPU thread stack space, leading to memory errors.
This was happening in the `cxx11_tensor_complex_gpu` test, for example.
Disabling tree reduction on GPU fixes this.
2021-10-20 20:42:37 +00:00
Rasmus Munk Larsen
360290fc42 Improve accuracy of full tensor reduction for half and bfloat16 by reducing leaf size in tree reduction.
Add more unit tests for summation accuracy.
2021-10-20 19:54:06 +00:00
Antonio Sanchez
95bb645e92 Fix MSVC+NVCC EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR compilation.
Looks like we need to update the
`EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR` for newer versions of MSVC as
well when compiling with NVCC.  Fixes build issues for VS 2017.
2021-10-20 19:38:14 +00:00
Antonio Sanchez
fd5f48e465 Fix tuple compilation for VS2017.
VS2017 doesn't like deducing alias types, leading to a bunch of compile
errors for functions involving the `tuple` alias.  Replacing with
`TupleImpl` seems to solve this, allowing the test to compile/pass.
2021-10-20 19:18:34 +00:00
Antonio Sanchez
d0d34524a1 Move CUDA/Complex.h to GPU/Complex.h, remove TensorReductionCuda.h
The `Complex.h` file applies equally to HIP/CUDA, so placing under the
generic `GPU` folder.

The `TensorReductionCuda.h` has already been deprecated, now removing
for the next Eigen version.
2021-10-20 12:00:19 -07:00
Rasmus Munk Larsen
f2c9c2d2f7 Vectorize Visitor.h. 2021-10-20 16:58:01 +00:00
Antonio Sanchez
2bf07fa5b5 Fix Windows CMake compiler/OS detection.
Replaced deprecated `DetermineVSServicePack`macro with recommended
`CMAKE_CXX_COMPILER_VERSION`.

Deleted custom `OSVersion` detection.  The windows-specific code is
highly outdated, and on other systems simply returns `CMAKE_SYSTEM`.
We will get values like `windows-10.0.17763`, but this is preferable
to `unknownwin`, and saves us needing to maintain a separate cmake file.
2021-10-02 16:30:38 +00:00
Rasmus Munk Larsen
1d75fab368 Speed up tensor reduction 2021-10-02 14:58:23 +00:00
Antonio Sanchez
be9e7d205f Reduce tensor_contract_gpu test.
The original test times out after 60 minutes on Windows, even when
setting flags to optimize for speed.  Reducing the number of
contractions performed from 3600->27 for subtests 8,9 allow the
two to run in just over a minute each.
2021-10-02 04:36:15 +00:00
Antonio Sanchez
701f5d1c91 Fix gpu special function tests.
Some checks used incorrect values, partly from copy-paste errors,
partly from the change in behaviour introduced in !398.

Modified results to match scipy, simplified tests by updating
`VERIFY_IS_CWISE_APPROX` to work for scalars.
2021-10-01 10:20:50 -07:00
Antonio Sanchez
f0f1d7938b Disable testing of complex compound assignment operators for MSVC.
MSVC does not support specializing compound assignments for
`std::complex`, since it already specializes them (contrary to the
standard).

Trying to use one of these on device will currently lead to a
duplicate definition error.  This is still probably preferable
to no error though.  If we remove the definitions for MSVC, then
it will compile, but the kernel will fail silently.

The only proper solution would be to define our own custom `Complex`
type.
2021-09-27 15:15:11 -07:00
Kolja Brix
51a0b4e2d2 Reorganize test main file 2021-09-27 18:30:47 +00:00
Antonio Sanchez
21640612be Disable more CUDA warnings.
For cuda 9.2 and 11.4, they changed the numbers again.

Fixes #2331.
2021-09-24 21:31:14 -07:00
Antonio Sanchez
de218b471d Add -arch=<arch> argument for nvcc.
Without this flag, when compiling with nvcc, if the compute architecture of a card does
not exactly match any of those listed for `-gencode arch=compute_<arch>,code=sm_<arch>`,
then the kernel will fail to run with:
```
cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device.
```
This can happen, for example, when compiling with an older cuda version
that does not support a newer architecture (e.g. T4 is `sm_75`, but cuda
9.2 only supports up to `sm_70`).

With the `-arch=<arch>` flag, the code will compile and run at the
supplied architecture.
2021-09-24 20:48:01 -07:00
Antonio Sanchez
846d34384a Rename EIGEN_CUDA_FLAGS to EIGEN_CUDA_CXX_FLAGS
Also add a missing space for clang.
2021-09-24 20:15:55 -07:00
Antonio Sanchez
7b00e8b186 Clean up CUDA CMake files.
- Unify test/CMakeLists.txt and unsupported/test/CMakeLists.txt
- Added `EIGEN_CUDA_FLAGS` that are appended to the set of flags passed
to the cuda compiler (nvcc or clang).

The latter is to support passing custom flags (e.g. `-arch=` to nvcc,
or to disable cuda-specific warnings).
2021-09-24 14:43:59 -07:00
Antonio Sanchez
e9e90892fe Disable another device warning 2021-09-23 13:43:18 -07:00
Antonio Sanchez
86c0decc48 Disable more NVCC warnings.
The 2979 warning is yet another "calling a __host__ function from a
__host__ __device__ function".  Although we probably should eventually
address these, they are flooding the logs.  Most of these are
harmless since we only call the original from the host.
In cases where these are actually called from device, an error is generated
instead anyways.

The 2977 warning is a bit strange - although the warning suggests the
`__device__` annotation is ignored, this doesn't actually seem to be
the case.  Without the `__device__` declarations, the kernel actually
fails to run when attempting to construct such objects.  Again,
these warnings are flooding the logs, so disabling for now.
2021-09-23 10:52:39 -07:00
Kolja Brix
afa616bc9e Fix some typos found 2021-09-23 15:22:00 +00:00
Antonio Sanchez
76bb29c0c2 Add -mfma for AVX512DQ tests. 2021-09-22 14:06:29 -07:00
sciencewhiz
4b6036e276 fix various typos 2021-09-22 16:15:06 +00:00
Antonio Sanchez
3753e6a2b3 Add AVX512 test job to CI. 2021-09-21 15:11:31 -07:00
Antonio Sanchez
343847273d Enable AVX512 testing. 2021-09-21 15:00:36 -07:00
Alexander Grund
b5eaa42695 Fix alias violation in BFloat16
reinterpret_cast between unrelated types is undefined behavior and leads
to misoptimizations on some platforms.
Use the safer (and faster) version via bit_cast
2021-09-20 10:37:50 +02:00
Alexander Karatarakis
4d622be118 [AutodiffScalar] Remove const when returning by value
clang-tidy: Return type 'const T' is 'const'-qualified at the top level,
which may reduce code readability without improving const correctness

The types are somewhat long, but the affected return types are of the form:
```
const T my_func() { /**/ }
```

Change to:
```
T my_func() { /**/ }
```
2021-09-18 21:23:32 +00:00
Antonio Sanchez
f49217e52b Fix implicit conversion warnings in tuple_test.
Fixes #2329.
2021-09-17 19:40:22 -07:00
Rasmus Munk Larsen
5595cfd194 Run CI tests in parallel on available cores. 2021-09-17 22:35:22 +00:00
Antonio Sanchez
3c724c44cf Fix strict aliasing bug causing product_small failure.
Packet loading is skipped due to aliasing violation, leading to nullopt matrix
multiplication.

Fixes #2327.
2021-09-17 21:09:34 +00:00
Antonio Sanchez
9882aec279 Silence string overflow warning for GCC in initializer_list_construction test.
This looks to be a GCC bug.  It doesn't seem to reproduce in a smaller example,
making it hard to isolate.
2021-09-17 18:33:50 +00:00
Rasmus Munk Larsen
5dac69ff0b Added a macro to pass arguments to ctest, e.g. to run tests in parallel. 2021-09-17 18:33:12 +00:00
Antonio Sanchez
5dac0b53c9 Move Eigen::all,last,lastp1,lastN to Eigen::placeholders::.
These names are so common, IMO they should not exist directly in the
`Eigen::` namespace.  This prevents us from using the `last` or `all`
names for any parameters or local variables, otherwise spitting out
warnings about shadowing or hiding the global values.  Many external
projects (and our own examples) also heavily use
```
using namespace Eigen;
```
which means these conflict with external libraries as well, e.g.
`std::fill(first,last,value)`.

It seems originally these were placed in a separate namespace
`Eigen::placeholders`, which has since been deprecated.  I propose
to un-deprecate this, and restore the original locations.

These symbols are also imported into `Eigen::indexing`, which
additionally imports `fix` and `seq`. An alternative is to remove the
`placeholders` namespace and stick with `indexing`.

NOTE: this is an API-breaking change.

Fixes #2321.
2021-09-17 10:21:42 -07:00
Rohit Santhanam
44da7a3b9d Disable specific subtests that fail on HIP due to non-functional device-side malloc/free. 2021-09-17 16:19:03 +00:00
Antonio Sanchez
16f9a20a6f Add buildtests_gpu and check_gpu to simplify GPU testing.
This is in preparation of adding GPU tests to the CI, allowing
us to limit building/testing of GPU-specific tests for a given
GPU-capable runner.

GPU tests are tagged with the label "gpu".  The new targets
```
make buildtests_gpu
make check_gpu
```
allow building and running only the gpu tests.
2021-09-17 00:48:57 +00:00
Rasmus Munk Larsen
1239adfcab Remove -fabi-version=6 flag from AVX512 builds. It was added to fix builds with gcc 4.9, but these don't even work today, and the flag breaks compilation with newer versions of gcc. 2021-09-16 16:16:47 -07:00
Rasmus Munk Larsen
6cadab6896 Clean up EIGEN_STATIC_ASSERT to only use standard c++11 static_assert. 2021-09-16 20:43:54 +00:00
Rasmus Munk Larsen
7b975acb1f Remove unused variable. 2021-09-16 20:27:13 +00:00
Rasmus Munk Larsen
92849d814b Remove unused variable. 2021-09-16 20:21:31 +00:00
Rasmus Munk Larsen
da027fa20a Remove unused variable. 2021-09-16 20:02:42 +00:00
Ryan Pavlik
3c87d6b662 Fix typos in copyright dates
(cherry picked from commit 3335e0767c)
2021-09-15 20:49:43 +00:00
Antonio Sanchez
cb50730993 Default eigen_packet_wrapper constructor.
This makes it trivial, allowing use of `memcpy`.

Fixes #2326
2021-09-14 10:57:22 -07:00
Rohit Santhanam
a751225845 Minor fix for compilation error on HIP. 2021-09-12 14:06:58 +00:00
Antonio Sanchez
2e31570c16 Fix tuple_test after gpu_test_helper update.
Duplicating the namespace `tuple_impl` caused a conflict with the
`arch/GPU/Tuple.h` definitions for the `tuple_test`.  We can't
just use `Eigen::internal` either, since there exists a different
`Eigen::internal::get`.  Renaming the namespace to `test_detail`
fixes the issue.
2021-09-11 20:24:42 -07:00
Antonio Sanchez
d06c639667 Fix unused variable warning and unnecessessary gpuFree. 2021-09-11 20:02:22 -07:00
Antonio Sanchez
bf66137efc New GPU test utilities.
This introduces new functions:
```
// returns kernel(args...) running on the CPU.
Eigen::run_on_cpu(Kernel kernel, Args&&... args);

// returns kernel(args...) running on the GPU.
Eigen::run_on_gpu(Kernel kernel, Args&&... args);
Eigen::run_on_gpu_with_hint(size_t buffer_capacity_hint, Kernel kernel, Args&&... args);

// returns kernel(args...) running on the GPU if using
//   a GPU compiler, or CPU otherwise.
Eigen::run(Kernel kernel, Args&&... args);
Eigen::run_with_hint(size_t buffer_capacity_hint, Kernel kernel, Args&&... args);
```

Running on the GPU is accomplished by:
- Serializing the kernel inputs on the CPU
- Transferring the inputs to the GPU
- Passing the kernel and serialized inputs to a GPU kernel
- Deserializing the inputs on the GPU
- Running `kernel(inputs...)` on the GPU
- Serializing all output parameters and the return value
- Transferring the serialized outputs back to the CPU
- Deserializing the outputs and return value on the CPU
- Returning the deserialized return value

All inputs must be serializable (currently POD types, `Eigen::Matrix`
and `Eigen::Array`).  The kernel must also be POD (though it usually
contains no actual data).

Tested on CUDA 9.1, 10.2, 11.3, with g++-6, g++-8, g++-10 respectively.

This MR depends on !622, !623, !624.
2021-09-10 14:22:50 -07:00
Rasmus Munk Larsen
d7d0bf832d Issue an error in case of direct inclusion of internal headers. 2021-09-10 19:12:26 +00:00
Maxiwell S. Garcia
36402e281d ci: ppc64le: disable MMA for gcc-10
This patch disables MMA for CI because the building environment is
using Ubuntu 18.04 image with LD 2.30. This linker version together
with gcc-10 causes some 'unrecognized opcode' errors.
2021-09-09 12:18:07 -05:00
Antonio Sanchez
6c10495a78 Remove unnecessary std::tuple reference. 2021-09-09 15:49:44 +00:00
Antonio Sanchez
26e5beb8cb Device-compatible Tuple implementation.
An analogue of `std::tuple` that works on device.

Context: I've tried `std::tuple` in various versions of NVCC and clang,
and although code seems to compile, it often fails to run - generating
"illegal memory access" errors, or "illegal instruction" errors.
This replacement does work on device.
2021-09-08 13:34:19 -07:00
Antonio Sanchez
fcd73b4884 Add a simple serialization mechanism.
The `Serializer<T>` class implements a binary serialization that
can write to (`serialize`) and read from (`deserialize`) a byte
buffer.  Also added convenience routines for serializing
a list of arguments.

This will mainly be for testing, specifically to transfer data to
and from the GPU.
2021-09-08 09:38:59 -07:00
Huang, Zhaoquan
558b3d4fb8 Add LLDB Pretty Printer 2021-09-07 17:28:24 +00:00
Antonio Sanchez
7792b1e909 Fix AVX2 PacketMath.h.
There were a couple typos ps -> epi32, and an unaligned load issue.
2021-09-03 19:47:57 +00:00
Antonio Sanchez
5bf35383e0 Disable MSVC constant condition warning.
We make extensive use of `if (CONSTANT)`, and cannot use c++17's `if
constexpr`.
2021-09-03 11:07:18 -07:00
Antonio Sanchez
def145547f Add missing packet types in pset1 call.
Oops, introduced this when "fixing" integer packets.
2021-09-02 16:21:07 -07:00
Antonio Sanchez
eea2a3385c Remove more DynamicSparseMatrix references.
Also fixed some typos in SparseExtra/MarketIO.h.
2021-09-02 15:36:47 -07:00
Antonio Sanchez
3b48a3b964 Remove stray DynamicSparseMatrix references.
DynamicSparseMatrix has been removed.  These shouldn't be here anymore.
2021-09-02 19:47:26 +00:00
Antonio Sanchez
ebd4b17d2f Fix tridiagonalization_inplace_selector.
The `Options` of the new `hCoeffs` vector do not necessarily match
those of the `MatrixType`, leading to build errors. Having the
`CoeffVectorType` be a template parameter relieves this restriction.
2021-09-02 12:23:27 -07:00
Sergiu Deitsch
bf426faf93 cmake: populate package registry by default 2021-09-02 17:36:01 +00:00
Jens Wehner
8286073c73 Matrixmarket extension 2021-09-02 17:23:33 +00:00
Sergiu Deitsch
e8beb4b990 cmake: use ARCH_INDEPENDENT versioning if available 2021-09-02 16:08:58 +00:00
Sergiu Deitsch
7bc90cee7d cmake: remove unused interface definitions 2021-09-02 15:41:56 +02:00
Antonio Sanchez
998bab4b04 Missing EIGEN_DEVICE_FUNCs to get gpu_basic passing with CUDA 9.
CUDA 9 seems to require labelling defaulted constructors as
`EIGEN_DEVICE_FUNC`, despite giving warnings that such labels are
ignored.  Without these labels, the `gpu_basic` test fails to
compile, with errors about calling `__host__` functions from
`__host__ __device__` functions.
2021-09-01 19:49:53 -07:00
Antonio Sanchez
74da2e6821 Rename Tuple -> Pair.
This is to make way for a new `Tuple` class that mimics `std::tuple`,
but can be reliably used on device and with aligned Eigen types.

The existing Tuple has very few references, and is actually an
analogue of `std::pair`.
2021-09-02 02:20:54 +00:00
Antonio Sanchez
3d4ba855e0 Fix AVX integer packet issues.
Most are instances of AVX2 functions not protected by
`EIGEN_VECTORIZE_AVX2`.  There was also a missing semi-colon
for AVX512.
2021-09-01 14:14:43 -07:00
Sergiu Deitsch
f2984cd077 cmake: remove deprecated package config variables 2021-09-01 19:05:51 +02:00
Maxiwell S. Garcia
09fc0f97b5 Rename 'vec_all_nan' of cxx11_tensor_expr test because this symbol is used by altivec.h 2021-09-01 16:42:51 +00:00
Antonio Sanchez
3a6296d4f1 Fix EIGEN_OPTIMIZATION_BARRIER for arm-clang.
Clang doesn't like !621, needs the "g" constraint back.
The "g" constraint also works for GCC >= 5.

This fixes our gitlab CI.
2021-09-01 09:19:55 -07:00
jenswehner
a443a2373f updated documentation 2021-08-31 22:58:28 +00:00
Antonio Sanchez
ff07a8a639 GCC 4.8 arm EIGEN_OPTIMIZATION_BARRIER fix (#2315).
GCC 4.8 doesn't seem to like the `g` register constraint, failing to
compile with "error: 'asm' operand requires impossible reload".

Tested `r` instead, and that seems to work, even with latest compilers.

Also fixed some minor macro issues to eliminate warnings on armv7.

Fixes #2315.
2021-08-31 20:20:47 +00:00
Antonio Sanchez
cc3573ab44 Disable cuda Eigen::half vectorization on host.
All cuda `__half` functions are device-only in CUDA 9, including
conversions. Host-side conversions were added in CUDA 10.
The existing code doesn't build prior to 10.0.

All arithmetic functions are always device-only, so there's
therefore no reason to use vectorization on the host at all.

Modified the code to disable vectorization for `__half` on host,
which required also updating the `TensorReductionGpu` implementation
which previously made assumptions about available packets.
2021-08-31 19:13:12 +00:00
Adam Kallai
1415817d8d win: include intrin header in Windows on ARM
The intrin header is needed for _BitScanReverse and
_BitScanReverse64
2021-08-31 10:57:34 +02:00
Rasmus Munk Larsen
6f429a202d Add missing dependency on LAPACK test suite binaries to target buildtests, so make check will work correctly when EIGEN_ENABLE_LAPACK_TESTS is ON. 2021-08-30 21:49:08 +00:00
Rasmus Munk Larsen
7e096ddcb0 Allow old Fortran code for LAPACK tests to compile despite argument mismatch errors (REAL passed to COMPLEX workspace argument) with GNU Fortran 10. 2021-08-30 19:47:30 +00:00
Turing Eret
3324389f6d Add EIGEN_TENSOR_PLUGIN support per issue #2052. 2021-08-30 19:36:55 +00:00
Antonio Sanchez
5db9e5c779 Fix fix<N> when variable templates are not supported.
There were some typos that checked `EIGEN_HAS_CXX14` that should have
checked `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES`, causing a mismatch
in some of the `Eigen::fix<N>` assumptions.

Also fixed the `symbolic_index` test when
`EIGEN_HAS_CXX14_VARIABLE_TEMPLATES` is 0.

Fixes #2308
2021-08-30 08:06:55 -07:00
Jens Wehner
53ad9c75b4 included unordered_map header 2021-08-27 16:53:28 +00:00
Kolja Brix
a032397ae4 Fix PEP8 and formatting issues in GDB pretty printer. 2021-08-26 15:22:28 +00:00
jenswehner
9abf4d0bec made RandomSetter C++11 compatible 2021-08-25 20:24:55 +00:00
Antonio Sanchez
eeacbd26c8 Bump CMake files to at least c++11.
Removed all configurations that explicitly test or set the c++ standard
flags. The only place the standard is now configured is at the top of
the main `CMakeLists.txt` file, which can easily be updated (e.g. if
we decide to move to c++14+). This can also be set via command-line using
```
> cmake -DCMAKE_CXX_STANDARD=14
```

Kept the `EIGEN_TEST_CXX11` flag for now - that still controls whether to
build/run the `cxx11_*` tests.  We will likely end up renaming these
tests and removing the `CXX11` subfolder.
2021-08-25 20:07:48 +00:00
Jakub Lichman
dc5b1f7d75 AVX512 and AVX2 support for Packet16i and Packet8i added 2021-08-25 19:38:23 +00:00
Han-Kuan Chen
ab28419298 optimize predux if architecture is aarch64 2021-08-25 19:18:54 +00:00
Antonio Sanchez
4011e4d258 Remove c++11-off CI jobs.
This is step 1 in preparing to transition beyond c++03.
2021-08-24 17:42:39 +00:00
jenswehner
90b3b6b572 added doxygen flowchart 2021-08-24 17:11:51 +00:00
jenswehner
d85de1ef56 removed sparse dynamic matrix 2021-08-24 10:33:00 +02:00
Kolja Brix
e5c8f283ae Add support for Eigen::Block types to GDB pretty printer.
Submitted by Allan Leal, see #1539 (https://gitlab.com/libeigen/eigen/issues/1539).
2021-08-23 18:10:17 +02:00
Kolja Brix
58e086b8c8 Add random matrix generation via SVD 2021-08-23 16:00:05 +00:00
Rasmus Munk Larsen
82dd3710da Update version of master branch to 3.4.90. 2021-08-18 13:46:05 -07:00
Antonio Sanchez
2b410ecbef Workaround VS 2017 arg bug.
In VS 2017, `std::arg` for real inputs always returns 0, even for
negative inputs.  It should return `PI` for negative real values.
This seems to be fixed in VS 2019 (MSVC 1920).
2021-08-18 18:39:18 +00:00
Antonio Sanchez
0c4ae56e37 Remove unaligned assert tests.
Manually constructing an unaligned object declared as aligned
invokes UB, so we cannot technically check for alignment from
within the constructor.  Newer versions of clang optimize away
this check.

Removing the affected tests.
2021-08-18 18:05:24 +00:00
Jakob Struye
53a29c7e35 Clearer doc for squaredNorm 2021-08-18 15:11:15 +00:00
Antonio Sanchez
fc9d352432 Renamed shift_left/shift_right to shiftLeft/shiftRight.
For naming consistency.  Also moved to ArrayCwiseUnaryOps, and added
test.
2021-08-17 20:04:48 -07:00
Antonio Sanchez
2cc6ee0d2e Add missing PPC packet comparisons.
This is to fix the packetmath tests on the ppc pipeline.
2021-08-17 07:42:04 -07:00
Chip-Kerchner
8dcf3e38ba Fix unaligned loads in ploadLhs & ploadRhs for P8. 2021-08-16 20:28:22 -05:00
Rasmus Munk Larsen
7e6f94961c Update documentation for matrix decompositions and least squares solvers. 2021-08-16 21:56:18 +00:00
andiwand
5c6b3efead minor doc fix in Map.h 2021-08-16 12:02:33 +00:00
Chip-Kerchner
e07227c411 Reverse compare logic in F32ToBf16 since vec_cmpne is not available in Power8 - now compiles for clang10 default (P8). 2021-08-13 11:21:28 -05:00
Chip Kerchner
66499f0f17 Get rid of 'used uninitialized' warnings for EIGEN_UNUSED_VARIABLE in gcc11+ 2021-08-12 21:38:54 +00:00
Rasmus Munk Larsen
96e3b4fc95 Add CompleteOrthogonalDecomposition to the table of linear algebra decompositions. 2021-08-12 16:11:15 +00:00
Antonio Sanchez
fb1718ad14 Update code snippet for tridiagonalize_inplace. 2021-08-12 08:30:12 -07:00
Rasmus Munk Larsen
8ce341caf2 * Revise the meta_least_common_multiple function template, adding a bool variable to check whether A is larger than B.
* This reduces compile time when A is smaller than B, and avoids compile failure when A is small and B is large.

Authored by @awoniu.
2021-08-11 18:10:01 +00:00
Nikolay Tverdokhleb
f1b899eef7 Do not set AnnoyingScalar::dont_throw if not defined EIGEN_TEST_ANNOYING_SCALAR_DONT_THROW.
- Because that member is not declared unless the macro is defined.
2021-08-11 10:01:21 +00:00
ChipKerchner
413bc491f1 Fix errors on older compilers (gcc 7.5 - lack of vec_neg, clang10 - can not use const pointers with vec_xl). 2021-08-10 15:03:18 -05:00
jenswehner
e3e74001f7 added includes for unordered_map 2021-08-10 13:34:57 +02:00
Gauri Deshpande
e6a5a594a7 remove denormal flushing in fp32tobf16 for avx & avx512 2021-08-09 22:15:21 +00:00
Daniel N. Miller (APD)
09d7122468 Do not build shared libs if not supported 2021-08-06 20:48:32 +00:00
Rasmus Munk Larsen
a5a7faeb45 Avoid memory allocation in tridiagonalization_inplace_selector::run. 2021-08-06 20:48:10 +00:00
Maxiwell S. Garcia
ae2abe1f58 ci: ppc64le: disable MMA and remove 'allow_failure' for gcc-10 CXX11=off
This patch disables MMA for CI because the building environment is using
Ubuntu 18.04 image with LD 2.30. This linker version together with gcc-10
causes some 'unrecognized opcode' errors.
2021-08-05 21:43:05 +00:00
Jens Wehner
4d870c49b7 updated documentation for middleCol and middleRow 2021-08-05 17:21:16 +00:00
Alexander Karatarakis
4ba872bd75 Avoid leading underscore followed by cap in template identifiers 2021-08-04 22:41:52 +00:00
Antonio Sanchez
5ad8b9bfe2 Make inverse 3x3 faster and avoid gcc bug.
There seems to be a gcc 4.7 bug that incorrectly flags the current
3x3 inverse as using uninitialized memory.  I'm *pretty* sure it's
a false positive, but it's hard to trigger.  The same warning
does not trigger with clang or later compiler versions.

In trying to find a work-around, this implementation turns out to be
faster anyways for static-sized matrices.

```
name                                            old cpu/op  new cpu/op  delta
BM_Inverse3x3<DynamicMatrix3T<float>>            423ns ± 2%   433ns ± 3%   +2.32%    (p=0.000 n=98+96)
BM_Inverse3x3<DynamicMatrix3T<double>>           425ns ± 2%   427ns ± 3%   +0.48%    (p=0.003 n=99+96)
BM_Inverse3x3<StaticMatrix3T<float>>            7.10ns ± 2%  0.80ns ± 1%  -88.67%  (p=0.000 n=114+112)
BM_Inverse3x3<StaticMatrix3T<double>>           7.45ns ± 2%  1.34ns ± 1%  -82.01%  (p=0.000 n=105+111)
BM_AliasedInverse3x3<DynamicMatrix3T<float>>     409ns ± 3%   419ns ± 3%   +2.40%   (p=0.000 n=100+98)
BM_AliasedInverse3x3<DynamicMatrix3T<double>>    414ns ± 3%   413ns ± 2%     ~       (p=0.322 n=98+98)
BM_AliasedInverse3x3<StaticMatrix3T<float>>     7.57ns ± 1%  0.80ns ± 1%  -89.37%  (p=0.000 n=111+114)
BM_AliasedInverse3x3<StaticMatrix3T<double>>    9.09ns ± 1%  2.58ns ±41%  -71.60%  (p=0.000 n=113+116)
```
2021-08-04 21:18:44 +00:00
Antonio Sanchez
31f796ebef Fix MPReal detection and support.
The latest version of `mpreal` has a bug that breaks `min`/`max`.
It also breaks with the latest dev version of `mpfr`. Here we
add `FindMPREAL.cmake` which searches for the library and tests if
compilation works.

Removed our internal copy of `mpreal.h` under `unsupported/test`, as
it is out-of-sync with the latest, and similarly breaks with
the latest `mpfr`.  It would be best to use the installed version
of `mpreal` anyways, since that's what we actually want to test.

Fixes #2282.
2021-08-03 17:55:03 +00:00
Antonio Sanchez
1cdec38653 Fix cmake warnings, FindPASTIX/FindPTSCOTCH.
We were getting a lot of warnings due to nested `find_package` calls
within `Find***.cmake` files.  The recommended approach is to use
[`find_dependency`](https://cmake.org/cmake/help/latest/module/CMakeFindDependencyMacro.html)
in package configuration files. I made this change for all instances.

Case mismatches between `Find<Package>.cmake` and calling
`find_package(<PACKAGE>`) also lead to warnings. Fixed for
`FindPASTIX.cmake` and `FindSCOTCH.cmake`.

`FindBLASEXT.cmake` was broken due to calling `find_package_handle_standard_args(BLAS ...)`.
The package name must match, otherwise the `find_package(BLASEXT)` falsely thinks
the package wasn't found.  I changed to `BLASEXT`, but then also copied that value
to `BLAS_FOUND` for compatibility.

`FindPastix.cmake` had a typo that incorrectly added `PTSCOTCH` when looking for
the `SCOTCH` component.

`FindPTSCOTCH` incorrectly added `***-NOTFOUND` to include/library lists,
corrupting them.  This led to cmake errors down-the-line.

Fixes #2288.
2021-08-03 17:26:28 +00:00
Antonio Sanchez
8cf6cb27ba Fix TriSycl CMake files.
This is to enable compiling with the latest trisycl. `FindTriSYCL.cmake` was
broken by commit 00f32752, which modified `add_sycl_to_target` for ComputeCPP.
This makes the corresponding modifications for trisycl to make them consistent.

Also, trisycl now requires c++17.
2021-08-03 16:44:29 +00:00
Antonio Sanchez
3d98a6ef5c Modify scalar pzero, ptrue, pselect, and p<binary> operations to avoid memset.
The `memset` function and bitwise manipulation only apply to POD types
that do not require initialization, otherwise resulting in UB. We currently
violate this in `ptrue` and `pzero`, we assume bitmasks for `pselect`, and
bitwise operations are applied byte-by-byte in the generic implementations.

This is causing issues for scalar types that do require initialization
or that contain non-POD info such as pointers (#2201). We either break
them, or force specializations of these functions for custom scalars,
even if they are not vectorized.

Here we modify these functions for scalars only - instead using only
scalar operations:
- `pzero`: `Scalar(0)` for all scalars.
- `ptrue`: `Scalar(1)` for non-trivial scalars, bitset to one bits for trivial scalars.
- `pselect`: ternary select comparing mask to `Scalar(0)` for all scalars
- `pand`, `por`, `pxor`, `pnot`: use operators `&`, `|`, `^`, `~` for all integer or non-trivial scalars, otherwise apply bytewise.

For non-scalar types, the original implementations are used to maintain
compatibility and minimize the number of changes.

Fixes #2201.
2021-08-03 08:44:28 -07:00
Antonio Sanchez
7880f10526 Enable equality comparisons on GPU.
Since `std::equal_to::operator()` is not a device function, it
fails on GPU.  On my device, I seem to get a silent crash in the
kernel (no reported error, but the kernel does not complete).

Replacing this with a portable version enables comparisons on device.

Addresses #2292 - would need to be cherry-picked.  The 3.3 branch
also requires adding `EIGEN_DEVICE_FUNC` in `BooleanRedux.h` to get
fully working.
2021-08-03 01:53:31 +00:00
pvcStillinGradSchool
c86ac71b4f Put code in monospace (typewriter) style. 2021-08-03 01:48:32 +00:00
hyunggi-sv
02a0e79c70 fix:typo in dox (has->have) 2021-08-03 00:45:00 +00:00
Antonio Sanchez
9816fe59b4 Fix assignment operator issue for latest MSVC+NVCC.
Details are scattered across #920, #1000, #1324, #2291.

Summary: some MSVC versions have a bug that requires omitting explicit
`operator=` definitions (leads to duplicate definition errors), and
some MSVC versions require adding explicit `operator=` definitions
(otherwise implicitly deleted errors).  This mess tries to cover
all the cases encountered.

Fixes #2291.
2021-08-03 00:26:10 +00:00
Alexander Karatarakis
f357283d31 _DerType -> DerivativeType as underscore-followed-by-caps is a reserved identifier 2021-07-29 18:02:04 +00:00
Jonas Harsch
5b81764c0f Fixed typo in TutorialSparse.dox 2021-07-26 07:20:19 +00:00
Antonio Sanchez
de2e62c62d Disable vectorization of comparisons except for bool.
Packet input/output types must currently be the same, and since these
have a return type of `bool`, vectorization will only work if
input is bool.
2021-07-25 13:39:50 -07:00
derekjchow
66ca41bd47 Add support for vectorizing logical comparisons. 2021-07-23 20:07:48 +00:00
arthurfeeney
a77638387d Fixes #1387 for compilation error in JacobiSVD with HouseholderQRPreconditioner that occurs when input is a compile-time row vector. 2021-07-20 20:11:22 +00:00
Antonio Sanchez
297f0f563d Fix explicit default cache size typo. 2021-07-20 11:40:17 -07:00
Antonio Sanchez
1fd5ce1002 For GpuDevice::fill, use a single memset if all bytes are equal.
The original `fill` implementation introduced a 5x regression on my
nvidia Quadro K1200.  @rohitsan reported up to 100x regression for
HIP.  This restores performance.
2021-07-10 13:37:16 +00:00
Antonio Sanchez
9c22795d65 Put attach/detach buffer back in for TensorDeviceSycl.
Also added a test to verify the original buffer is updated correctly.
2021-07-09 10:00:05 -07:00
Rohit Santhanam
beea14a18f Enable extract et al. for HIP GPU. 2021-07-09 14:58:07 +00:00
Rasmus Munk Larsen
0c361c4899 Defer to std::fill_n when filling a dense object with a constant value. 2021-07-09 03:59:35 +00:00
Antonio Sanchez
1e6c6c1576 Replace memset with fill to work for non-trivial scalars.
For custom scalars, zero is not necessarily represented by
a zeroed-out memory block (e.g. gnu MPFR). We therefore
cannot rely on `memset` if we want to fill a matrix or tensor
with zeroes. Instead, we should rely on `fill`, which for trivial
types does end up getting converted to a `memset` under-the-hood
(at least with gcc/clang).

Requires adding a `fill(begin, end, v)` to `TensorDevice`.

Replaced all potentially bad instances of memset with fill.

Fixes #2245.
2021-07-08 18:34:41 +00:00
Jonas Harsch
e9c9a3130b Removed superfluous boolean degenerate in TensorMorphing.h. 2021-07-08 18:02:58 +00:00
Guoqiang QI
4bcd42c271 Make a copy of input matrix when try to do the inverse in place, this fixes #2285. 2021-07-08 17:05:26 +00:00
Kolja Brix
a59cf78c8d Add Doxygen-style documentation to main.h. 2021-07-07 18:23:59 +00:00
Antonio Sanchez
f44f05532d Fix CMake directory issues.
Allows absolute and relative paths for
- `INCLUDE_INSTALL_DIR`
- `CMAKEPACKAGE_INSTALL_DIR`
- `PKGCONFIG_INSTALL_DIR`

Type should be `PATH` not `STRING`.  Contrary to !211, these don't
seem to be made absolute if user-defined - according to the doc any
directories should use `PATH` type, which allows a file dialog
to be used via the GUI.  It also better handles file separators.

If user provides an absolute path, it will be made relative to
`CMAKE_INSTALL_PREFIX` so that the `configure_packet_config_file` will
work.

Fixes #2155 and #2269.
2021-07-07 17:24:57 +00:00
Antonio Sanchez
f5a9873bbb Fix Tensor documentation page.
The extra [TOC] tag is generating a huge floating duplicated
table-of-contents, which obscures the majority of the page
(see bottom of https://eigen.tuxfamily.org/dox/unsupported/eigen_tensors.html).
Remove it.

Also, headers do not support markup (see
[doxygen bug](https://github.com/doxygen/doxygen/issues/7467)), so
backticks end up generating titles that look like
```
Constructor <tt>Tensor<double,2></tt>
```
Removing backticks for now.  To generate properly formatted headers, we
must directly use html instead of markdown, i.e.
```
<h2>Constructor <code>Tensor&lt;double,2&gt;</code></h2>
```
which is ugly.

Fixes #2254.
2021-07-03 04:39:22 +00:00
Rasmus Munk Larsen
7b35638ddb Fix breakage of conj_helper in conjunction with custom types introduced in !537. 2021-07-02 20:42:15 +00:00
Jonas Harsch
aab747021b Don't crash when attempting to shuffle an empty tensor. 2021-07-02 20:33:52 +00:00
Rasmus Munk Larsen
bbfc4d54cd Use padd instead of +. 2021-07-02 02:51:48 +00:00
Rasmus Munk Larsen
9312a5bf5c Implement a generic vectorized version of Smith's algorithms for complex division. 2021-07-01 23:31:12 +00:00
Antonio Sanchez
6035da5283 Fix compile issues for gcc 4.8.
- Move constructors can only be defaulted as NOEXCEPT if all members
have NOEXCEPT move constructors.
- gcc 4.8 has some funny parsing bug in `a < b->c`, thinking `b-` is a template parameter.
2021-07-01 22:58:14 +00:00
Antonio Sanchez
154f00e9ea Fix inverse nullptr/asan errors for LU.
For empty or single-column matrices, the current `PartialPivLU`
currently dereferences a `nullptr` or accesses memory out-of-bounds.
Here we adjust the checks to avoid this.
2021-07-01 13:41:04 -07:00
Dan Miller
eb04775903 Fix duplicate definitions on Mac 2021-07-01 14:54:12 +00:00
Chip Kerchner
91e99ec1e0 Create the ability to disable the specialized gemm_pack_rhs in Eigen (only PPC) for TensorFlow 2021-06-30 23:05:04 +00:00
Alexander Karatarakis
60400334a9 Make DenseStorage<> trivially_copyable 2021-06-30 04:27:51 +00:00
大河メタル
c81da59a25 Correct declarations for aarch64-pc-windows-msvc 2021-06-30 04:09:46 +00:00
Rasmus Munk Larsen
5aebbe9098 Get rid of redundant pabs instruction in complex square root. 2021-06-29 23:26:15 +00:00
Antonio Sanchez
3a087ccb99 Modify tensor argmin/argmax to always return first occurence.
As written, depending on multithreading/gpu, the returned index from
`argmin`/`argmax` is not currently stable.  Here we modify the functors
to always keep the first occurence (i.e. if the value is equal to the
current min/max, then keep the one with the smallest index).

This is otherwise causing unpredictable results in some TF tests.
2021-06-29 10:36:20 -07:00
Rohit Santhanam
2d132d1736 Commit 52a5f982 broke conjhelper functionality for HIP GPUs.
This commit addresses that. 2021-06-25 19:28:00 +00:00
2021-06-25 19:28:00 +00:00
Rasmus Munk Larsen
bffd267d17 Small cleanup: Get rid of the macros EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD and CJMADD, which were effectively unused, apart from on x86, where the change results in identically performing code. 2021-06-24 18:52:17 -07:00
Rasmus Munk Larsen
52a5f98212 Get rid of code duplication for conj_helper. For packets where LhsType=RhsType a single generic implementation suffices. For scalars, the generic implementation of pconj automatically forwards to numext::conj, so much of the existing specialization can be avoided. For mixed types we still need specializations. 2021-06-24 15:47:48 -07:00
Rasmus Munk Larsen
4ad30a73fc Use internal::ref_selector to avoid holding a reference to a RHS expression. 2021-06-22 14:31:32 +00:00
Rasmus Munk Larsen
ea62c937ed Update ComplexEigenSolver_eigenvectors.cpp 2021-06-21 19:06:25 +00:00
Rasmus Munk Larsen
c8a2b4d20a Fix typo in SelfAdjointEigenSolver_eigenvectors.cpp 2021-06-21 19:06:04 +00:00
Antonio Sanchez
e9ab4278b7 Rewrite balancer to avoid overflows.
The previous balancer overflowed for large row/column norms.
Modified to prevent that.

Fixes #2273.
2021-06-21 17:29:55 +00:00
Antonio Sanchez
35a367d557 Fix fix<> for gcc-4.9.3.
There's a missing `EIGEN_HAS_CXX14` -> `EIGEN_HAS_CXX14_VARIABLE_TEMPLATES`
replacement.

Fixes #2267
2021-06-18 13:22:54 -07:00
Antonio Sanchez
12e8d57108 Remove pset, replace with ploadu.
We can't make guarantees on alignment for existing calls to `pset`,
so we should default to loading unaligned.  But in that case, we should
just use `ploadu` directly. For loading constants, this load should hopefully
get optimized away.

This is causing segfaults in Google Maps.
2021-06-16 18:41:17 -07:00
Chip-Kerchner
ef1fd341a8 EIGEN_STRONG_INLINE was NOT inlining in some critical areas (6.6X slowdown) when used with TensorFlow. Changing to EIGEN_ALWAYS_INLINE where appropriate. 2021-06-16 16:30:31 +00:00
jenswehner
175f0cc1e9 changed documentation to make example compile 2021-06-16 11:45:06 +02:00
Antonio Sanchez
9e94c59570 Add missing ppc pcmp_lt_or_nan<Packet8bf> 2021-06-15 13:42:17 -07:00
Antonio Sanchez
954879183b Fix placement of permanent GPU defines. 2021-06-15 12:17:09 -07:00
Rasmus Munk Larsen
13fb5ab92c Fix more enum arithmetic. 2021-06-15 09:09:31 -07:00
Antonio Sanchez
ad82d20cf6 Fix checking of version number for mingw.
MinGW spits out version strings like: `x86_64-w64-mingw32-g++ (GCC)
10-win32 20210110`, which causes the version extraction to fail.
Added support for this with tests.

Also added `make_unsigned` for `long long`, since mingw seems to
use that for `uint64_t`.

Related to #2268.  CMake and build passes for me after this.
2021-06-11 23:19:10 +00:00
Antonio Sanchez
514977f31b Add ability to permanently enable HIP/CUDA gpu* defines.
When using Eigen for gpu, these simplify portability.  If
`EIGEN_PERMANENTLY_ENABLE_GPU_HIP_CUDA_DEFINES` is set, then
we do not undefine them.
2021-06-11 17:19:54 +00:00
Antonio Sanchez
6aec83263d Allow custom TENSOR_CONTRACTION_DISPATCH macro.
Currently TF lite needs to hack around with the Tensor headers in order
to customize the contraction dispatch method. Here we add simple `#ifndef`
guards to allow them to provide their own dispatch prior to inclusion.
2021-06-11 17:02:19 +00:00
Rasmus Munk Larsen
fc87e2cbaa Use bit_cast to create -0.0 for floating point types to avoid compiler optimization changing sign with --ffast-math enabled. 2021-06-11 02:35:53 +00:00
Rasmus Munk Larsen
f64b2954c7 Fix c++20 warnings about using enums in arithmetic expressions. 2021-06-10 17:17:39 -07:00
Nicolas Cornu
001a57519a Fix parsing of version for nvhpc
As the first line of the version output is empty, parsing crashes,
so delete the first line if it is empty
2021-06-10 18:30:53 +00:00
Rohit Santhanam
c8d40a7bf1 Removed dead code from GPU float16 unit test. 2021-05-28 20:06:48 +00:00
Cyril Kaiser
91cd67f057 Remove EIGEN_DEVICE_FUNC from CwiseBinaryOp's default copy constructor. 2021-05-26 19:28:13 +00:00
Antonio Sanchez
dba753a986 Add missing NEON ptranspose implementations.
Unified implementation using only `vzip`.
2021-05-25 18:25:35 +00:00
Antonio Sanchez
ebb300d0b4 Modify Unary/Binary/TernaryOp evaluators to work for non-class types.
This used to work for non-class types (e.g. raw function pointers) in
Eigen 3.3.  This was changed in commit 11f55b29 to optimize the
evaluator:

> `sizeof((A-B).cwiseAbs2())` with A,B Vector4f is now 16 bytes, instead of 48 before this optimization.

though I cannot reproduce the 16 byte result.  Both before the change
and after, with multiple compilers/versions, I always get a result of 40 bytes.

https://godbolt.org/z/MsjTc1PGe

This change modifies the code slightly to allow non-class types.  The
final generated code is identical, and the expression remains 40 bytes
for the `abs2` sample case.

Fixes #2251
2021-05-23 12:44:37 -07:00
Jakub Lichman
12471fcb5d predux_half_dowto4 test extended to all applicable packets 2021-05-21 16:42:19 +00:00
Steve Bronder
1720057023 Adds macro for checking if C++14 variable templates are supported 2021-05-21 16:25:32 +00:00
Niall Murphy
391094c507 Use derived object type in conservative_resize_like_impl
When calling conservativeResize() on a matrix with DontAlign flag, the
temporary variable used to perform the resize should have the same
Options as the original matrix to ensure that the correct override of
swap is called (i.e. PlainObjectBase::swap(DenseBase<OtherDerived> &
other). Calling the base class swap (i.e in DenseBase) results in
assertions errors or memory corruption.
2021-05-20 23:17:02 +00:00
Jakub Lichman
8877f8d9b2 ptranspose test for non-square kernels added 2021-05-19 08:26:45 +00:00
Guoqiang QI
3e006bfd31 Ensure all generated matrices for inverse_4x4 tests are invertible. This fixes #2248. 2021-05-13 15:03:30 +00:00
Nathan Luehr
972cf0c28a Fix calls to device functions from host code 2021-05-11 22:47:49 +00:00
Nathan Luehr
7e6a1c129c Device implementation of log for std::complex types. 2021-05-11 22:02:21 +00:00
Nathan Luehr
6753f0f197 Fix ambiguity due to argument dependent lookup. 2021-05-11 15:41:11 -05:00
guoqiangqi
3d9051ea84 Changing the storage of the SSE complex packets to that of the wrapper. This should fix #2242 . 2021-05-10 23:53:16 +00:00
Rohit Santhanam
39ec31c0ad Fix for issue where numext::imag and numext::real are used before they are defined. 2021-05-10 19:48:32 +00:00
Antonio Sanchez
c0eb5f89a4 Restore ABI compatibility for conj with 3.3, fix conflict with boost.
The boost library unfortunately specializes `conj` for various types and
assumes the original two-template-parameter version.  This change
restores the second parameter.  This also restores ABI compatibility.

The specialization for `std::complex` is because `std::conj` is not
a device function. For custom complex scalar types, users should provide
their own `conj` implementation.

We may consider removing the unnecessary second parameter in the future - but
this will require modifying boost as well.

Fixes #2112.
2021-05-07 18:14:00 +00:00
Antonio Sanchez
0eba8a1fe3 Clean up gpu device properties.
Made a class and singleton to encapsulate initialization and retrieval of
device properties.

Related to !481, which already changed the API to address a static
linkage issue.
2021-05-07 17:51:29 +00:00
Antonio Sanchez
90e9a33e1c Fix numext::arg return type.
The cxx11 path for `numext::arg` incorrectly returned the complex type
instead of the real type, leading to compile errors. Fixed this and
added tests.

Related to !477, which uncovered the issue.
2021-05-07 16:26:57 +00:00
Christoph Hertzberg
722ca0b665 Revert addition of unused paddsub<Packet2cf>. This fixes #2242 2021-05-06 18:36:47 +02:00
Antonio Sanchez
e3b7f59659 Simplify TensorRandom and remove time-dependence.
Time-dependence prevents tests from being repeatable. This has long
been an issue with debugging the tensor tests. Removing this will allow
future tests to be repeatable in the usual way.

Also, the recently added macros in !476 are causing headaches across different
platforms. For example, checking `_XOPEN_SOURCE` is leading to multiple
ambiguous macro errors across Google, and `_DEFAULT_SOURCE`/`_SVID_SOURCE`/`_BSD_SOURCE`
are sometimes defined with values, sometimes defined as empty, and sometimes
not defined at all when they probably should be.  This is leading to
multiple build breakages.

The simplest approach is to generate a seed via
`Eigen::internal::random<uint64_t>()` if on CPU. For GPU, we use a
hash based on the current thread ID (since `rand()` isn't supported
on GPU).

Fixes #1602.
2021-05-04 13:34:49 -07:00
Antonio Sanchez
1c013be2cc Better CUDA complex division.
The original produced NaNs when dividing 0/b for subnormal b.
The `complex_divide_stable` was changed to use the more common
Smith's algorithm.
2021-04-29 17:39:58 +00:00
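Smith's algorithm, mentioned in the commit above, can be sketched in a few lines. This is a hypothetical standalone version for illustration, not Eigen's actual `complex_divide_stable` kernel: scaling by the larger-magnitude component of the divisor avoids the intermediate overflow/underflow of the textbook formula `(ac+bd)/(c^2+d^2)`.

```cpp
#include <cmath>

// Illustrative complex type; Eigen's code operates on packets instead.
struct Cplx { double re, im; };

// Smith's algorithm for (x.re + i*x.im) / (y.re + i*y.im).
Cplx smith_div(Cplx x, Cplx y) {
  if (std::fabs(y.re) >= std::fabs(y.im)) {
    double r = y.im / y.re;            // |r| <= 1, so no overflow here
    double d = y.re + y.im * r;        // scaled denominator
    return {(x.re + x.im * r) / d, (x.im - x.re * r) / d};
  } else {
    double r = y.re / y.im;
    double d = y.re * r + y.im;
    return {(x.re * r + x.im) / d, (x.im * r - x.re) / d};
  }
}
```

Note that dividing 0 by a subnormal divisor stays finite here, because the denominator is never squared.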
Antonio Sanchez
172db7bfc3 Add missing pcmp_lt_or_nan for NEON Packet4bf. 2021-04-27 14:12:11 -07:00
Theo Fletcher
2ced0cc233 Added complex matrix unit tests for SelfAdjointEigenSolver 2021-04-26 19:00:51 +00:00
Jakub Lichman
d87648a6be Tests added and AVX512 bug fixed for pcmp_lt_or_nan 2021-04-25 20:58:56 +00:00
Jakub Lichman
1115f5462e Tests for pcmp_lt and pcmp_le added 2021-04-23 19:51:43 +00:00
Turing Eret
3804ca0d90 Fix for issue with static global variables in TensorDeviceGpu.h
m_deviceProperties and m_devicePropInitialized were defined as global
statics, which creates multiple copies and can cause issues if
initializeDeviceProp() is called in one translation unit and then
m_deviceProperties is used in a different translation unit. Added
inline functions getDeviceProperties() and getDevicePropInitialized()
which define those variables as static locals. Per C++ standard
7.1.2/4, a static local declared in an inline function always refers
to the same object, so this should be safer. Credit to Sun Chenggen
for this fix.

This fixes issue #1475.
2021-04-23 07:43:35 -06:00
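The fix above leans on a standard-language guarantee that is easy to show in isolation. A minimal sketch with hypothetical names (not the actual TensorDeviceGpu.h code): a function-local static inside an inline function denotes one object program-wide, so every translation unit calling the accessor sees the same state.

```cpp
// Illustrative stand-ins for the device-property globals; not Eigen's code.
struct DeviceProperties {
  int deviceCount = 0;
  bool initialized = false;
};

// Per C++ 7.1.2/4, a static local in an inline function refers to the
// same object in every translation unit, unlike a namespace-scope
// static in a header, which yields one object *per* TU.
inline DeviceProperties& getDeviceProperties() {
  static DeviceProperties props;  // single program-wide instance
  return props;
}

inline bool& getDevicePropInitialized() {
  static bool initialized = false;
  return initialized;
}
```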
Antonio Sanchez
045c0609b5 Check existence of BSD random before use.
`TensorRandom` currently relies on BSD `random()`, which is not always
available.  The [linux manpage](https://man7.org/linux/man-pages/man3/srandom.3.html)
gives the glibc condition:
```
_XOPEN_SOURCE >= 500
    || /* Glibc since 2.19: */ _DEFAULT_SOURCE
    || /* Glibc <= 2.19: */ _SVID_SOURCE || _BSD_SOURCE
```
In particular, this was failing to compile for MinGW via msys2. If not
available, we fall back to using `rand()`.
2021-04-22 20:42:12 +00:00
Antonio Sanchez
d213a0bcea DenseStorage safely copy/swap.
Fixes #2229.

For dynamic matrices with fixed-sized storage, only copy/swap
elements that have been set.  Otherwise, this leads to inefficient
copying, and potential UB for non-initialized elements.
2021-04-22 18:45:19 +00:00
Rasmus Munk Larsen
85a76a16ea Make vectorized compute_inverse_size4 compile with AVX. 2021-04-22 15:21:01 +00:00
Jakub Lichman
d72c794ccd Compilation of basicbenchmark fixed 2021-04-21 06:53:32 +00:00
Chip-Kerchner
06c2760bd1 Fix taking address of rvalue compiler issue with TensorFlow (plus other warnings). 2021-04-21 00:47:13 +00:00
Jakub Lichman
2b1dfd1ba0 HasExp added for AVX512 Packet8d 2021-04-20 19:07:58 +00:00
Antonio Sanchez
1d79c68ba0 Fix ldexp for AVX512 (#2215)
Wrong shuffle was used.  Need to interleave low/high halves with a
`permute` instruction.

Fixes #2215.
2021-04-20 16:25:22 +00:00
David Tellenbach
3e819d83bf Before 3.4 branch 2021-04-18 23:36:14 +02:00
Antonio Sanchez
69adf26aa3 Modify googlehash use to account for namespace issues.
The namespace declaration for googlehash is a configurable macro that
can be disabled.  In particular, it is disabled within google, causing
compile errors since `dense_hash_map`/`sparse_hash_map` are then in
the global namespace instead of in `::google`.

Here we play a bit of gymnastics to allow for both `google::*_hash_map`
and `*_hash_map`, while limiting namespace pollution.  Symbols within
the `::google` namespace are imported into `Eigen::google`.

We also remove checks based on `_SPARSE_HASH_MAP_H_`, as this is
fragile, and instead require `EIGEN_GOOGLEHASH_SUPPORT` to be
defined.
2021-04-12 19:00:39 -07:00
Christoph Hertzberg
9357feedc7 Avoid using uninitialized inputs and if available, use slightly more efficient movsd instruction for pset1<Packet2cf>. 2021-04-13 01:36:59 +02:00
Rasmus Munk Larsen
a2c0542010 Fix typo in TensorDimensions.h 2021-04-12 18:59:56 +00:00
Rohit Santhanam
dfd6720d82 Fix for float16 GPU unit test. 2021-04-12 10:19:06 +00:00
Christoph Hertzberg
1e1c8a735c Use EIGEN_HAS_CXX11 and EIGEN_COMP_CXXVER macros to detect C++ version for std::result_of and std::invoke_result.
Fixes #2209
2021-04-12 01:26:15 +00:00
Jens Wehner
f6fc66aa75 fixed doxygen for unsupported iterative solver module 2021-04-11 16:26:14 +00:00
Christoph Hertzberg
d58678069c Make iterators default constructible and assignable, by making... 2021-04-09 17:03:28 +00:00
Rohit Santhanam
2859db0220 This fixes an issue where the compiler was not choosing the GPU-specific specialization of ScanLauncher.
The issue was discovered when the GPU scan unit test was run and resulted in a segmentation fault.

The segmentation fault occurred because the unit test allocated GPU memory and passed a pointer to that memory to a computation that it presumed would execute on the GPU.

But because of the issue, the computation was scheduled to execute on the CPU, so the CPU ended up attempting to access a GPU memory location.

The fix expands the GPU-specific ScanLauncher specialization to handle cases where vectorization is enabled.

Previously, the GPU specialization was chosen only if vectorization was not used.
2021-04-08 15:14:48 +00:00
Antonio Sanchez
fcb5106c6e Scaled epsilon the wrong way.
Should have been 0.5 to widen the bounds, since this is inverse
precision.  Setting to 0.5, however, leads to many more failing
tests at Google, so reverting to 1 for now.
2021-04-07 15:08:39 -07:00
Christoph Hertzberg
6197ce1a35 Replace -2147483648 by -0.0f or -0.0 constants (this should fix #2189).
Also, remove unnecessary `pgather` operations.
2021-04-07 11:25:27 +00:00
Rasmus Munk Larsen
22edb46823 Align local arrays to Packet boundary. 2021-04-06 16:22:36 +00:00
Antonio Sanchez
ace7f132ed Fix clang tidy warnings in AnnoyingScalar.
Clang-tidy complains that full specializations in headers can cause ODR
violations.  Marked these as `inline` to fix.

It also complains about renaming arguments in specializations.  Set the
argument names to match.
2021-04-05 12:49:38 -07:00
Antonio Sanchez
90187a33e1 Fix SelfAdjointEigenSolver (#2191)
Adjust the relaxation step to use the condition
```
abs(subdiag[i]) <= epsilon * sqrt(abs(diag[i]) + abs(diag[i+1]))
```
for setting the subdiagonal entry to zero.

Also adjust Wilkinson shift for small `e = subdiag[end-1]` -
I couldn't find a reference for the original, and it was not
consistent with the Wilkinson definition.

Fixes #2191.
2021-04-05 11:19:09 -07:00
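The relaxed deflation criterion quoted in the commit above translates directly to code. A sketch of the test as described (illustrative helper, not the exact Eigen implementation): an off-diagonal entry is zeroed out when it is negligible relative to its neighboring diagonal entries.

```cpp
#include <cmath>

// Deflation check for a symmetric tridiagonal matrix, mirroring the
// condition in the commit message: abs(subdiag[i]) <=
// epsilon * sqrt(abs(diag[i]) + abs(diag[i+1])).
bool can_deflate(double subdiag_i, double diag_i, double diag_ip1,
                 double epsilon) {
  return std::fabs(subdiag_i) <=
         epsilon * std::sqrt(std::fabs(diag_i) + std::fabs(diag_ip1));
}
```

When the check passes, the tridiagonal problem splits into two smaller independent blocks, which is what makes the QL/QR iteration terminate robustly.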
Rasmus Munk Larsen
3ddc0974ce Fix two bugs in commit 2021-04-02 22:06:27 +00:00
Chip Kerchner
c24bee6120 Fix address of temporary object errors in clang11.
This fixes the problem with taking the address of temporary objects which clang11 treats as errors.
2021-04-02 16:27:08 +00:00
David Tellenbach
e4233b6e3d Add CI infrastructure for pre-merge smoke tests.
This patch adds pre-merge smoke tests for x86 Linux using gcc-10 and clang-10.

Closes #2188.
2021-04-01 00:08:37 +00:00
David Tellenbach
ae95b74af9 Add CMake infrastructure for smoke testing
Necessary CMake changes to implement pre-merge smoke tests
running via CI.
2021-03-31 22:09:00 +00:00
Rasmus Munk Larsen
5bbc9cea93 Add an info() method to the SVDBase class to make it possible to tell the user that the computation failed, possibly due to invalid input.
Make Jacobi and divide-and-conquer fail fast and return info() == InvalidInput if the matrix contains NaN or +/-Inf.
2021-03-31 21:09:19 +00:00
Guoqiang QI
b5a926a0f6 Add GitLab templates for issues and merge requests
This patch adds GitLab templates for bug reports, feature and merge requests.

This closes #2117.
2021-03-31 16:01:12 +00:00
Antonio Sanchez
78ee3d6261 Fix CUDA constexpr issues for numeric_limits.
Some CUDA/HIP constants fail on device with `constexpr` since they
internally rely on non-constexpr functions, e.g.
```
#define CUDART_INF_F            __int_as_float(0x7f800000)
```
This fails for cuda-clang (though passes with nvcc). These constants are
currently used by `device::numeric_limits`.  For portability, we
need to remove `constexpr` from the affected functions.

For C++11 or higher, we should be able to rely on the `std::numeric_limits`
versions anyways, since the methods themselves are now `constexpr`, so
should be supported on device (clang/hipcc natively, nvcc with
`--expt-relaxed-constexpr`).
2021-03-30 18:01:27 +00:00
Antonio Sanchez
af1247fbc1 Use Index type in loop over coefficients.
Previously was `int`.  Brought up by Kyle Snow (Polaris Geospatial
Services) on the mailing list.
2021-03-29 17:40:55 +00:00
Antonio Sanchez
87729ea39f Eliminate round_impl double-promotion warnings for c++03. 2021-03-25 16:52:19 +00:00
Deven Desai
748489ef9c Un-defining EIGEN_HAS_CONSTEXPR on the HIP platform
The Eigen unit-tests started failing on the HIP/ROCm platform, after the following commit

e7b8643d70

```
In file included from /home/rocm-user/eigen/test/main.h:360:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:162:
/home/rocm-user/eigen/Eigen/src/Core/util/Meta.h:300:17: error: constexpr function never produces a constant expression [-Winvalid-constexpr]
  static float (max)() {
                ^
/home/rocm-user/eigen/Eigen/src/Core/util/Meta.h:304:12: note: non-constexpr function '__int_as_float' cannot be used in a constant expression
    return HIPRT_MAX_NORMAL_F;
           ^
/home/rocm-user/eigen/Eigen/src/Core/arch/HIP/hcc/math_constants.h:14:28: note: expanded from macro 'HIPRT_MAX_NORMAL_F'
#define HIPRT_MAX_NORMAL_F __int_as_float(0x7f7fffff)
                           ^
/opt/rocm/hip/include/hip/hcc_detail/device_functions.h:913:32: note: declared here
__device__ static inline float __int_as_float(int x) {
                               ^
```

The problem seems to be that some of the constants defined in the HIP `math_constants.h` call the `__int_as_float` routine, which is not declared `constexpr` in the HIP runtime header file.

Working around this issue for now by skipping the constexpr support (enabled via the above commit) on HIP
2021-03-25 13:45:52 +00:00
Chip Kerchner
d59ef212e1 Fixed performance issues for complex VSX and P10 MMA in gebp_kernel (level 3). 2021-03-25 11:08:19 +00:00
Steve Bronder
e7b8643d70 Revert "Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()""
This reverts commit 5f0b4a4010.
2021-03-24 18:14:56 +00:00
Antonio Sanchez
5521c65afb Eliminate mixingtypes_7 warning.
`g_called` is not used in subtest 7, so was generating a
`-Wunneeded-internal-declaration` warnings.  Here we silence
it by initializing the static variable.
2021-03-24 11:05:41 -07:00
Christoph Hertzberg
69a4f70956 Revert "Uses _mm512_abs_pd for Packet8d pabs"
This reverts commit f019b97aca
2021-03-23 18:52:19 +00:00
David Tellenbach
824272cde8 Re-enable CI for Power 2021-03-22 19:28:25 +01:00
David Tellenbach
4811e81966 Remove yet another comma at end of enum 2021-03-18 23:30:00 +01:00
Steve Bronder
f019b97aca Uses _mm512_abs_pd for Packet8d pabs 2021-03-18 15:47:52 +00:00
David Tellenbach
0cc9b5eb40 Split test commainitializer into two substests 2021-03-18 13:28:51 +01:00
Antonio Sanchez
c3fbc6cec7 Use singleton pattern for static registered tests.
The original fails with nvcc+msvc - there's a static order of initialization
issue leading to registered tests being cleared.  The test then fails on
```
VERIFY(EigenTest::all().size()>0);
```
since `EigenTest` no longer contains any tests.  The singleton pattern
fixes this.
2021-03-18 00:56:31 +00:00
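The construct-on-first-use (Meyers) singleton used in the fix above can be sketched as follows. Names are illustrative, not Eigen's actual `EigenTest` class: because the registry map is created the first time `all()` runs, no other static initializer can observe (or clear) a not-yet-constructed map, which sidesteps the static initialization order problem described above.

```cpp
#include <map>
#include <string>

// Hypothetical test registry using the Meyers singleton pattern.
class TestRegistry {
 public:
  // The map is constructed on first use, so its lifetime is well-defined
  // relative to any static-duration code that registers tests.
  static std::map<std::string, void (*)()>& all() {
    static std::map<std::string, void (*)()> tests;
    return tests;
  }
  static void add(const std::string& name, void (*fn)()) {
    all()[name] = fn;
  }
};
```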
Niek Bouman
ed964ba3f1 Proposed fix for issue #2187 2021-03-18 00:55:36 +00:00
Antonio Sanchez
8dfe1029a5 Augment NumTraits with min/max_exponent() again.
Replace usage of `std::numeric_limits<...>::min/max_exponent` in
codebase where possible.  Also replaced some other `numeric_limits`
usages in affected tests with the `NumTraits` equivalent.

The previous MR !443 failed for c++03 due to lack of `constexpr`.
Because of this, we need to keep around the `std::numeric_limits`
version in enum expressions until the switch to c++11.

Fixes #2148
2021-03-16 20:12:46 -07:00
David Tellenbach
eb71e5db98 Fix another warning on missing commas 2021-03-17 03:07:04 +01:00
David Tellenbach
df4bc2731c Revert "Augment NumTraits with min/max_exponent()."
This reverts commit 75ce9cd2a7.
2021-03-17 03:06:08 +01:00
Antonio Sanchez
75ce9cd2a7 Augment NumTraits with min/max_exponent().
Replace usage of `std::numeric_limits<...>::min/max_exponent` in
codebase.  Also replaced some other `numeric_limits` usages in
affected tests with the `NumTraits` equivalent.

Fixes #2148
2021-03-17 01:00:41 +00:00
David Tellenbach
9fb7062440 Silence warning on comma at end of enumerator list 2021-03-17 01:46:52 +01:00
Theo Fletcher
b8502a9dd6 Updated SelfAdjointEigenSolver documentation to include that the eigenvectors matrix is unitary. 2021-03-16 18:48:02 +00:00
Rasmus Munk Larsen
2e83cbbba9 Add NaN propagation options to minCoeff/maxCoeff visitors. 2021-03-16 17:02:50 +00:00
Jens Wehner
c0a889890f Fixed output of complex matrices 2021-03-15 21:51:55 +00:00
Antonio Sanchez
f612df2736 Add fmod(half, half).
This is to support TensorFlow's `tf.math.floormod` for half.
2021-03-15 13:32:24 -07:00
Antonio Sanchez
14b7ebea11 Fix numext::round pre c++11 for large inputs.
This is to resolve an issue for large inputs when +0.5 can
actually lead to +1 if the input doesn't have enough precision
to resolve the addition - leading to an off-by-one error.

See discussion on 9a663973.
2021-03-15 19:08:04 +00:00
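The off-by-one described above is easy to reproduce with a small sketch (illustrative helpers, not Eigen's `numext::round`): for a double at or above 2^52, `x + 0.5` is not representable, so it can round up to the next integer; since such values have no fractional part anyway, simply returning the input is the fix.

```cpp
#include <cmath>

// Naive round-half-away-from-zero for non-negative x: breaks for large x
// because x + 0.5 itself gets rounded (ties-to-even) before floor() runs.
double naive_round(double x) { return std::floor(x + 0.5); }

// Guarded version: above 2^52 every double is already an integer, so
// return it unchanged and avoid the lossy +0.5.
double safe_round(double x) {
  const double limit = 4503599627370496.0;  // 2^52
  if (std::fabs(x) >= limit) return x;
  return std::floor(x + 0.5);
}
```

For example, 2^52 + 1 plus 0.5 lands exactly halfway between two representable doubles and rounds to the even neighbor 2^52 + 2, so the naive version over-rounds by one.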
Chip Kerchner
c9d4367fa4 Fix pround and add print 2021-03-15 19:07:43 +00:00
Antonio Sanchez
d24f9f9b55 Fix NVCC+ICC issues.
NVCC does not understand `__forceinline`, so we need to use `inline`
when compiling for GPU.

ICC specializes `std::complex` operators for `float` and `double`
by default, which cannot be used on device and conflict with Eigen's
workaround in CUDA/Complex.h.  This can be prevented by defining
`_OVERRIDE_COMPLEX_SPECIALIZATION_` before including `<complex>`.
Added this define to the tests and to `Eigen/Core`, but this will
not work if the user includes `<complex>` before `<Eigen/Core>`.

ICC also seems to generate a duplicate `Map` symbol in
`PlainObjectBase`:
```
error: "Map" has already been declared in the current scope
  static ConstMapType Map(const Scalar *data)

```
I tracked this down to `friend class Eigen::Map`.  Putting the `friend`
statements at the bottom of the class seems to resolve this issue.

Fixes #2180
2021-03-15 18:42:04 +00:00
Antonio Sanchez
14487ed14e Add increment/decrement operators to Eigen::half.
This is for consistency with bfloat16, and to support initialization
with `std::iota`.
2021-03-15 10:52:23 -07:00
Antonio Sanchez
b271110788 Bump up rand histogram threshold.
The previous one sometimes fails for MSVC which has a poor random
number generator.

Fixes #2182
2021-03-10 22:17:03 -08:00
Antonio Sanchez
d098c4d64c Disable EIGEN_OPTIMIZATION_BARRIER for PPC clang.
Doesn't seem to correctly select the register type, and most types
lead to compiler crashes.
2021-03-10 16:05:01 -08:00
Antonio Sanchez
543e34ab9d Re-implement move assignments.
The original swap approach leads to potential undefined behavior (reading
uninitialized memory) and results in unnecessary copying of data for static
storage.

Here we pass down the move assignment to the underlying storage.  Static
storage does a one-way copy, dynamic storage does a swap.

Modified the tests to no longer read from the moved-from matrix/tensor,
since that can lead to UB. Added a test to ensure we do not access
uninitialized memory in a move.

Fixes: #2119
2021-03-10 16:55:20 +00:00
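The storage-level policy described above (static storage copies one way, dynamic storage swaps) can be sketched with illustrative types that are not Eigen's actual DenseStorage:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Heap-backed storage: move-assignment swaps buffers, so the moved-from
// object's destructor frees our old allocation. No elements are read.
struct DynamicStorage {
  double* data = nullptr;
  std::size_t n = 0;
  DynamicStorage() = default;
  explicit DynamicStorage(std::size_t size) : data(new double[size]()), n(size) {}
  ~DynamicStorage() { delete[] data; }
  DynamicStorage& operator=(DynamicStorage&& other) noexcept {
    std::swap(data, other.data);
    std::swap(n, other.n);
    return *this;
  }
};

// Fixed-size storage: a one-way element copy. Swapping here would both
// copy twice and read potentially uninitialized destination elements.
template <std::size_t N>
struct FixedStorage {
  double data[N] = {};
  FixedStorage& operator=(FixedStorage&& other) noexcept {
    std::copy(other.data, other.data + N, data);
    return *this;
  }
};
```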
Ben Niu
b8d1857f0d [MSVC-specific] Define EIGEN_ARCH_x86_64 for native x64 (_M_X64 is defined and _M_ARM64EC is not), and define EIGEN_ARCH_ARM64 for both native ARM64 (_M_ARM64 is defined) and ARM64EC (_M_ARM64EC is defined). _M_ARM64EC is defined when the code is compiled by MSVC for ARM64EC, a new ARM64 ABI designed to be compatible with x64 application emulation on ARM64. If _M_ARM64EC is defined, _M_X64 and _M_AMD64 are also defined, so x64-specific code (especially intrinsics) is also compiled to ARM64 instructions (compliant with the ARM64EC ABI) for maximum x64 compatibility. Although the majority of x64-specific intrinsics can be emulated by ARM64 instructions, it is still a good idea to simply recompile the native ARM64 code paths to ARM64EC for pure computation tasks, for performance reasons. 2021-03-10 10:21:31 +00:00
Antonio Sanchez
853a5c4b84 Fix ambiguous call to CUDA __half constructor. 2021-03-08 21:06:28 -08:00
Antonio Sanchez
94327dbfba Fix typo: DEVICE -> GPU 2021-03-08 11:21:00 -08:00
Antonio Sanchez
1296abdf82 Fix non-trivial Half constructor for CUDA.
Both CUDA and HIP require trivial default constructors for types used
in shared memory. Otherwise failing with
```
error: initialization is not supported for __shared__ variables.
```
2021-03-08 07:32:54 -08:00
Antonio Sanchez
6045243141 Revert stack allocation limit change that crept in.
This was accidentally introduced when copying changes between repos.
2021-03-05 14:29:37 -08:00
Deven Desai
1a96d49afe Changing the Eigen::half implementation for HIP
Currently, when compiling with HIP, Eigen::half is derived from the `__half_raw` struct that is defined within the hip_fp16.h header file. This is true for both the "host" compile phase and the "device" compile phase. This was causing a very hard to detect bug in the ROCm TensorFlow build.

In the ROCm TensorFlow build,
* files that do not contain any GPU code get compiled via gcc, and
* files that contain GPU code get compiled via hipcc.

In certain cases, we have a function that is defined in a file that is compiled by hipcc, and is called in a file that is compiled by gcc. If such a function had Eigen::half as a "pass-by-value" argument, its value was getting corrupted when received by the function.

The reason for this seems to be that for the gcc compile, Eigen::half is derived from a `__half_raw` struct that has `uint16_t` as the data store, while for hipcc the `__half_raw` implementation uses `_Float16` as the data store. There is some ABI incompatibility between gcc and hipcc (which is essentially latest clang), which results in the Eigen::half value (which is correct at the call-site) getting randomly corrupted when passed to the function.

Changing the Eigen::half argument to be "pass by reference" seems to work around the error.

In order to fix it such that we do not run into it again in TF, this commit changes the Eigen::half implementation to use the same `__half_raw` implementation as the non-GPU compile during the host compile phase of the hipcc compile.
2021-03-05 19:27:13 +00:00
Antonio Sanchez
2468253c9a Define EIGEN_CPLUSPLUS and replace most __cplusplus checks.
The macro `__cplusplus` is not defined correctly in MSVC unless building
with the the `/Zc:__cplusplus` flag. Instead, it defines `_MSVC_LANG` to the
specified c++ standard version number.

Here we introduce `EIGEN_CPLUSPLUS` which will contain the c++ version
number both for MSVC and otherwise.  This simplifies checks for supported
features.

Also replaced most instances of standard version checking via `__cplusplus`
with the existing `EIGEN_COMP_CXXVER` macro for better clarity.

Fixes: #2170
2021-03-05 18:33:18 +00:00
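The version-detection workaround described above is a short preprocessor idiom. A sketch with an illustrative macro name (Eigen's is `EIGEN_CPLUSPLUS`): MSVC keeps `__cplusplus` at `199711L` unless `/Zc:__cplusplus` is passed, but always reports the real language level in `_MSVC_LANG`, so a portable version macro prefers `_MSVC_LANG` when it exists.

```cpp
// Portable C++ version macro (illustrative name).
#if defined(_MSVC_LANG)
#define MY_CPLUSPLUS _MSVC_LANG   // MSVC: real standard version lives here
#else
#define MY_CPLUSPLUS __cplusplus  // everyone else: __cplusplus is accurate
#endif

// Feature checks then compare against standard version numbers.
#if MY_CPLUSPLUS >= 201402L
#define MY_HAS_CXX14 1
#else
#define MY_HAS_CXX14 0
#endif
```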
Antonio Sanchez
82d61af3a4 Fix rint SSE/NEON again, using optimization barrier.
This is a new version of !423, which failed for MSVC.

Defined `EIGEN_OPTIMIZATION_BARRIER(X)` that uses inline assembly to
prevent operations involving `X` from crossing that barrier. Should
work on most `GNUC` compatible compilers (MSVC doesn't seem to need
this). This is a modified version adapted from what was used in
`psincos_float` and tested on more platforms
(see #1674, https://godbolt.org/z/73ezTG).

Modified `rint` to use the barrier to prevent the add/subtract rounding
trick from being optimized away.

Also fixed an edge case for large inputs that get bumped up a power of two
and ends up rounding away more than just the fractional part.  If we are
over `2^digits` then just return the input.  This edge case was missed in
the test since the test was comparing approximate equality, which was still
satisfied.  Adding a strict equality option catches it.
2021-03-05 08:54:12 -08:00
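The add/subtract rounding trick plus the barrier described above can be sketched for scalar doubles. This is a hypothetical macro mirroring `EIGEN_OPTIMIZATION_BARRIER` for GNUC-compatible compilers, not Eigen's packet code: the empty asm with a `"+m"` operand makes the compiler treat the value as read and written through memory, so it cannot fold `(x + C) - C` away.

```cpp
// Empty inline asm: forces the compiler to materialize x and not
// constant-fold across this point (GNUC-compatible compilers only).
#define BARRIER(x) __asm__ volatile("" : "+m"(x))

double rint_via_magic(double x) {
  // Adding 2^52 pushes the fractional bits out of a double's mantissa;
  // subtracting it back leaves x rounded per the current rounding mode.
  const double magic = 4503599627370496.0;  // 2^52
  // Edge case from the commit: inputs at or beyond 2^52 are already
  // integral, so return them unchanged.
  if (x >= magic || x <= -magic) return x;
  double r = (x < 0) ? x - magic : x + magic;
  BARRIER(r);  // keep the optimizer from cancelling the add/subtract
  return (x < 0) ? r + magic : r - magic;
}
```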
David Tellenbach
5f0b4a4010 Revert "Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size()"
This reverts commit 6cbb3038ac because it
breaks clang-10 builds on x86 and aarch64 when C++11 is enabled.
2021-03-05 13:16:43 +01:00
Steve Bronder
6cbb3038ac Adds EIGEN_CONSTEXPR and EIGEN_NOEXCEPT to rows(), cols(), innerStride(), outerStride(), and size() 2021-03-04 18:58:08 +00:00
David Tellenbach
5bfc67f9e7 Deactive CI for Power due to problems with GitLab runner 2021-03-04 17:33:40 +01:00
Eugene Zhulenev
a6601070f2 Add log2 operation to TensorBase 2021-03-04 00:13:36 +00:00
Antonio Sánchez
9a663973b4 Revert "Fix rint for SSE/NEON."
This reverts commit e72dfeb8b9
2021-03-03 18:51:51 +00:00
Antonio Sanchez
e72dfeb8b9 Fix rint for SSE/NEON.
It seems *sometimes* with aggressive optimizations the combination
`psub(padd(a, b), b)` trick to force rounding is compiled away. Here
we replace with inline assembly to prevent this (I tried `volatile`,
but that leads to additional loads from memory).

Also fixed an edge case for large inputs `a` where adding `b` bumps
the value up a power of two and ends up rounding away more than
just the fractional part.  If we are over `2^digits` then just return
the input.  This edge case was missed in the test since the test was
comparing approximate equality, which was still satisfied.  Adding
a strict equality option catches it.
2021-03-03 09:41:46 -08:00
Christoph Hertzberg
199c5f2b47 geo_alignedbox_5 was failing with AVX enabled, due to storing Vector4d in a std::vector without using an aligned allocator.
Got rid of using `std::vector` and simplified the code.
Avoid leading `_`
2021-03-01 03:59:21 +01:00
Antonio Sanchez
1e0c7d4f49 Add print for SSE/NEON, use NEON rounding intrinsics if available.
In SSE, by adding/subtracting 2^MantissaBits, we force rounding according to the
current rounding mode.

For NEON, we use the provided intrinsics for rint/floor/ceil if
available (armv8).

Related to #1969.
2021-02-27 22:42:07 +00:00
David Tellenbach
976ae0ca6f Document that using raw function pointers doesn't work with unaryExpr. 2021-02-27 22:58:42 +01:00
Antonio Sanchez
c65c2b31d4 Make half/bfloat16 constructor take inputs by value, fix powerpc test.
Since `numeric_limits<half>::max_exponent` is a static inline constant,
it cannot be directly passed by reference. This triggers a linker error
in recent versions of `g++-powerpc64le`.

Changing `half` to take inputs by value fixes this.  Wrapping
`max_exponent` with `int(...)` to make an addressable integer also fixes this
and may help with other custom `Scalar` types down-the-road.

Also eliminated some compile warnings for powerpc.
2021-02-27 21:32:06 +00:00
Christoph Hertzberg
39a590dfb6 Remove unused include 2021-02-27 19:02:33 +01:00
Christoph Hertzberg
8f686ac4ec clang 10 aggressively warns about precision loss when converting int to float (or long to double)
(cherry picked from commit cd541ad52c8152340469cae210312c0e27829c8d)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
2660d01fa7 Inherit from no_assignment_operator to avoid implicit copy constructor warnings
(cherry picked from commit 9bbb7ea4b54b1f307863be4ed8d105c38cdefe50)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
a3521d743c Fix some enum-enum conversion warnings
(cherry picked from commit 838f3d8ce22a5549ef10c7386fb03040721749a0)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
ca528593f4 Fixed/masked more implicit copy constructor warnings
(cherry picked from commit 2883e91ce5a99c391fbf28e20160176b70854992)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
81b5fe2f0a ReturnByValue is already non-copyable
(cherry picked from commit abbf95045009619f37bd92b45433eedbfcbe41cf)
2021-02-27 18:44:26 +01:00
Christoph Hertzberg
4fb3459a23 Fix double-promotion warnings
(cherry picked from commit c22c103e932e511e96645186831363585a44b7a3)
2021-02-27 18:44:26 +01:00
Jens Wehner
4bfcee47b9 Idrs iterative linear solver 2021-02-27 12:09:33 +00:00
Antonio Sanchez
29ebd84cb7 Fix NEON sqrt for 32-bit, add prsqrt.
With !406, we accidentally broke arm 32-bit NEON builds, since
`vsqrt_f32` is only available for 64-bit.

Here we add back the `rsqrt` implementation for 32-bit, relying
on a `prsqrt` implementation with better handling of edge cases.

Note that several of the 32-bit NEON packet tests are currently
failing - either due to denormal handling (NEON versions flush
to zero, but scalar paths don't) or due to accuracy (e.g. sin/cos).
2021-02-26 14:08:40 -08:00
Rasmus Munk Larsen
fe19714f80 Merge branch 'rmlarsen1/eigen-nan_prop' 2021-02-26 09:21:24 -08:00
Rasmus Munk Larsen
e67672024d Merge branch 'nan_prop' of https://gitlab.com/rmlarsen1/eigen into nan_prop 2021-02-26 09:12:44 -08:00
Rasmus Munk Larsen
5e7d4c33d6 Add TODO. 2021-02-26 09:08:45 -08:00
Rasmus Munk Larsen
fb5b59641a Defer default for minCoeff/maxCoeff to templated variant. 2021-02-26 09:07:00 -08:00
Antonio Sanchez
e19829c3b0 Fix floor/ceil for NEON fp16.
Forgot to test this.  Fixes bug introduced in !416.
2021-02-25 20:39:56 -08:00
Antonio Sanchez
5529db7524 Fix SSE/NEON pfloor/pceil for saturated values.
The original will saturate if the input does not fit into an integer
type.  Here we fix this, returning the input if it doesn't have
enough precision to have a fractional part.

Also added `pceil` for NEON.

Fixes #1969.
2021-02-25 14:39:26 -08:00
Rasmus Munk Larsen
51eba8c3e2 Fix indentation. 2021-02-25 18:21:21 +00:00
Rasmus Munk Larsen
5297b7162a Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions. 2021-02-25 18:21:21 +00:00
Antonio Sanchez
ecb7b19dfa Disable new/delete test for HIP 2021-02-25 08:04:05 -08:00
Chip-Kerchner
6eebe97bab Fix clang compile when no MMA flags are set. Simplify MMA compiler detection. 2021-02-24 20:43:23 -06:00
Rasmus Munk Larsen
f284c8592b Don't crash when attempting to slice an empty tensor. 2021-02-24 18:12:51 -08:00
Rasmus Munk Larsen
4cb0592af7 Fix indentation. 2021-02-24 17:59:36 -08:00
Rasmus Munk Larsen
6b34568c74 Merge branch 'nan_prop' of https://gitlab.com/rmlarsen1/eigen into nan_prop 2021-02-24 17:54:58 -08:00
Rasmus Munk Larsen
0065f9d322 Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions. 2021-02-25 01:54:36 +00:00
Rasmus Munk Larsen
841c8986f8 Make it possible to specify NaN propagation strategy for maxCoeff/minCoeff reductions. 2021-02-24 17:49:20 -08:00
Rasmus Munk Larsen
113e61f364 Remove unused function scalar_cmp_with_cast. 2021-02-24 23:59:35 +00:00
Rasmus Munk Larsen
98ca58b02c Cast anonymous enums to int when used in expressions. 2021-02-24 23:50:06 +00:00
Chip-Kerchner
c31ead8a15 Having forward template function declarations in a P10 file causes bad code in certain situations. 2021-02-24 23:43:30 +00:00
Guoqiang QI
f44197fabd Some improvements for kissfft from Martin Reinecke (pocketfft author):
1. Only compute about half of the factors and use complex-conjugate symmetry for the rest, instead of computing all of them, to save time.
2. Calculate all twiddles in double, because that gives the maximum achievable precision when doing float transforms.
3. Reduce all angles to the range 0 < angle < pi/4, which gives even more precision.
2021-02-24 21:36:47 +00:00
Antonio Sanchez
a31effc3bc Add invoke_result and eliminate result_of warnings for C++17+.
The `std::result_of` meta struct is deprecated in C++17 and removed
in C++20. It was still slipping through due to a faulty definition of
`EIGEN_HAS_STD_RESULT_OF`.

Added a new macro `EIGEN_HAS_STD_INVOKE_RESULT` and
`Eigen::internal::invoke_result` implementation with fallback for
pre C++17.

Replaces the `result_of` definition with one based on `std::invoke_result`
for C++17 and higher.

For completeness, added nullary op support for c++03.

Fixes #1850.
2021-02-24 21:36:14 +00:00
Chip-Kerchner
8523d447a1 Fixes to support old and new versions of the compilers for built-ins. Cast to non-const when using vector_pair with certain built-ins. 2021-02-24 20:49:15 +00:00
Antonio Sanchez
5908aeeaba Fix CUDA device new and delete, and add test.
HIP does not support new/delete on device, so test is skipped.
2021-02-24 11:31:41 -08:00
Antonio Sanchez
119763cf38 Eliminate CMake FindPackageHandleStandardArgs warnings.
CMake complains that the package name does not match when the case
differs, e.g.:
```
CMake Warning (dev) at /usr/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:273 (message):
  The package name passed to `find_package_handle_standard_args` (UMFPACK)
  does not match the name of the calling package (Umfpack).  This can lead to
  problems in calling code that expects `find_package` result variables
  (e.g., `_FOUND`) to follow a certain pattern.
Call Stack (most recent call first):
  cmake/FindUmfpack.cmake:50 (find_package_handle_standard_args)
  bench/spbench/CMakeLists.txt:24 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.
```
Here we rename the libraries to match their true cases.
2021-02-24 09:52:05 +00:00
Antonio Sanchez
6cf0ab5e99 Disable fast psqrt for NEON.
Accuracy is too poor: it requires at least two Newton iterations, but then
it is no longer significantly faster than `vsqrt`.

Fixes #2094.
2021-02-23 19:52:55 -08:00
Antonio Sanchez
aba3998278 Fix check for GPU compile phase for std::hash 2021-02-23 19:52:08 -08:00
Antonio Sanchez
db5691ff2b Fix some CUDA warnings.
Added `EIGEN_HAS_STD_HASH` macro, checking for C++11 support and not
running on GPU.

`std::hash<float>` is not a device function, so cannot be used by
`std::hash<bfloat16>`.  Removed `EIGEN_DEVICE_FUNC` and only
define if `EIGEN_HAS_STD_HASH`. Same for `half`.

Added `EIGEN_CUDA_HAS_FP16_ARITHMETIC` to improve readability,
eliminate warnings about `EIGEN_CUDA_ARCH` not being defined.

Replaced a couple C-style casts with `reinterpret_cast` for aligned
loading of `half*` to `half2*`. This eliminates `-Wcast-align`
warnings in clang.  Although not ideal due to potential type aliasing,
this is how CUDA handles these conversions internally.
2021-02-24 00:16:31 +00:00
Rasmus Munk Larsen
88d4c6d4c8 Accurate pow, part 2. This change adds specializations of log2 and exp2 for double that
make pow<double> accurate to 1 ULP. Speed for AVX-512 is within 0.5% of the current
implementation.
2021-02-23 23:11:03 +00:00
Adam Shapiro
2ac0b78739 Fixed sparse conservativeResize() when both the number of rows and columns decreased.
The previous implementation caused a buffer overflow when calculating
non-zero counts for columns that no longer exist.
2021-02-23 21:32:39 +00:00
Chip-Kerchner
10c77b0ff4 Fix compilation errors with later versions of GCC and use of MMA. 2021-02-22 15:01:47 -06:00
Christoph Hertzberg
73922b0174 Fixes Bug #1925. Packets should be passed by const reference, even to inline functions. 2021-02-20 18:56:42 +01:00
Antonio Sanchez
5f9cfb2529 Add missing adolc isinf/isnan.
Also modified cmake/FindAdolc.cmake to eliminate warnings, and added
search paths to match install layout.

Fixed: #2157
2021-02-19 22:26:56 +00:00
Christoph Hertzberg
ce4af0b38f Missing change regarding #1910 2021-02-19 20:51:35 +01:00
Christoph Hertzberg
a7749c09bc Bug #1910: Make SparseCholesky work for RowMajor matrices 2021-02-19 19:36:18 +01:00
Antonio Sánchez
128eebf05e Revert "add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros (only if not HIPCC)."
This reverts commit 12fd3dd655
2021-02-19 17:09:16 +00:00
frgossen
33e0af0130 Return nan at poles of polygamma, digamma, and zeta if limit is not defined 2021-02-19 16:35:11 +00:00
Rasmus Munk Larsen
7f09d3487d Use the Cephes double subtraction trick in pexp<float> even when FMA is available. Otherwise the accuracy drops from 1 ulp to 3 ulp. 2021-02-18 20:49:18 +00:00
Masaki Murooka
12fd3dd655 add EIGEN_DEVICE_FUNC to EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF macros (only if not HIPCC). 2021-02-17 22:55:47 +00:00
David Tellenbach
aa8b22e776 Bump to 3.4.99 2021-02-17 23:23:17 +01:00
David Tellenbach
5336ad8591 Define internal::make_unsigned for [unsigned]long long on macOS.
macOS defines int64_t as long long even for C++03 and therefore expects
a template specialization

  internal::make_unsigned<long long>,

for C++03. Since other platforms define int64_t as long for C++03 we
cannot add the specialization for all cases.
2021-02-17 23:03:10 +01:00
Antonio Sanchez
0845df7f77 Fix uninitialized warning on AVX. 2021-02-17 13:13:39 -08:00
Chip Kerchner
9b51dc7972 Fixed performance issues for VSX and P10 MMA in general_matrix_matrix_product 2021-02-17 17:49:23 +00:00
Rasmus Munk Larsen
be0574e215 New accurate algorithm for pow(x,y). This version is accurate to 1.4 ulps for float, while still being 10x faster than std::pow for AVX512. A future change will introduce a specialization for double. 2021-02-17 02:50:32 +00:00
Antonio Sanchez
7ff0b7a980 Updated pfrexp implementation.
The original implementation fails for 0, denormals, inf, and NaN.

See #2150
2021-02-17 02:23:24 +00:00
David Tellenbach
9ad4096ccb Document possible inconsistencies when using Matrix<bool, ...> 2021-02-17 00:50:26 +01:00
Ashutosh Sharma
f702792a7c Add missing packetmath.h method: void ptranspose(PacketBlock<Packet16uc, 4>& kernel) 2021-02-16 16:33:59 +00:00
Jan van Dijk
db61b8d478 Avoid -Wunused warnings in NDEBUG builds.
In two places in SuperLUSupport.h, a local variable 'size' is
created that is used only inside an eigen_assert. Remove these,
just fetch the required values inside the assert statements.
This avoids annoying -Wunused warnings (and -Werror=unused errors)
in NDEBUG builds.
2021-02-12 18:35:35 +00:00
David Tellenbach
622c598944 Don't allow all test jobs to fail but only the currently failing ones. 2021-02-12 14:01:17 +01:00
Antonio Sanchez
90ee821c56 Use vrsqrts for rsqrt Newton iterations.
It's slightly faster and slightly more accurate, allowing our current
packetmath tests to pass for sqrt with a single iteration.
2021-02-11 11:33:51 -08:00
Antonio Sanchez
9fde9cce5d Adjust bounds for pexp_float/double
The original clamping bounds on `_x` actually produce finite values:
```
  exp(88.3762626647950) = 2.40614e+38 < 3.40282e+38

  exp(709.437) = 1.27226e+308 < 1.79769e+308
```
so with an accurate `ldexp` implementation, `pexp` fails for large
inputs, producing finite values instead of `inf`.

This adjusts the bounds slightly outside the finite range so that
the output will overflow to +/- `inf` as expected.
2021-02-10 22:48:05 +00:00
Antonio Sanchez
4cb563a01e Fix ldexp implementations.
The previous implementations produced garbage values if the exponent did
not fit within the exponent bits.  See #2131 for a complete discussion,
and !375 for other possible implementations.

Here we implement the 4-factor version. See `pldexp_impl` in
`GenericPacketMathFunctions.h` for a full description.

The SSE `pcmp*` methods were moved down since `pcmp_le<Packet4i>`
requires `por`.

Left as a "TODO" is to delegate to a faster version if we know the
exponent does fit within the exponent bits.

Fixes #2131.
2021-02-10 22:45:41 +00:00
Ashutosh Sharma
7eb07da538 Loop-less ptranspose 2021-02-10 10:21:37 -08:00
David Tellenbach
36200b7855 Remove vim-specific comments meant to let editors recognize the correct file type.
As discussed in #2143, we remove editor-specific comments.
2021-02-09 09:13:09 +01:00
David Tellenbach
54589635ad Replace nullptr by NULL in SparseLU.h to be C++03 compliant. 2021-02-09 09:08:06 +01:00
Ralf Hannemann-Tamas
984d010b7b add specialization of check_sparse_solving() for SuperLU solver, in order to test adjoint and transpose solves 2021-02-08 22:00:31 +00:00
Nikolaus Demmel
b578930657 Fix documentation typos in LDLT.h 2021-02-08 21:07:29 +00:00
Antonio Sanchez
66841ea070 Enable bdcsvd on host.
Currently if compiled by NVCC, the `MatrixBase::bdcSvd()` implementation
is skipped, leading to a linker error.  This prevents it from running on
the host as well.

Seems it was disabled 6 years ago (5384e891) to match `jacobiSvd`, but
`jacobiSvd` is now enabled on host.  Tested and runs fine on host, but
will not compile/run for device (though it's not labelled as a device
function, so this should be fine).

Fixes #2139
2021-02-08 12:56:23 -08:00
Rasmus Munk Larsen
6e3b795f81 Add more tests for pow and fix a corner case for huge exponent where the result is always zero or infinite unless x is one. 2021-02-05 16:58:49 -08:00
Antonio Sanchez
abcde69a79 Disable vectorized pow for half/bfloat16.
We are potentially seeing some accuracy issues with these.  Ideally we
would hand off to `float`, but that's not trivial with the current
setup.

We may want to consider adding `ppow<Packet>` and `HasPow`, so
implementations can more easily specialize this.
2021-02-05 12:17:34 -08:00
Antonio Sanchez
f85038b7f3 Fix excessive GEBP register spilling for 32-bit NEON.
Clang does a poor job of optimizing the GEBP microkernel on 32-bit ARM,
leading to excessive 16-byte register spills, slowing down basic f32
matrix multiplication by approx 50%.

By specializing `gebp_traits`, we can eliminate the register spills.
Volatile inline ASM both acts as a barrier to prevent reordering and
enforces strict register use. In a simple f32 matrix multiply example,
this modification reduces 16-byte spills from 109 instances to zero,
leading to a 1.5x speed increase (search for `16-byte Spill` in the
assembly in https://godbolt.org/z/chsPbE).

This is a replacement of !379.  See there for further discussion.

Also moved `gebp_traits` specializations for NEON to
`Eigen/src/Core/arch/NEON/GeneralBlockPanelKernel.h` to be alongside
other NEON-specific code.

Fixes #2138.
2021-02-03 09:01:48 -08:00
Antonio Sanchez
56c8b14d87 Eliminate implicit conversions from float to double. 2021-02-01 15:31:01 -08:00
Antonio Sanchez
fb4548e27b Implement bit_* for device.
Unfortunately `std::bit_and` and the like are host-only functions prior
to C++14 (since they are not `constexpr`).  They also never exist in the
global namespace, so the current implementation always fails to compile via
NVCC - since `EIGEN_USING_STD` tries to import the symbol from the global
namespace on device.

To overcome these limitations, we implement these functionals here.
2021-02-01 13:27:45 -08:00
Antonio Sanchez
1615a27993 Fix altivec packetmath.
Allows the altivec packetmath tests to pass.  There were a few issues:
- `pstoreu` was missing MSQ on `_BIG_ENDIAN` systems
- `cmp_*` didn't properly handle conversion of bool flags (0x7FC instead
of 0xFFFF)
- `pfrexp` needed to set the `exponent` argument.

Related to !370, #2128

cc: @ChipKerchner @pdrocaldeira

Tested on `_BIG_ENDIAN` running on QEMU with VSX.  Couldn't figure out build
flags to get it to work for little endian.
2021-01-28 18:37:09 +00:00
Chip Kerchner
1414e2212c Fix clang compilation for AltiVec from previous check-in 2021-01-28 18:36:40 +00:00
David Tellenbach
170a504c2f Add the following functions
DenseBase::setConstant(NoChange_t, Index, const Scalar&)
  DenseBase::setConstant(Index, NoChange_t, const Scalar&)

to close #663.
2021-01-28 15:13:07 +01:00
David Tellenbach
598e1b6e54 Add the following functions:
DenseBase::setZero(NoChange_t, Index)
  DenseBase::setZero(Index, NoChange_t)
  DenseBase::setOnes(NoChange_t, Index)
  DenseBase::setOnes(Index, NoChange_t)
  DenseBase::setRandom(NoChange_t, Index)
  DenseBase::setRandom(Index, NoChange_t)

This closes #663.
2021-01-28 01:10:36 +01:00
Gael Guennebaud
0668c68b03 Allow for negative strides.
Note that using a stride of -1 is still not possible because it would
clash with the definition of Eigen::Dynamic.

This fixes #747.
2021-01-27 23:32:12 +01:00
Samir Benmendil
288d456c29 Replace language_support module with builtin CheckLanguage
The workaround_9220 function was introduced a long time ago to
work around a CMake issue with enable_language(OPTIONAL). Since then,
CMake has clarified that the OPTIONAL keyword was never
implemented[0].

A CheckLanguage module is now provided with CMake to check if a language
can be enabled. Use that instead.

[0] https://cmake.org/cmake/help/v3.18/command/enable_language.html
2021-01-27 13:26:40 +00:00
Antonio Sanchez
3f4684f87d Include <cstdint> in one place, remove custom typedefs
Originating from
[this SO issue](https://stackoverflow.com/questions/65901014/how-to-solve-this-all-error-2-in-this-case),
some win32 compilers define `__int32` as a `long`, but MinGW defines
`std::int32_t` as an `int`, leading to a type conflict.

To avoid this, we remove the custom `typedef` definitions for win32.  The
Tensor module requires C++11 anyways, so we are guaranteed to have
included `<cstdint>` already in `Eigen/Core`.

Also re-arranged the headers to only include `<cstdint>` in one place to
avoid this type of error again.
2021-01-26 14:23:05 -08:00
Chip Kerchner
0784d9f87b Fix sqrt, ldexp and frexp compilation errors. 2021-01-25 15:22:19 -06:00
Gmc2
a4edb1079c fix test of ExtractVolumePatchesOp 2021-01-25 03:23:46 +00:00
Antonio Sanchez
4c42d5ee41 Eliminate implicit conversion warning in test/array_cwise.cpp 2021-01-23 11:54:00 -08:00
Antonio Sanchez
e0d13ead90 Replace std::isnan with numext::isnan for c++03 2021-01-23 11:02:35 -08:00
Florian Maurin
c35965b381 Remove unused variable in SparseLU.h 2021-01-22 22:24:11 +00:00
Antonio Sanchez
f0e46ed5d4 Fix pow and other cwise ops for half/bfloat16.
The new `generic_pow` implementation was failing for half/bfloat16 since
their construction from int/float is not `constexpr`. Modified
in `GenericPacketMathFunctions` to remove `constexpr`.

While adding tests for half/bfloat16, found other issues related to
implicit conversions.

Also needed to implement `numext::arg` for non-integer, non-complex
types other than float/double/long double.  These seem to be implicitly
converted to `std::complex<T>`, which then fails for half/bfloat16.
2021-01-22 11:10:54 -08:00
Antonio Sanchez
f19bcffee6 Specialize std::complex operators for use on GPU device.
NVCC and older versions of clang do not fully support `std::complex` on device,
leading to either compile errors (Cannot call `__host__` function) or worse,
runtime errors (Illegal instruction).  For most functions, we can
implement specialized `numext` versions. Here we specialize the standard
operators (with the exception of stream operators and member function operators
with a scalar that are already specialized in `<complex>`) so they can be used
in device code as well.

To import these operators into the current scope, use
`EIGEN_USING_STD_COMPLEX_OPERATORS`. By default, these are imported into
the `Eigen`, `Eigen::internal`, and `Eigen::numext` namespaces.

This allow us to remove specializations of the
sum/difference/product/quotient ops, and allow us to treat complex
numbers like most other scalars (e.g. in tests).
2021-01-22 18:19:19 +00:00
David Tellenbach
65e2169c45 Add support for Arm SVE
This patch adds support for Arm's new vector extension SVE (Scalable Vector Extension). In contrast to other vector extensions supported by Eigen, SVE types are inherently *sizeless*. For use in Eigen we fix their size at compile time (note that this is not necessary in general; SVE is *length-agnostic*).

During compilation the flag `-msve-vector-bits=N` has to be set, where `N` is a power of two in the range `128` to `2048`, indicating the length of an SVE vector.

Since SVE is rather young, we decided to disable it by default even if it would be available. A user has to enable it explicitly by defining `EIGEN_ARM64_USE_SVE`.

This patch introduces the packet types `PacketXf` and `PacketXi` for packets of `float` and `int32_t` respectively. The size of these packets depends on the SVE vector length. E.g. if `-msve-vector-bits=512` is set, `PacketXf` will contain `512/32 = 16` elements.

This MR is joint work with Miguel Tairum <miguel.tairum@arm.com>.
2021-01-21 21:11:57 +00:00
Antonio Sanchez
b2126fd6b5 Fix pfrexp/pldexp for half.
The recent addition of vectorized pow (!330) relies on `pfrexp` and
`pldexp`.  This was missing for `Eigen::half` and `Eigen::bfloat16`.
Adding tests for these packet ops also exposed an issue with handling
negative values in `pfrexp`, returning an incorrect exponent.

Added the missing implementations, corrected the exponent in `pfrexp1`,
and added `packetmath` tests.
2021-01-21 19:32:28 +00:00
Antonio Sanchez
25d8498f8b Fix stable_norm_1 test.
Test enters an infinite loop if size is 1x1 when choosing to select
unique indices for adding `inf` and `NaN` to the input. Here we
revert to non-unique indices, and split the `hypotNorm` check into
two cases: one where both `inf` and `NaN` are added, and one where
only `NaN` is added.
2021-01-21 09:44:42 -08:00
David Tellenbach
660c6b857c Remove std::cerr in iterative solver since we don't have iostream.
This fixes #2123
2021-01-21 11:40:05 +01:00
Antonio Sanchez
d5b7981119 Fix signed-unsigned comparison.
Hex literals are interpreted as unsigned, leading to a comparison between
`abcd[0]` (the signed max-supported-function value, which was negative) and
the unsigned literal `0x80000006`.  This should not change the result, since
signed is implicitly converted to unsigned for the comparison, but it
eliminates the warning.
2021-01-20 08:34:00 -08:00
Ivan Popivanov
e409795d6b Proper CPUID 2021-01-18 17:10:11 +00:00
Rasmus Munk Larsen
cdd8fdc32e Vectorize pow(x, y). This closes https://gitlab.com/libeigen/eigen/-/issues/2085, which also contains a description of the algorithm.
I ran some testing (comparing to `std::pow(double(x), double(y))`) for `x` in the set of all (positive) floats in the interval `[std::sqrt(std::numeric_limits<float>::min()), std::sqrt(std::numeric_limits<float>::max())]` and `y` in `{2, sqrt(2), -sqrt(2)}`, and I get the following error statistics:

```
max_rel_error = 8.34405e-07
rms_rel_error = 2.76654e-07
```

If I widen the range to all normal floats I see lower accuracy for arguments where the result is subnormal, e.g. for `y = sqrt(2)`:

```
max_rel_error = 0.666667
rms = 6.8727e-05
count = 1335165689
argmax = 2.56049e-32, 2.10195e-45 != 1.4013e-45
```

which seems reasonable, since these results are subnormals with only a couple of significant bits left.
2021-01-18 13:25:16 +00:00
Antonio Sanchez
bde6741641 Improved std::complex sqrt and rsqrt.
Replaces `std::sqrt` with `complex_sqrt` for all platforms (previously
`complex_sqrt` was only used for CUDA and MSVC), and implements
custom `complex_rsqrt`.

Also introduces `numext::rsqrt` to simplify the implementation, and modifies
`numext::hypot` to adhere to IEC 60559 (IEEE 754) for special cases.

The `complex_sqrt` and `complex_rsqrt` implementations were found to be
significantly faster than `std::sqrt<std::complex<T>>` and
`1/numext::sqrt<std::complex<T>>`.

Benchmark file attached.
```
GCC 10, Intel Xeon, x86_64:
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
BM_Sqrt<std::complex<float>>           9.21 ns         9.21 ns     73225448
BM_StdSqrt<std::complex<float>>        17.1 ns         17.1 ns     40966545
BM_Sqrt<std::complex<double>>          8.53 ns         8.53 ns     81111062
BM_StdSqrt<std::complex<double>>       21.5 ns         21.5 ns     32757248
BM_Rsqrt<std::complex<float>>          10.3 ns         10.3 ns     68047474
BM_DivSqrt<std::complex<float>>        16.3 ns         16.3 ns     42770127
BM_Rsqrt<std::complex<double>>         11.3 ns         11.3 ns     61322028
BM_DivSqrt<std::complex<double>>       16.5 ns         16.5 ns     42200711

Clang 11, Intel Xeon, x86_64:
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
BM_Sqrt<std::complex<float>>           7.46 ns         7.45 ns     90742042
BM_StdSqrt<std::complex<float>>        16.6 ns         16.6 ns     42369878
BM_Sqrt<std::complex<double>>          8.49 ns         8.49 ns     81629030
BM_StdSqrt<std::complex<double>>       21.8 ns         21.7 ns     31809588
BM_Rsqrt<std::complex<float>>          8.39 ns         8.39 ns     82933666
BM_DivSqrt<std::complex<float>>        14.4 ns         14.4 ns     48638676
BM_Rsqrt<std::complex<double>>         9.83 ns         9.82 ns     70068956
BM_DivSqrt<std::complex<double>>       15.7 ns         15.7 ns     44487798

Clang 9, Pixel 2, aarch64:
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
BM_Sqrt<std::complex<float>>           24.2 ns         24.1 ns     28616031
BM_StdSqrt<std::complex<float>>         104 ns          103 ns      6826926
BM_Sqrt<std::complex<double>>          31.8 ns         31.8 ns     22157591
BM_StdSqrt<std::complex<double>>        128 ns          128 ns      5437375
BM_Rsqrt<std::complex<float>>          31.9 ns         31.8 ns     22384383
BM_DivSqrt<std::complex<float>>        99.2 ns         98.9 ns      7250438
BM_Rsqrt<std::complex<double>>         46.0 ns         45.8 ns     15338689
BM_DivSqrt<std::complex<double>>        119 ns          119 ns      5898944
```
2021-01-17 08:50:57 -08:00
Maozhou, Ge
21a8a2487c fix paddings of TensorVolumePatchOp 2021-01-15 11:51:49 +08:00
Guoqiang QI
38ae5353ab 1) Provide a better generic paddsub op implementation.
2) Make the paddsub op support Packet2cf/Packet4f/Packet2f in NEON.
3) Make the paddsub op support Packet2cf/Packet4f in SSE. 2021-01-13 22:54:03 +00:00
Antonio Sanchez
352f1422d3 Remove inf local variable.
Apparently `inf` is a macro on iOS for `std::numeric_limits<T>::infinity()`,
causing a compile error here. We don't need the local anyways since it's
only used in one spot.
2021-01-12 10:33:15 -08:00
Antonio Sanchez
2044084979 Remove TODO from Transform::computeScaleRotation()
Upon investigation, `JacobiSVD` is significantly faster than `BDCSVD`
for small matrices (twice as fast for 2x2, 20% faster for 3x3,
1% faster for 10x10).  Since the majority of cases will be small,
let's stick with `JacobiSVD`.  See !361.
2021-01-11 11:30:01 -08:00
Antonio Sanchez
3daf92c7a5 Transform::computeScalingRotation flush determinant to +/- 1.
In the previous code, in attempting to correct for a negative
determinant, we end up multiplying and dividing by a number that
is often very near, but not exactly +/-1.  By flushing to +/-1,
we can replace a division with a multiplication, and results
are more numerically consistent.
2021-01-11 10:13:38 -08:00
Antonio Sanchez
587fd6ab70 Only specialize complex sqrt_impl for CUDA if not MSVC.
We already specialize `sqrt_impl` on windows due to MSVC's mishandling
of `inf` (!355).
2021-01-11 09:15:45 -08:00
Deven Desai
2a6addb4f9 Fix for breakage in ROCm support - 210108
The following commit breaks ROCm support for Eigen
f149e0ebc3

All unit tests fail with the following error

```
Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o
In file included from /home/rocm-user/eigen/test/gpu_basic.cu:19:
In file included from /home/rocm-user/eigen/test/main.h:356:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:166:
/home/rocm-user/eigen/Eigen/src/Core/MathFunctionsImpl.h:105:35: error: __host__ __device__ function 'complex_sqrt' cannot overload __host__ function 'complex_sqrt'
EIGEN_DEVICE_FUNC std::complex<T> complex_sqrt(const std::complex<T>& z) {
                                  ^
/home/rocm-user/eigen/Eigen/src/Core/MathFunctions.h:342:38: note: previous declaration is here
template<typename T> std::complex<T> complex_sqrt(const std::complex<T>& a_x);
                                     ^
1 error generated when compiling for gfx900.
CMake Error at gpu_basic_generated_gpu_basic.cu.o.cmake:192 (message):
  Error generating file
  /home/rocm-user/eigen/build/test/CMakeFiles/gpu_basic.dir//./gpu_basic_generated_gpu_basic.cu.o

test/CMakeFiles/gpu_basic.dir/build.make:63: recipe for target 'test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o' failed
make[3]: *** [test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o] Error 1
CMakeFiles/Makefile2:16618: recipe for target 'test/CMakeFiles/gpu_basic.dir/all' failed
make[2]: *** [test/CMakeFiles/gpu_basic.dir/all] Error 2
CMakeFiles/Makefile2:16625: recipe for target 'test/CMakeFiles/gpu_basic.dir/rule' failed
make[1]: *** [test/CMakeFiles/gpu_basic.dir/rule] Error 2
Makefile:5401: recipe for target 'gpu_basic' failed
make: *** [gpu_basic] Error 2

```

The error message is accurate, and the fix (provided in this commit) is trivial.
2021-01-08 18:04:40 +00:00
Antonio Sanchez
f149e0ebc3 Fix MSVC complex sqrt and packetmath test.
MSVC incorrectly handles `inf` cases for `std::sqrt<std::complex<T>>`.
Here we replace it with a custom version (currently used on GPU).

Also fixed the `packetmath` test, which previously skipped several
corner cases since `CHECK_CWISE1` only tests the first `PacketSize`
elements.
2021-01-08 01:17:19 +00:00
Antonio Sanchez
8d9cfba799 Fix rand test for MSVC.
MSVC's uniform random number generator is not quite as uniform as
others, requiring a slightly wider threshold on the histogram test.
After inspecting histograms for several runs, there's no obvious
bias -- just some bins end up with slightly more or fewer elements
(often > 2% but less than 2.5%).
2021-01-07 12:48:40 -08:00
Essex Edwards
e741b43668 Make Transform::computeRotationScaling(0,&S) continuous 2021-01-07 17:45:14 +00:00
David Tellenbach
0bdc0dba20 Add missing #endif directive in Macros.h 2021-01-07 12:32:41 +01:00
shrek1402
cb654b1c45 #define was defined incorrectly because the result_of function was deprecated in c++17 and removed in c++20. Also, EIGEN_COMP_MSVC (which is _MSC_VER) only affects result_of indirectly, which can cause errors. 2021-01-07 10:12:25 +00:00
Antonio Sanchez
52d1dd979a Fix Ref initialization.
Since `eigen_assert` is a macro, the statements can become noops (e.g.
when compiling for GPU), so they may not execute the contained logic -- which
in this case is the entire `Ref` construction.  We need to separate the assert
from statements which have consequences.

Fixes #2113
2021-01-06 13:14:20 -08:00
Antonio Sanchez
166fcdecdb Allow CwiseUnaryView to be used on device.
Added `EIGEN_DEVICE_FUNC` to methods.
2021-01-06 09:16:52 -08:00
Antonio Sanchez
bb1de9dbde Fix Ref Stride checks.
The existing `Ref` class failed to consider cases where the Ref's
`Stride` setting *could* match the underlying referred object's stride,
but **didn't** at runtime.  This led to trying to set invalid stride values,
causing runtime failures in some cases, and garbage due to mismatched
strides in others.

Here we add the missing runtime checks.  This involves computing the
strides necessary to align with the referred object's storage, and
verifying we can actually set those strides at runtime.

In the `const` case, if it *may* be possible to refer to the original
storage at compile-time but fails at runtime, then we defer to the
`construct(...)` method that makes a copy.

Added more tests to check these cases.

Fixes #2093.
2021-01-05 10:41:25 -08:00
Christoph Hertzberg
12dda34b15 Eliminate boolean product warnings by factoring out a
`combine_scalar_factors` helper function.
2021-01-05 18:15:30 +00:00
Antonio Sanchez
070d303d56 Add CUDA complex sqrt.
This is to support scalar `sqrt` of complex numbers `std::complex<T>` on
device, requested by Tensorflow folks.

Technically `std::complex` is not supported by NVCC on device
(though it is by clang), so the default `sqrt(std::complex<T>)` function only
works on the host. Here we create an overload to add back the
functionality.

Also modified the CMake file to add `--relaxed-constexpr` (or
equivalent) flag for NVCC to allow calling constexpr functions from
device functions, and added support for specifying compute architecture for
NVCC (was already available for clang).
2020-12-22 23:25:23 -08:00
rgreenblatt
fdf2ee62c5 Fix missing EIGEN_DEVICE_FUNC 2020-12-20 23:22:53 -05:00
Rasmus Munk Larsen
05754100fe * Add iterative psqrt<double> for AVX and SSE when FMA is available. This provides a ~10% speedup.
* Write iterative sqrt explicitly in terms of pmadd. This gives up to 7% speedup for psqrt<float> with AVX & SSE with FMA.
* Remove iterative psqrt<double> for NEON, because the initial rsqrt approximation is not accurate enough for convergence in 2 Newton-Raphson steps, and with 3 steps just calling the builtin sqrt insn is faster.

The following benchmarks were compiled with clang "-O2 -fast-math -mfma" and with and without -mavx.

AVX+FMA (float)

name                      old cpu/op  new cpu/op  delta
BM_eigen_sqrt_float/1     1.08ns ± 0%  1.09ns ± 1%    ~
BM_eigen_sqrt_float/8     2.07ns ± 0%  2.08ns ± 1%    ~
BM_eigen_sqrt_float/64    12.4ns ± 0%  12.4ns ± 1%    ~
BM_eigen_sqrt_float/512   95.7ns ± 0%  95.5ns ± 0%    ~
BM_eigen_sqrt_float/4k     776ns ± 0%   763ns ± 0%  -1.67%
BM_eigen_sqrt_float/32k   6.57µs ± 1%  6.13µs ± 0%  -6.69%
BM_eigen_sqrt_float/256k  83.7µs ± 3%  83.3µs ± 2%    ~
BM_eigen_sqrt_float/1M     335µs ± 2%   332µs ± 2%    ~

SSE+FMA (float)
name                      old cpu/op  new cpu/op  delta
BM_eigen_sqrt_float/1     1.08ns ± 0%  1.09ns ± 0%    ~
BM_eigen_sqrt_float/8     2.07ns ± 0%  2.06ns ± 0%    ~
BM_eigen_sqrt_float/64    12.4ns ± 0%  12.4ns ± 1%    ~
BM_eigen_sqrt_float/512   95.7ns ± 0%  96.3ns ± 4%    ~
BM_eigen_sqrt_float/4k     774ns ± 0%   763ns ± 0%  -1.50%
BM_eigen_sqrt_float/32k   6.58µs ± 2%  6.11µs ± 0%  -7.06%
BM_eigen_sqrt_float/256k  82.7µs ± 1%  82.6µs ± 1%    ~
BM_eigen_sqrt_float/1M     330µs ± 1%   329µs ± 2%    ~

SSE+FMA (double)
BM_eigen_sqrt_double/1      1.63ns ± 0%  1.63ns ± 0%     ~
BM_eigen_sqrt_double/8      6.51ns ± 0%  6.08ns ± 0%   -6.68%
BM_eigen_sqrt_double/64     52.1ns ± 0%  46.5ns ± 1%  -10.65%
BM_eigen_sqrt_double/512     417ns ± 0%   374ns ± 1%  -10.29%
BM_eigen_sqrt_double/4k     3.33µs ± 0%  2.97µs ± 1%  -11.00%
BM_eigen_sqrt_double/32k    26.7µs ± 0%  23.7µs ± 0%  -11.07%
BM_eigen_sqrt_double/256k    213µs ± 0%   206µs ± 1%   -3.31%
BM_eigen_sqrt_double/1M      862µs ± 0%   870µs ± 2%   +0.96%

AVX+FMA (double)
name                        old cpu/op  new cpu/op  delta
BM_eigen_sqrt_double/1      1.63ns ± 0%  1.63ns ± 0%     ~
BM_eigen_sqrt_double/8      6.51ns ± 0%  6.06ns ± 0%   -6.95%
BM_eigen_sqrt_double/64     52.1ns ± 0%  46.5ns ± 1%  -10.80%
BM_eigen_sqrt_double/512     417ns ± 0%   373ns ± 1%  -10.59%
BM_eigen_sqrt_double/4k     3.33µs ± 0%  2.97µs ± 1%  -10.79%
BM_eigen_sqrt_double/32k    26.7µs ± 0%  23.8µs ± 0%  -10.94%
BM_eigen_sqrt_double/256k    214µs ± 0%   208µs ± 2%   -2.76%
BM_eigen_sqrt_double/1M      866µs ± 3%   923µs ± 7%     ~
2020-12-16 18:16:11 +00:00
Turing Eret
3bee9422d6 Merge branch 'lambdaknight/eigen-master' 2020-12-16 09:18:24 -07:00
Turing Eret
19e6496ce0 Replace call to FixedDimensions() with a singleton instance of
FixedDimensions.
2020-12-16 07:34:44 -07:00
Rasmus Munk Larsen
6cee8d347e Add an additional step of Newton-Raphson for psqrt<double> on Arm, which otherwise has an error of ~1000 ulps. 2020-12-15 04:06:41 +00:00
Turing Eret
bc7d1599fb TensorStorage with FixedDimensions now has zero instance memory overhead.
Removed m_dimension as instance member of TensorStorage with
FixedDimensions and instead use the template parameter. This
means that the sizeof a pure fixed-size storage is exactly
equal to the data it is storing.
2020-12-14 07:19:34 -07:00
Alexander Grund
cf0b5b0344 Remove code checking for CMake < 3.5
As the CMake version is at least 3.5 the code checking for earlier versions can be removed.
2020-12-14 09:57:44 +00:00
David Tellenbach
751f18f2c0 Remove comma at the end of enumeration list to silence C++03 warnings 2020-12-13 18:11:02 +01:00
Antonio Sanchez
5dc2fbabee Fix implicit cast to double.
Triggers `-Wimplicit-float-conversion`, causing a bunch of build errors
in Google due to `-Wall`.
2020-12-12 09:26:20 -08:00
Antonio Sanchez
55967f87d1 Fix NEON pmax<PropagateNumbers,Packet4bf>.
Simple typo, the max impl called pmin instead of pmax for floats.
2020-12-11 21:50:52 -08:00
Antonio Sanchez
839aa505c3 Fix typo in AVX512 packet math. 2020-12-11 21:35:44 -08:00
David Tellenbach
536c8a79f2 Remove unused macro in Half.h 2020-12-12 00:53:26 +01:00
Antonio Sanchez
8c9976d7f0 Fix more SSE/AVX packet conversions for peven.
MSVC doesn't like function-style casts and forces us to use intrinsics.
2020-12-11 15:46:42 -08:00
Antonio Sanchez
c6efc4e0ba Replace M_LOG2E and M_LN2 with custom macros.
For these to exist we would need to define `_USE_MATH_DEFINES` before
`cmath` or `math.h` is first included.  However, we don't
control the include order for projects outside Eigen, so even defining
the macro in `Eigen/Core` does not fix the issue for projects that
end up including `<cmath>` before Eigen does (explicitly or transitively).

To fix this, we define `EIGEN_LOG2E` and `EIGEN_LN2` ourselves.
2020-12-11 14:34:31 -08:00
Antonio Sanchez
e82722a4a7 Fix MSVC SSE casts.
MSVC doesn't like __m128(__m128i) c-style casts, so packets need to be
converted using intrinsic methods.
2020-12-11 08:52:59 -08:00
Deven Desai
f3d2ea48f5 Fix for broken ROCm/HIP Support
The following commit introduced a breakage in ROCm/HIP support for Eigen.

5ec4907434 (1958e65719641efe5483abc4ce0b61806270f6f3_525_517)

```
Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o
In file included from /home/rocm-user/eigen/test/gpu_basic.cu:20:
In file included from /home/rocm-user/eigen/test/main.h:356:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:222:
/home/rocm-user/eigen/Eigen/src/Core/arch/GPU/PacketMath.h:556:10: error: use of undeclared identifier 'half2half2'; did you mean '__half2half2'?
  return half2half2(from);
         ^~~~~~~~~~
         __half2half2
/opt/rocm/hip/include/hip/hcc_detail/hip_fp16.h:547:21: note: '__half2half2' declared here
            __half2 __half2half2(__half x)
                    ^
1 error generated when compiling for gfx900.

```

The cause seems to be a copy-paste error, and the fix is trivial
2020-12-11 16:14:57 +00:00
David Tellenbach
c7eb3a74cb Don't guard psqrt for std::complex<float> with EIGEN_ARCH_ARM64 2020-12-11 12:41:52 +01:00
Everton Constantino
bccf055a7c Add Armv8 guard on PropagateNumbers implementation. 2020-12-10 22:01:55 -03:00
Antonio Sanchez
82c0c18a83 Remove private access of std::deque::_M_impl.
This no longer works on gcc or clang, so we should just remove the hack.
The default should compile to similar code anyways.
2020-12-10 14:59:34 -08:00
David Tellenbach
00be0a7ff3 Fix vectorization of complex sqrt on NEON 2020-12-10 15:23:23 +00:00
David Tellenbach
8eb461a431 Remove comma at end of enumerator list in NEON PacketMath 2020-12-10 15:22:55 +01:00
David Tellenbach
2e8f850c78 Fix a typo in SparseMatrix documentation.
This fixes issue #2091.
2020-12-09 14:48:24 +01:00
Rasmus Munk Larsen
125cc9a5df Implement vectorized complex square root.
Closes #1905

Measured speedup for sqrt of `complex<float>` on Skylake:

SSE:
```
name                      old time/op             new time/op  delta
BM_eigen_sqrt_ctype/1     49.4ns ± 0%             54.3ns ± 0%  +10.01%
BM_eigen_sqrt_ctype/8      332ns ± 0%               50ns ± 1%  -84.97%
BM_eigen_sqrt_ctype/64    2.81µs ± 1%             0.38µs ± 0%  -86.49%
BM_eigen_sqrt_ctype/512   23.8µs ± 0%              3.0µs ± 0%  -87.32%
BM_eigen_sqrt_ctype/4k     202µs ± 0%               24µs ± 2%  -88.03%
BM_eigen_sqrt_ctype/32k   1.63ms ± 0%             0.19ms ± 0%  -88.18%
BM_eigen_sqrt_ctype/256k  13.0ms ± 0%              1.5ms ± 1%  -88.20%
BM_eigen_sqrt_ctype/1M    52.1ms ± 0%              6.2ms ± 0%  -88.18%
```

AVX2:
```
name                      old cpu/op  new cpu/op  delta
BM_eigen_sqrt_ctype/1     53.6ns ± 0%  55.6ns ± 0%   +3.71%
BM_eigen_sqrt_ctype/8      334ns ± 0%    27ns ± 0%  -91.86%
BM_eigen_sqrt_ctype/64    2.79µs ± 0%  0.22µs ± 2%  -92.28%
BM_eigen_sqrt_ctype/512   23.8µs ± 1%   1.7µs ± 1%  -92.81%
BM_eigen_sqrt_ctype/4k     201µs ± 0%    14µs ± 1%  -93.24%
BM_eigen_sqrt_ctype/32k   1.62ms ± 0%  0.11ms ± 1%  -93.29%
BM_eigen_sqrt_ctype/256k  13.0ms ± 0%   0.9ms ± 1%  -93.31%
BM_eigen_sqrt_ctype/1M    52.0ms ± 0%   3.5ms ± 1%  -93.31%
```

AVX512:
```
name                      old cpu/op  new cpu/op  delta
BM_eigen_sqrt_ctype/1     53.7ns ± 0%  56.2ns ± 1%   +4.75%
BM_eigen_sqrt_ctype/8      334ns ± 0%    18ns ± 2%  -94.63%
BM_eigen_sqrt_ctype/64    2.79µs ± 0%  0.12µs ± 1%  -95.54%
BM_eigen_sqrt_ctype/512   23.9µs ± 1%   1.0µs ± 1%  -95.89%
BM_eigen_sqrt_ctype/4k     202µs ± 0%     8µs ± 1%  -96.13%
BM_eigen_sqrt_ctype/32k   1.63ms ± 0%  0.06ms ± 1%  -96.15%
BM_eigen_sqrt_ctype/256k  13.0ms ± 0%   0.5ms ± 4%  -96.11%
BM_eigen_sqrt_ctype/1M    52.1ms ± 0%   2.0ms ± 1%  -96.13%
```
2020-12-08 18:13:35 -08:00
Antonio Sanchez
8cfe0db108 Fix host/device calls for __half.
The previous code had `__host__ __device__` functions calling `__device__`
functions (e.g. `__low2half`) which caused build failures in tensorflow.
Also simplified the `#ifdef` guards to make them clearer.
2020-12-08 20:31:02 +00:00
Everton Constantino
baf9d762b7 - Enabling PropagateNaN and PropagateNumbers for NEON.
- Adding propagate tests to bfloat16.
2020-12-08 17:05:05 +00:00
Antonio Sanchez
634bd79b0e Fix unused warning on new dense_assignment_loop impl. 2020-12-07 19:14:21 -08:00
Antonio Sanchez
655c3a4042 Add specialization for compile-time zero-sized dense assignment.
In the current `dense_assignment_loop` implementations, if the
destination's inner or outer size is zero at compile time and if the kernel
involves a product, we currently get a compile error (#2080).  This is
triggered by attempting to multiply a non-existent row by a column (or
vice-versa).

To address this, we add a specialization for zero-sized assignments
(`AllAtOnceTraversal`) which evaluates to a no-op. We also add a static
check to ensure the size is in-fact zero. This now seems to be the only
existing use of `AllAtOnceTraversal`.

Fixes #2080.
2020-12-07 08:38:43 -08:00
Antonio Sanchez
5ec4907434 Clean up #ifs in GPU PacketPath.
Removed redundant checks and redundant code for CUDA/HIP.

Note: there are several issues here of calling `__device__` functions
from `__host__ __device__` functions, in particular `__low2half`.
We do not address that here -- only modifying this file enough
to get our current tests to compile.

Fixed: #1847
2020-12-04 16:14:03 -08:00
Rasmus Munk Larsen
f9fac1d5b0 Add log2() to Eigen. 2020-12-04 21:45:09 +00:00
Antonio Sanchez
2dbac2f99f Fix bad NEON fp16 check 2020-12-04 13:42:18 -08:00
Antonio Sanchez
e2f21465fe Special function implementations for half/bfloat16 packets.
Current implementations fail to consider half-float packets, only
half-float scalars.  Added specializations for packets on AVX, AVX512 and
NEON.  Added tests to `special_packetmath`.

The current `special_functions` tests would fail for half and bfloat16 due to
lack of precision. The NEON tests also fail with precision issues and
due to different handling of `sqrt(inf)`, so special functions bessel, ndtri
have been disabled.

Tested with AVX, AVX512.
2020-12-04 10:16:29 -08:00
David Tellenbach
305b8bd277 Remove duplicate #if clause 2020-12-04 18:55:46 +01:00
Antonio Sanchez
9ee9ac81de Fix shfl* macros for CUDA/HIP
The `shfl*` functions are `__device__` only; adjusted the `#ifdef`s so
they are defined whenever the corresponding CUDA/HIP ones are.

Also changed the HIP/CUDA<9.0 versions to cast to int instead of
doing the conversion `half`<->`float`.

Fixes #2083
2020-12-04 17:18:32 +00:00
shrek1402
a9a2f2bebf The function 'prefetch' did not work correctly on the win64 platform 2020-12-04 17:18:08 +00:00
Rasmus Munk Larsen
f23dc5b971 Revert "Add log2() operator to Eigen"
This reverts commit 4d91519a9b.
2020-12-03 14:32:45 -08:00
Rasmus Munk Larsen
4d91519a9b Add log2() operator to Eigen 2020-12-03 22:31:44 +00:00
Rasmus Munk Larsen
25d8ae7465 Small cleanup of generic plog implementations:
Adding the term e*ln(2) was split into two steps for no obvious reason.
This dates back to the original Cephes code from which the algorithm is adapted.
It appears that this was done in Cephes to prevent the compiler from reordering
the addition of the 3 terms in the approximation

  log(1+x) ~= x - 0.5*x^2 + x^3*P(x)/Q(x)

which must be added in reverse order since |x| < (sqrt(2)-1).

This allows rewriting the code to just 2 pmadd and 1 padd instructions,
which on a Skylake processor speeds up the code by 5-7%.
2020-12-03 19:40:40 +00:00
Antonio Sanchez
eb4d4ae070 Include chrono in main for c++11.
Hack to fix tensor tests, since min/max are overridden by `main.h`.
2020-12-03 11:27:32 -08:00
Rasmus Munk Larsen
71c85df4c1 Clean up the Tensor header and get rid of the EIGEN_SLEEP macro. 2020-12-02 11:04:04 -08:00
Antonio Sanchez
70fbcf82ed Fix typo in F32MaskToBf16Mask. 2020-12-02 07:58:34 -08:00
Antonio Sanchez
2627e2f2e6 Fix neon cmp* functions for bf16.
The current impl corrupts the comparison masks when converting
from float back to bfloat16.  The resulting masks are then
no longer all zeros or all ones, which breaks when used with
`pselect` (e.g. in `pmin<PropagateNumbers>`).  This was
causing `packetmath_15` to fail on arm.

Introducing a simple `F32MaskToBf16Mask` corrects this (takes
the lower 16-bits for each float mask).
2020-12-02 01:29:34 +00:00
Antonio Sanchez
ddd48b242c Implement CUDA __shfl* for Eigen::half
Prior to this fix, `TensorContractionGpu` and the `cxx11_tensor_of_float16_gpu`
test are broken, as well as several ops in Tensorflow. The gpu functions
`__shfl*` became ambiguous now that `Eigen::half` implicitly converts to float.
Here we add the required specializations.
2020-12-01 14:36:52 -08:00
Rasmus Munk Larsen
e57281a741 Fix a few issues for AVX512. This change enables vectorized versions of log, exp, log1p, expm1 when AVX512DQ is not available. 2020-12-01 11:31:47 -08:00
Antonio Sanchez
1992af3de2 Fix #2077, EIGEN_CONSTEXPR in Half.
`bit_cast` cannot be `constexpr`, so we need to remove `EIGEN_CONSTEXPR` from
`raw_half_as_uint16(...)`.  This shouldn't affect anything else, since
it is only used in `bit_cast<uint16_t,half>()`, which is not itself
`constexpr`.

Fixes #2077.
2020-12-01 03:10:21 +00:00
acxz
7b80609d49 add EIGEN_DEVICE_FUNC to methods 2020-12-01 03:08:47 +00:00
Antonio Sanchez
89f90b585d AVX512 missing ops.
This allows the `packetmath` tests to pass for AVX512 on skylake.
Made `half` and `bfloat16` consistent in terms of ops they support.

Note the `log` tests are currently disabled for `bfloat16` since
they fail due to poor precision (they were previously disabled for
`Packet8bf` via test function specialization -- I just removed that
specialization and disabled it in the generic test).
2020-11-30 16:28:57 +00:00
Florian Maurin
c5985c46f5 Fix typo in doc 2020-11-30 10:53:29 +00:00
Jim Lersch
68f69414f7 Workaround for doxygen class template titles in which the template
part of the class signature is lost due to a problem with forward
declarations.  The problem is probably caused by doxygen bug #7689.
It is confirmed to be fixed in doxygen >= 1.8.19.
2020-11-27 19:52:16 -07:00
Jim Lersch
a7170f2aca Fix doxygen class blocks that were not associated with the correct classes. 2020-11-27 08:48:11 -07:00
David Tellenbach
550e8f8f57 Include CMakeDependentOption to be able to use cmake_dependent_option 2020-11-27 13:21:49 +01:00
Bowie Owens
9842366bba Make inclusion of doc sub-directory optional by adjusting options.
Allows exclusion of doc and related targets to help when using eigen via add_subdirectory().

Requested by:

https://gitlab.com/libeigen/eigen/-/issues/1842

Also required making EIGEN_TEST_BUILD_DOCUMENTATION a dependent option on EIGEN_BUILD_DOC. This ensures documentation targets are properly defined when EIGEN_TEST_BUILD_DOCUMENTATION is ON.
2020-11-27 08:11:49 +11:00
filippobrizzi
aa56e1d980 check for include dirs set 2020-11-26 10:22:46 +00:00
Andreas Krebbel
1e74f93d55 Fix some packet-functions in the IBM ZVector packet-math. 2020-11-25 14:11:23 +00:00
Rasmus Munk Larsen
79818216ed Revert "Fix Half NaN definition and test."
This reverts commit c770746d70.
2020-11-24 12:57:28 -08:00
Rasmus Munk Larsen
c770746d70 Fix Half NaN definition and test.
The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due
to a bad NaN bit-pattern comparison (in the case of casting a float to `__fp16`,
the signaling `NaN` is quieted). There was also an inconsistency between
`numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`.  Here we
correct the inconsistency and compare NaNs according to the IEEE 754
definition.

Also modified the `bfloat16_float` test to match.

Tested with `cortex-a53` and `cortex-a55`.
2020-11-24 20:53:07 +00:00
Antonio Sanchez
22f67b5958 Fix boolean float conversion and product warnings.
This fixes some gcc warnings such as:
```
Eigen/src/Core/GenericPacketMath.h:655:63: warning: implicit conversion turns floating-point number into bool: 'typename __gnu_cxx::__enable_if<__is_integer<bool>::__value, double>::__type' (aka 'double') to 'bool' [-Wimplicit-conversion-floating-point-to-bool]
    Packet psqrt(const Packet& a) { EIGEN_USING_STD(sqrt); return sqrt(a); }
```

Details:

- Added `scalar_sqrt_op<bool>` (`-Wimplicit-conversion-floating-point-to-bool`).

- Added `scalar_square_op<bool>` and `scalar_cube_op<bool>`
specializations (`-Wint-in-bool-context`)

- Deprecated above specialized ops for bool.

- Modified `cxx11_tensor_block_eval` to specialize generator for
booleans (`-Wint-in-bool-context`) and to use `abs` instead of `square` to
avoid deprecated bool ops.
2020-11-24 20:20:36 +00:00
Antonio Sanchez
a3b300f1af Implement missing AVX half ops.
Minimal implementation of AVX `Eigen::half` ops to bring in line
with `bfloat16`.  Allows `packetmath_13` to pass.

Also adjusted `bfloat16` packet traits to match the supported set
of ops (e.g. Bessel is not actually implemented).
2020-11-24 16:46:41 +00:00
Antonio Sanchez
38abf2be42 Fix Half NaN definition and test.
The `half_float` test was failing with `-mcpu=cortex-a55` (native `__fp16`) due
to a bad NaN bit-pattern comparison (in the case of casting a float to `__fp16`,
the signaling `NaN` is quieted). There was also an inconsistency between
`numeric_limits<half>::quiet_NaN()` and `NumTraits::quiet_NaN()`.  Here we
correct the inconsistency and compare NaNs according to the IEEE 754
definition.

Also modified the `bfloat16_float` test to match.

Tested with `cortex-a53` and `cortex-a55`.
2020-11-23 14:13:59 -08:00
Antonio Sanchez
4cf01d2cf5 Update AVX half packets, disable test.
The AVX half implementation is incomplete, causing the `packetmath_13` test
to fail.  This disables the test.

Also refactored the existing AVX implementation to use `bit_cast`
instead of direct access to `.x`.
2020-11-21 09:05:10 -08:00
Antonio Sanchez
fd1dcb6b45 Fixes duplicate symbol when building blas
A missing inline breaks blas, since the symbol is generated in
`complex_single.cpp`, `complex_double.cpp`, `single.cpp`, and `double.cpp`.

Changed the rest of the inlines to `EIGEN_STRONG_INLINE`.
2020-11-20 09:37:40 -08:00
David Tellenbach
6c9c3f9a1a Remove explicit casts from Eigen::half and Eigen::bfloat16 to bool
Both Eigen::half and Eigen::bfloat16 are implicitly convertible to
float and can hence be converted to bool via the conversion chain

  Eigen::{half,bfloat16} -> float -> bool

We thus remove the explicit cast operator to bool.
2020-11-19 18:49:09 +01:00
Antonio Sanchez
a8fdcae55d Fix sparse_extra_3, disable counting temporaries for testing DynamicSparseMatrix.
Multiplication of column-major `DynamicSparseMatrix`es involves three
temporaries:
- two for transposing twice to sort the coefficients
(`ConservativeSparseSparseProduct.h`, L160-161)
- one for a final copy assignment (`SparseAssign.h`, L108)
The latter is avoided in an optimization for `SparseMatrix`.

Since `DynamicSparseMatrix` is deprecated in favor of `SparseMatrix`, it's not
worth the effort to optimize further, so I simply disabled counting
temporaries via a macro.

Note that due to the inclusion of `sparse_product.cpp`, the `sparse_extra`
tests actually re-run all the original `sparse_product` tests as well.

We may want to simply drop the `DynamicSparseMatrix` tests altogether, which
would eliminate the test duplication.

Related to #2048
2020-11-18 23:15:33 +00:00
David Tellenbach
11e4056f6b Re-enable Arm Neon Eigen::half packets of size 8
- Add predux_half_dowto4
- Remove explicit casts in Half.h to match the behaviour of BFloat16.h
- Enable more packetmath tests for Eigen::half
2020-11-18 23:02:21 +00:00
Antonio Sanchez
17268b155d Add bit_cast for half/bfloat to/from uint16_t, fix TensorRandom
The existing `TensorRandom.h` implementation makes the assumption that
`half` (`bfloat16`) has a `uint16_t` member `x` (`value`), which is not
always true. This currently fails on arm64, where `x` has type `__fp16`.
Added `bit_cast` specializations to allow casting to/from `uint16_t`
for both `half` and `bfloat16`.  Also added tests in
`half_float`, `bfloat16_float`, and `cxx11_tensor_random` to catch
these errors in the future.
2020-11-18 20:32:35 +00:00
Antonio Sanchez
41d5d5334b Initialize primitives to fix -Wuninitialized-const-reference.
The `meta` test generates warnings with the latest version of clang due
to passing uninitialized variables as const reference arguments.
```
test/meta.cpp:102:45: error: variable 'f' is uninitialized when passed as a const reference argument here [-Werror,-Wuninitialized-const-reference]
    VERIFY(( check_is_convertible(a.dot(b), f) ));
```
We don't actually use the variables, but initializing them eliminates the
new warning.

Fixes #2067.
2020-11-18 20:23:20 +00:00
Antonio Sanchez
3669498f5a Fix rule-of-3 for the Tensor module.
Adds copy constructors to Tensor ops, inherits assignment operators from
`TensorBase`.

Addresses #1863
2020-11-18 18:14:53 +00:00
Antonio Sanchez
60218829b7 EOF newline added to InverseSize4.
Causing build breakages due to `-Wnewline-eof -Werror` that seems to be
common across Google.
2020-11-18 07:58:33 -08:00
Rasmus Munk Larsen
2d63706545 Add missing parens around macro argument. 2020-11-18 00:24:19 +00:00
Rasmus Munk Larsen
6bba58f109 Replace SSE_SHUFFLE_MASK macro with shuffle_mask. 2020-11-17 15:28:37 -08:00
David Tellenbach
e9b55c4db8 Avoid promotion of Arm __fp16 to float in Neon PacketMath
Using overloaded arithmetic operators for Arm __fp16 always
causes a promotion to float. We replace operator* by vmulh_f16
to avoid this.
2020-11-17 20:19:44 +01:00
Antonio Sanchez
117a4c0617 Fix missing EIGEN_CONSTEXPR pop_macro in Half.
`EIGEN_CONSTEXPR` is getting pushed but not popped in `Half.h` if
`EIGEN_HAS_ARM64_FP16_SCALAR_ARITHMETIC` is defined.
2020-11-17 08:29:33 -08:00
Guoqiang QI
394f564055 Unify Inverse_SSE.h and Inverse_NEON.h into a single generic implementation using PacketMath. 2020-11-17 12:27:01 +00:00
Antonio Sanchez
8e9cc5b10a Eliminate double-promotion warnings.
Clang currently complains about implicit conversions, e.g.
```
test/packetmath.cpp:680:59: warning: implicit conversion increases floating-point precision: 'typename Eigen::internal::random_retval<typename Eigen::internal::global_math_functions_filtering_base<double>::type>::type' (aka 'double') to 'long double' [-Wdouble-promotion]
          data1[0] = Scalar((2 * k + k1) * EIGEN_PI / 2 * internal::random<double>(0.8, 1.2));
                                                        ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test/packetmath.cpp:681:40: warning: implicit conversion increases floating-point precision: 'float' to 'long double' [-Wdouble-promotion]
          data1[1] = Scalar((2 * k + 2 + k1) * EIGEN_PI / 2 * internal::random<double>(0.8, 1.2));
```

Modified to explicitly cast to double.
2020-11-16 10:39:09 -08:00
acxz
9175f50d6f Add EIGEN_DEVICE_FUNC to TranspositionsBase
Fixes #2057.
2020-11-16 15:37:40 +00:00
Martin Vonheim Larsen
280f4f2407 Enable MathJax in Doxygen.in
Note that HTTPS must be used against the MathJax CDN when hosted on `eigen.tuxfamily.org` (which uses HTTPS) in order to avoid `Mixed Content` errors from browsers. Using HTTPS for MathJax also works if the Eigen docs are hosted on plain HTTP.
2020-11-16 12:59:13 +00:00
Antonio Sanchez
bb69a8db5d Explicit casts of S -> std::complex<T>
When calling `internal::cast<S, std::complex<T>>(x)`, clang often
generates an implicit conversion warning due to an implicit cast
from type `S` to `T`.  This currently affects the following tests:
- `basicstuff`
- `bfloat16_float`
- `cxx11_tensor_casts`

The implicit cast leads to widening/narrowing float conversions.
Widening warnings only seem to be generated by clang (`-Wdouble-promotion`).

To eliminate the warning, we explicitly cast the real-component first
from `S` to `T`.  We also adjust tests to use `internal::cast` instead
of `static_cast` when a complex type may be involved.
2020-11-14 05:50:42 +00:00
Christoph Hertzberg
90f6d9d23e Suppress ignored-attributes warning (same as in vectorization_logic). Remove redundant include and using namespace. 2020-11-13 16:21:53 +01:00
guoqiangqi
8324e5e049 Fix typo in NEON/PacketMath.h 2020-11-13 00:46:41 +00:00
Antonio Sanchez
852513e7a6 Disable testing of OpenGL by default.
The `OpenGLSupport` module contains mostly deprecated features, and the
test is highly GL context-dependent, relies on deprecated GLUT, and
requires a display.  Until the module is updated to support modern
OpenGL and the test to use newer windowing frameworks (e.g. GLFW)
it's probably best to disable the test by default.

The test can be enabled with `cmake -DEIGEN_TEST_OPENGL=ON`.

See #2053 for more details.
2020-11-12 16:15:40 -08:00
Rasmus Munk Larsen
bec72345d6 Simplify expression for inner product fallback in Gemv product evaluator. 2020-11-12 23:43:15 +00:00
Rasmus Munk Larsen
276db21f26 Remove redundant branch for handling dynamic vector*vector. This will be handled by the equivalent branch in the specialization for GemvProduct. 2020-11-12 21:54:56 +00:00
Rasmus Munk Larsen
cf12474a8b Optimize matrix*matrix and matrix*vector products when they correspond to inner products at runtime.
This speeds up inner products where one or both arguments are dynamic, for small and medium-sized vectors (up to 32k).

name                           old time/op             new time/op   delta
BM_VecVecStatStat<float>/1     1.64ns ± 0%             1.64ns ± 0%     ~
BM_VecVecStatStat<float>/8     2.99ns ± 0%             2.99ns ± 0%     ~
BM_VecVecStatStat<float>/64    7.00ns ± 1%             7.04ns ± 0%   +0.66%
BM_VecVecStatStat<float>/512   61.6ns ± 0%             61.6ns ± 0%     ~
BM_VecVecStatStat<float>/4k     551ns ± 0%              553ns ± 1%   +0.26%
BM_VecVecStatStat<float>/32k   4.45µs ± 0%             4.45µs ± 0%     ~
BM_VecVecStatStat<float>/256k  77.9µs ± 0%             78.1µs ± 1%     ~
BM_VecVecStatStat<float>/1M     312µs ± 0%              312µs ± 1%     ~
BM_VecVecDynStat<float>/1      13.3ns ± 1%              4.6ns ± 0%  -65.35%
BM_VecVecDynStat<float>/8      14.4ns ± 0%              6.2ns ± 0%  -57.00%
BM_VecVecDynStat<float>/64     24.0ns ± 0%             10.2ns ± 3%  -57.57%
BM_VecVecDynStat<float>/512     138ns ± 0%               68ns ± 0%  -50.52%
BM_VecVecDynStat<float>/4k     1.11µs ± 0%             0.56µs ± 0%  -49.72%
BM_VecVecDynStat<float>/32k    8.89µs ± 0%             4.46µs ± 0%  -49.89%
BM_VecVecDynStat<float>/256k   78.2µs ± 0%             78.1µs ± 1%     ~
BM_VecVecDynStat<float>/1M      313µs ± 0%              312µs ± 1%     ~
BM_VecVecDynDyn<float>/1       10.4ns ± 0%             10.5ns ± 0%   +0.91%
BM_VecVecDynDyn<float>/8       12.0ns ± 3%             11.9ns ± 0%     ~
BM_VecVecDynDyn<float>/64      37.4ns ± 0%             19.6ns ± 1%  -47.57%
BM_VecVecDynDyn<float>/512      159ns ± 0%               81ns ± 0%  -49.07%
BM_VecVecDynDyn<float>/4k      1.13µs ± 0%             0.58µs ± 1%  -49.11%
BM_VecVecDynDyn<float>/32k     8.91µs ± 0%             5.06µs ±12%  -43.23%
BM_VecVecDynDyn<float>/256k    78.2µs ± 0%             78.2µs ± 1%     ~
BM_VecVecDynDyn<float>/1M       313µs ± 0%              312µs ± 1%     ~
2020-11-12 18:02:37 +00:00
Pedro Caldeira
c29935b323 Add support for dynamic dispatch of MMA instructions for POWER 10 2020-11-12 11:31:15 -03:00
acxz
b714dd9701 remove annotation for first declaration of default con/destruction 2020-11-12 04:34:12 +00:00
mehdi-goli
e24a1f57e3 [SYCL function pointer issue]: SYCL does not support function pointers inside kernels, due to portability issues with function pointers and memory address spaces between host and accelerators. To fix the issue, function pointers have been replaced by function objects. 2020-11-12 01:50:28 +00:00
Antonio Sanchez
6961468915 Address issues with openglsupport test.
The existing test fails on several systems due to GL runtime version mismatches,
the use of deprecated features, and memory errors due to improper use of GLUT.
The test was modified to:

- Run within a display function, allowing proper GLUT cleanup.
- Generate dynamic shaders with a supported GLSL version string and output variables.
- Report shader compilation errors.
- Check GL context version before launching version-specific tests.

Note that most of the existing `OpenGLSupport` module and tests rely on deprecated
features (e.g. fixed-function pipeline). The test was modified to allow it to
pass on various systems. We might want to consider removing the module or re-writing
it entirely to support modern OpenGL.  This is beyond the scope of this patch.

Testing of legacy GL (for platforms that support it) can be enabled by defining
`EIGEN_LEGACY_OPENGL`.  Otherwise, the test will try to create a modern context.

Tested on
- MacBook Air (2019), macOS Catalina 10.15.7 (OpenGL 2.1, 4.1)
- Debian 10.6, NVidia Quadro K1200 (OpenGL 3.1, 3.3)
2020-11-11 15:54:43 -08:00
Everton Constantino
348a48682e Fix erroneous forward declaration of boost nvp. 2020-11-10 13:07:34 -03:00
guoqiangqi
82fe059f35 Fix issue #2045: the _mm256_set_m128d op is not supported by gcc 7.x 2020-11-04 09:21:39 +08:00
Deven Desai
9d11e2c03e CMakefile update for ROCm 4.0
Starting with ROCm 4.0, the `hipconfig --platform` command will return `amd` (prior return value was `hcc`). Updating the CMakeLists.txt files in the test dirs to account for this change.
2020-10-29 18:06:31 +00:00
Deven Desai
39a038f2e4 Fix for ROCm (and CUDA?) breakage - 201029
The following commit breaks Eigen for ROCm (and probably CUDA too) with the following error

e265f7ed8e

```

Building HIPCC object test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o
In file included from /home/rocm-user/eigen/test/gpu_basic.cu:20:
In file included from /home/rocm-user/eigen/test/main.h:355:
In file included from /home/rocm-user/eigen/Eigen/QR:11:
In file included from /home/rocm-user/eigen/Eigen/Core:169:
/home/rocm-user/eigen/Eigen/src/Core/arch/Default/Half.h:825:76: error: use of undeclared identifier 'numext'; did you mean 'Eigen::numext'?
  return Eigen::half_impl::raw_uint16_to_half(__ldg(reinterpret_cast<const numext::uint16_t*>(ptr)));
                                                                           ^~~~~~
                                                                           Eigen::numext
/home/rocm-user/eigen/Eigen/src/Core/MathFunctions.h:968:11: note: 'Eigen::numext' declared here
namespace numext {
          ^
1 error generated when compiling for gfx900.
CMake Error at gpu_basic_generated_gpu_basic.cu.o.cmake:192 (message):
  Error generating file
  /home/rocm-user/eigen/build/test/CMakeFiles/gpu_basic.dir//./gpu_basic_generated_gpu_basic.cu.o

test/CMakeFiles/gpu_basic.dir/build.make:63: recipe for target 'test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o' failed
make[3]: *** [test/CMakeFiles/gpu_basic.dir/gpu_basic_generated_gpu_basic.cu.o] Error 1
CMakeFiles/Makefile2:16611: recipe for target 'test/CMakeFiles/gpu_basic.dir/all' failed
make[2]: *** [test/CMakeFiles/gpu_basic.dir/all] Error 2
CMakeFiles/Makefile2:16618: recipe for target 'test/CMakeFiles/gpu_basic.dir/rule' failed
make[1]: *** [test/CMakeFiles/gpu_basic.dir/rule] Error 2
Makefile:5401: recipe for target 'gpu_basic' failed
make: *** [gpu_basic] Error 2
```

The fix in this commit is trivial. Please review and merge
2020-10-29 15:34:05 +00:00
David Tellenbach
f895755c0e Remove unused functions in Half.h.
The following functions have been removed:

  Eigen::half fabsh(const Eigen::half&)
  Eigen::half exph(const Eigen::half&)
  Eigen::half sqrth(const Eigen::half&)
  Eigen::half powh(const Eigen::half&, const Eigen::half&)
  Eigen::half floorh(const Eigen::half&)
  Eigen::half ceilh(const Eigen::half&)
2020-10-29 07:37:52 +01:00
David Tellenbach
09f015852b Replace numext::as_uint with numext::bit_cast<numext::uint32_t> 2020-10-29 07:28:28 +01:00
David Tellenbach
e265f7ed8e Add support for Armv8.2-a __fp16
Armv8.2-a provides a native half-precision floating point (__fp16 aka.
float16_t). This patch introduces

* __fp16 as underlying type of Eigen::half if this type is available
* the packet types Packet4hf and Packet8hf representing float16x4_t and
  float16x8_t respectively
* packet-math for the above packets with corresponding scalar type Eigen::half

The packet-math functionality has been implemented by Ashutosh Sharma
<ashutosh.sharma@amperecomputing.com>.

This closes #1940.
2020-10-28 20:15:09 +00:00
mehdi-goli
a725a3233c [SYCL code cleanup]: removing extra #pragma unroll in SYCL which was causing issues in embedded systems 2020-10-28 08:34:49 +00:00
mehdi-goli
b9ff791fed [Missing SYCL math op]: Adding the missing LDEXP function for SYCL. 2020-10-28 08:32:57 +00:00
mehdi-goli
61461d682a [Fixing expf issue]: Eigen uses the packet-type operation for the scalar type float in the sigmoid function (https://gitlab.com/libeigen/eigen/-/blob/master/Eigen/src/Core/functors/UnaryFunctors.h#L990). As a result the SYCL backend breaks, since it only supports packet operations for the vectorized types float4 and double2. The issue has been fixed by adding the scalar type float to the packet operation pexp for the SYCL backend. 2020-10-28 08:30:34 +00:00
Christoph Hertzberg
ecb7bc9514 Bug #2036 make sure find_standard_math_library_test_program actually compiles (and is guaranteed to call math functions) 2020-10-24 15:22:21 +02:00
Susi Lehtola
09f595a269 Make sure compiler does not optimize away calls to math functions 2020-10-24 06:16:50 +00:00
guoqiangqi
28aef8e816 Improve polynomial evaluation with instruction-level parallelism for pexp_float and pexp<Packet16f> 2020-10-20 11:37:09 +08:00
guoqiangqi
4a77eda1fd remove unnecessary specialized template of pexp for scalar float/double 2020-10-19 00:51:42 +00:00
Antonio Sanchez
d9f0d9eb76 Fix missing pfirst<Packet16b> for MSVC.
It was only defined under one `#ifdef` case.  This fixes the `packetmath_14`
test for MSVC.
2020-10-16 16:22:00 -07:00
Rasmus Munk Larsen
21edea5edd Fix the specialization of pfrexp for AVX to be faster when AVX2/AVX512DQ is not available, and avoid undefined behavior in C++. Also mask off the sign bit when extracting the exponent. 2020-10-15 18:39:58 -07:00
Deven Desai
011e0db31d Fix for ROCm/HIP breakage - 201013
The following commit seems to have introduced regressions in ROCm/HIP support.

183a208212

It causes some unit-tests to fail with the following error

```
...
Eigen/src/Core/GenericPacketMath.h:322:3: error: no member named 'bit_and' in the global namespace; did you mean 'std::bit_and'?
...
Eigen/src/Core/GenericPacketMath.h:329:3: error: no member named 'bit_or' in the global namespace; did you mean 'std::bit_or'?
...
Eigen/src/Core/GenericPacketMath.h:336:3: error: no member named 'bit_xor' in the global namespace; did you mean 'std::bit_xor'?
...
```

The error occurs because, when compiling the device code in HIP/CUDA, the compiler will pick up some of the std functions (whose calls are prefixed by EIGEN_USING_STD) from the global namespace (i.e. use ::bit_xor instead of std::bit_xor). For this to work, those functions must be declared in the global namespace in the HIP/CUDA header files. The `bit_and`, `bit_or` and `bit_xor` routines are not declared in the HIP header file that contains the decls for the std math functions ( `math_functions.h` ), and this is the cause of the error above.

It seems that newer HIP compilers do support calling `std::` math routines within device code, and the ideal fix here would have been to change all calls to std math functions in Eigen to use the `std::` namespace (instead of the global namespace) when compiling with the HIP compiler. However, a recent commit removed the EIGEN_USING_STD_MATH macro and collapsed its uses into the EIGEN_USING_STD macro ( 4091f6b25c ).

Replacing all std math calls would essentially require resurrecting the EIGEN_USING_STD_MATH macro, so that option was not chosen.

Also, HIP compilers only support std math calls within device code, not all std functions (specifically not malloc/free, which are also prefixed via EIGEN_USING_STD). So modifying the EIGEN_USING_STD implementation to use the std:: namespace for HIP will not work either.

Hence going for the ugly solution of special-casing the three calls that break the HIP compile, to explicitly use the std:: namespace
2020-10-15 12:17:35 +00:00
Rasmus Munk Larsen
6ea8091705 Revert change from 4e4d3f32d1 that broke BFloat16.h build with older compilers. 2020-10-15 01:20:08 +00:00
Guoqiang QI
4700713faf Add AVX plog<Packet4d> and AVX512 plog<Packet8d> ops; also unified the AVX512 plog<Packet16f> op with the generic api 2020-10-15 00:54:45 +00:00
Rasmus Munk Larsen
af6f43d7ff Add specializations for pmin/pmax with prescribed NaN propagation semantics for SSE/AVX/AVX512. 2020-10-14 23:11:24 +00:00
Rasmus Munk Larsen
274ef12b61 Remove leftover debug print statement in cxx11_tensor_expr.cpp 2020-10-14 22:59:51 +00:00
Rasmus Munk Larsen
208b3626d1 Revert generic implementation of predux, since it breaks compilation of predux_any with MSVC. 2020-10-14 21:41:28 +00:00
David Tellenbach
e3e2cf9d24 Add MatrixBase::cwiseArg() 2020-10-14 01:56:42 +00:00
Rasmus Munk Larsen
61fc78bbda Get rid of nested template specialization in TensorReductionGpu.h, which was broken by c6953f799b. 2020-10-13 23:53:11 +00:00
Rasmus Munk Larsen
c6953f799b Add packet generic ops predux_fmin, predux_fmin_nan, predux_fmax, and predux_fmax_nan that implement reductions with PropagateNaN and PropagateNumbers semantics. Add (slow) generic implementations for most reductions. 2020-10-13 21:48:31 +00:00
acxz
807e51528d undefine EIGEN_CONSTEXPR before redefinition 2020-10-12 20:28:56 -04:00
Rasmus Munk Larsen
9a4d04c05f Make bitwise_helper a device function to unbreak GPU builds. 2020-10-10 01:45:20 +00:00
Rasmus Munk Larsen
4e4d3f32d1 Clean up packetmath tests and fix various bugs to make bfloat16 pass (almost) all packetmath tests with SSE, AVX, and AVX512. 2020-10-09 20:05:49 +00:00
David Tellenbach
7a8d3d5b81 Disable test exceptions when using OpenMP. 2020-10-09 17:49:07 +02:00
David Tellenbach
9022f5aa8a Mention problems when using potentially throwing scalars and OpenMP 2020-10-09 17:04:25 +02:00
Karl Ljungkvist
d199c17b14 Fix typo in Tutorial_BlockOperations_block_assignment.cpp 2020-10-09 07:51:36 +00:00
David Tellenbach
4091f6b25c Drop EIGEN_USING_STD_MATH in favour of EIGEN_USING_STD 2020-10-09 02:05:05 +02:00
Rasmus Munk Larsen
183a208212 Implement generic bitwise logical packet ops that work for all types. 2020-10-08 22:45:20 +00:00
David Tellenbach
8f8d77b516 Add EIGEN prefix for HAS_LGAMMA_R 2020-10-08 18:32:19 +02:00
Eugene Zhulenev
2279f2c62f Use lgamma_r if it is available (update check for glibc 2.19+) 2020-10-08 00:26:45 +00:00
Rasmus Munk Larsen
b431024404 Don't make assumptions about NaN-propagation for pmin/pmax - it varies across platforms.
Change test to only test for NaN-propagation for pfmin/pfmax.
2020-10-07 19:05:18 +00:00
David Tellenbach
f66f3393e3 Use reinterpret_cast instead of C-style cast in Inverse_NEON.h 2020-10-04 00:35:09 +02:00
Rasmus Munk Larsen
22c971a225 Don't cast away const in Inverse_NEON.h. 2020-10-02 15:06:34 -07:00
Rasmus Munk Larsen
f93841b53e Use EIGEN_USING_STD to fix CUDA compilation error on BFloat16.h. 2020-10-02 14:47:15 -07:00
Rasmus Munk Larsen
ee714f79f7 Fix CUDA build breakage and incorrect result for absdiff on HIP with long double arguments. 2020-10-02 21:05:35 +00:00
janos
f7b185a8b1 don't use *=, it might not return a Scalar 2020-10-02 14:36:51 +02:00
Rasmus Munk Larsen
9078f47cd6 Fix build breakage with MSVC 2019, which does not support MMX intrinsics for 64 bit builds, see:
https://stackoverflow.com/questions/60933486/mmx-intrinsics-like-mm-cvtpd-pi32-not-found-with-msvc-2019-for-64bit-targets-c

Instead use the equivalent SSE2 intrinsics.
2020-10-01 12:37:55 -07:00
Rasmus Munk Larsen
3b445d9bf2 Add generic packet ops corresponding to std::fmin and std::fmax. The nonsensical NaN-propagation rules for std::min and std::max implemented by pmin and pmax in Eigen are a longstanding source of confusion and bug reports. This change is a first step towards addressing that, as discussed in issue #564. 2020-10-01 16:54:31 +00:00
Rasmus Munk Larsen
44b9d4e412 Specialize pldexp_double and pfrexp_double and get rid of the Packet2l definition for SSE. SSE does not support conversion between 64 bit integers and double, and the existing implementation of casting between Packet2d and Packet2l results in undefined behavior when casting NaN to int. Since pldexp and pfrexp only manipulate exponent fields that fit in 32 bits, this change provides specializations that use the existing instructions _mm_cvtpd_pi32 and _mm_cvtsi32_pd instead. 2020-09-30 13:33:44 -07:00
Antonio Sanchez
d5a0d89491 Fix alignedbox 32-bit precision test failure.
The current `test/geo_alignedbox` tests fail on 32-bit arm due to small floating-point errors.

In particular, the following is not guaranteed to hold:
```
IsometryTransform identity = IsometryTransform::Identity();
BoxType transformedC;
transformedC.extend(c.transformed(identity));
VERIFY(transformedC.contains(c));
```
since `c.transformed(identity)` is ever-so-slightly different from `c`. Instead, we replace this test with one that checks an identity transform is within floating-point precision of `c`.

Also updated the condition on `AlignedBox::transform(...)` to only accept `Affine`, `AffineCompact`, and `Isometry` modes explicitly.  Otherwise, invalid combinations of modes would also incorrectly pass the assertion.
2020-09-30 08:42:03 -07:00
David Tellenbach
30960d485e Fix failure in GEBP kernel when compiling with OpenMP and FMA
Fixes #1995
2020-09-30 01:26:07 +02:00
Rasmus Munk Larsen
f9d1500f74 Revert !182. 2020-09-29 13:56:17 -07:00
Rasmus Munk Larsen
068121ec02 Add missing newline at the end of Inverse_NEON.h 2020-09-29 15:32:52 +00:00
Rasmus Munk Larsen
74ff5719b3 Fix compilation of 64 bit constant arguments to pset1frombits in TypeCasting.h on platforms where uint64_t != unsigned long. 2020-09-28 22:47:11 +00:00
Rasmus Munk Larsen
3a0b23e473 Fix compilation of pset1frombits calls on iOS. 2020-09-28 22:30:36 +00:00
Christoph Hertzberg
6b0c0b587e Provide a more efficient Packet2l->Packet2d cast method 2020-09-28 22:14:02 +00:00
Martin Pecka
6425e875a1 Added AlignedBox::transform(AffineTransform). 2020-09-28 18:06:23 +00:00
Alexander Grund
a967fadb21 Make relative path variables of type STRING
When the type is PATH an absolute path is expected and user-defined
values are converted into absolute paths relative to the current directory.

Fixes #1990
2020-09-28 16:39:48 +00:00
Zhuyie
e4b24e7fb2 Fix Eigen::ThreadPool::CurrentThreadId returning wrong thread id when EIGEN_AVOID_THREAD_LOCAL and NDEBUG are defined 2020-09-25 09:36:43 +00:00
Deven Desai
ce5c59729d Fix for ROCm/HIP breakage - 200921
The following commit causes regressions in the ROCm/HIP support for Eigen
e55182ac09

I suspect the same breakages occur on the CUDA side too.

The above commit puts the EIGEN_CONSTEXPR attribute on `half_base` constructor. `half_base` is derived from `__half_raw`.

When compiling with GPU support, the definition of `__half_raw` gets picked up from the GPU Compiler specific header files (`hip_fp16.h`, `cuda_fp16.h`). Properly supporting the above commit would require adding the `constexpr` attribute to the `__half_raw` constructor (and other `*half*` routines) in those header files. While that is something we can explore in the future, for now we need to undo the above commit when compiling with GPU support, which is what this commit does.

This commit also reverts a small change in the `raw_uint16_to_half` routine made by the above commit. Similar to the case above, that change was leading to compile errors due to the fact that `__half_raw` has a different definition when compiling with GPU support.
2020-09-22 22:26:45 +00:00
David Tellenbach
b8a13f13ca Add CI configuration for ppc64le 2020-09-22 00:26:23 +00:00
Guoqiang QI
821702e771 Fix the #1997 and #1991 bugs triggered by unsupported a[index] (type a: __m128d) ops with the MSVC compiler 2020-09-21 15:49:00 +00:00
David Tellenbach
493a7c773c Remove EIGEN_CONSTEXPR from NumTraits<boost::multiprecision::number<...>> 2020-09-21 12:43:41 +02:00
Павел Мацула
38e4a67394 Fix using FindStandardMathLibrary.cmake with -Wall (-Wunused-value) added to CMAKE_CXX_FLAG 2020-09-19 16:13:16 +00:00
Rasmus Munk Larsen
c4b99f78c7 Fix breakage in pcast<Packet2l, Packet2d> due to _mm_cvtsi128_si64 not being available on 32 bit x86.
If SSE 4.1 is available use the faster _mm_extract_epi64 intrinsic.
2020-09-18 18:13:20 -07:00
guoqiangqi
9aad16b443 Fix undefined reference to pset1frombits bug on different platforms 2020-09-19 00:53:21 +00:00
David Tellenbach
c4aa8e0db2 Rename variable to avoid shadowing of a previously declared one 2020-09-18 22:53:15 +02:00
Rasmus Munk Larsen
e55182ac09 Get rid of initialization logic for blueNorm by making the computed constants static const or constexpr.
Move macro definition EIGEN_CONSTEXPR to Core and make all methods in NumTraits constexpr when EIGEN_HASH_CONSTEXPR is 1.
2020-09-18 17:38:58 +00:00
Rasmus Munk Larsen
14022f5eb5 Fix more mildly embarrassing typos in ARM intrinsics in PacketMath.h.
'vmvnq_u64' does not exist for some reason.
2020-09-18 04:14:13 +00:00
Rasmus Munk Larsen
a5b226920f Fix typo in PacketMath.h 2020-09-18 01:22:23 +00:00
Rasmus Munk Larsen
3af744b023 Add missing packet op pcmp_lt_or_nan for Packet2d on ARM. 2020-09-18 01:07:01 +00:00
Rasmus Munk Larsen
31a6b88ff3 Disable double version of compute_inverse_size4 on Inverse_NEON.h if Packet2d is not supported. 2020-09-17 23:51:06 +00:00
Brad King
880fa43b2b Add support for CastXML on ARM aarch64
CastXML simulates the preprocessors of other compilers, but actually
parses the translation unit with an internal Clang compiler.
Use the same `vld1q_u64` workaround that we do for Clang.

Fixes: #1979
2020-09-16 13:40:23 -04:00
daravi
6f0f6f792e Fix compiler error due to c++20 operator== generation rules 2020-09-16 02:06:53 +00:00
Benoit Jacob
cc0c38ace8 Remove old Clang compiler bug work-arounds. The two LLVM bugs referenced in the comments here have long been fixed. The workarounds were now detrimental because (1) they prevented using fused mul-add on Clang/ARM32 and (2) the unnecessary 'volatile' in 'asm volatile' prevented legitimate reordering by the compiler. 2020-09-15 20:54:14 -04:00
Tim Shen
bb56a62582 Make bfloat16(float(-nan)) produce -nan, not nan. 2020-09-15 13:24:23 -07:00
Guoqiang QI
3012e755e9 Add plog ops support packet2d for NEON 2020-09-15 17:10:35 +00:00
Rasmus Munk Larsen
e4fb0ddf78 Add EIGEN_UNUSED_VARIABLE to unused variable in Memory.h 2020-09-15 01:18:55 +00:00
Pedro Caldeira
65e400896b Fix bfloat16 round on gcc 4.8 2020-09-14 10:43:59 -03:00
Rasmus Munk Larsen
5636f80d11 Fix issue #1968. Don't discard return value from "new" in C++17. 2020-09-13 17:38:45 +00:00
Guoqiang QI
7c5d48f313 Unified sse pldexp_double api 2020-09-12 10:56:55 +00:00
Rasmus Munk Larsen
71e08c702b Make blueNorm threadsafe if C++11 atomics are available. 2020-09-12 01:23:29 +00:00
David Tellenbach
adc861cabd New CI infrastructure, including AArch64 runners 2020-09-11 18:11:49 +00:00
Niels Dekker
5328c9be43 Fix half_impl::float_to_half_rtne(float) warning: '<<' causes overflow
Fixed Visual Studio 2019 Code Analysis (C++ Core Guidelines) warning
C26450 from inside `half_impl::float_to_half_rtne(float)`:
> Arithmetic overflow: '<<' operation causes overflow at compile time.
2020-09-10 16:22:28 +02:00
Pedro Caldeira
35d149e34c Add missing functions for Packet8bf in Altivec architecture.
Including new tests for bfloat16 Packets.
Fix prsqrt on GenericPacketMath.
2020-09-08 09:22:11 -05:00
Guoqiang QI
85428a3440 Add Neon psqrt<Packet2d> and pexp<Packet2d> 2020-09-08 09:04:03 +00:00
Alexander Neumann
5272106826 remove semi triggering -Wextra-semi-stmt 2020-09-07 11:42:30 +02:00
Stephen Zheng
5f25bcf7d6 Add Inverse_NEON.h
Implemented fast size-4 matrix inverse (mimicking Inverse_SSE.h) using NEON intrinsics.

```
Benchmark                   Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------
BM_float                 -0.1285         -0.1275           568           495           572           499
BM_double                -0.2265         -0.2254           638           494           641           496
```
2020-09-04 10:55:47 +00:00
Everton Constantino
6fe88a3c9d MatrixProduct enhancements:
- Changes to Altivec/MatrixProduct
  Adapting code to gcc 10.
  Generic code style and performance enhancements.
  Adding PanelMode support.
  Adding stride/offset support.
  Enabling float64, std::complex<float> and std::complex<double>.
  Fixing lack of symm_pack.
  Enabling mixedtypes.
- Adding std::complex tests to blasutil.
- Adding an implementation of storePacketBlock when Incr!= 1.
2020-09-02 18:21:36 -03:00
Everton Constantino
6568856275 Changing u/int8_t to un/signed char because clang does not understand
it.

Implementing pcmp_eq to Packet8 and Packet16.
2020-09-02 17:02:15 -03:00
Gael Guennebaud
27e6648074 fix #1901: warning in Mode==(Upper|Lower) 2020-09-02 15:43:58 +02:00
Hans Johnson
5b9bfc892a BUG: cmake_minimum_required must be the first command
https://cmake.org/cmake/help/v3.5/command/project.html

Note: Call the cmake_minimum_required() command at the beginning of the
top-level CMakeLists.txt file even before calling the project() command.
It is important to establish version and policy settings before invoking
other commands whose behavior they may affect. See also policy CMP0000.
2020-08-28 22:57:16 +00:00
Chip Kerchner
e5886457c8 Change Packet8s and Packet8us to use vector commands on Power for pmadd, pmul and psub. 2020-08-28 19:27:32 +00:00
Gael Guennebaud
25424d91f6 Fix #1974: assertion when reserving an empty sparse matrix 2020-08-26 12:32:20 +02:00
Guoqiang QI
8bb0febaf9 add psqrt ops support packet2f/packet4f for NEON 2020-08-21 03:17:15 +00:00
Georg Jäger
1b1082334b adding attributes to constructors to support hip-clang on ROCm 3.5 2020-08-20 16:48:11 +02:00
Deven Desai
603e213d13 Fixing a CUDA / P100 regression introduced by PR 181
PR 181 ( https://gitlab.com/libeigen/eigen/-/merge_requests/181 ) adds `__launch_bounds__(1024)` attribute to GPU kernels, that did not have that attribute explicitly specified.

That PR seems to cause regressions on the CUDA platform. This PR/commit makes the changes in PR 181 applicable to HIP only
2020-08-20 00:29:57 +00:00
David Tellenbach
c060114a25 Fix nightly CI configuration 2020-08-19 20:52:34 +02:00
David Tellenbach
fe8c3ef3cb Add possibility to split test suite build targets and improved CI configuration
- Introduce CMake option `EIGEN_SPLIT_TESTSUITE` that allows dividing the single test build target into several subtargets
- Add CI pipeline for merge requests that can be run by GitLab's shared runners
- Add nightly CI pipeline
2020-08-19 18:27:45 +00:00
Rasmus Munk Larsen
d10b27fe37 Add missing inline keyword in Quaternion.h. 2020-08-14 17:51:04 +00:00
David Tellenbach
d4a727d092 Disable min/max NaN propagation in test cxx11_tensor_expr
The current pmin/pmax implementation for Arm Neon propagates NaNs
differently than std::min/std::max.

See issue https://gitlab.com/libeigen/eigen/-/issues/1937
2020-08-14 16:16:27 +00:00
David Tellenbach
d2bb6cf396 Fix compilation error in blasutil test 2020-08-14 18:15:18 +02:00
David Tellenbach
c6820a6316 Replace the call to int64_t in the blasutil test by explicit types
Some platforms define int64_t to be long long even for C++03. If this is
the case we miss the definition of internal::make_unsigned for this
type. If we just define the template we get duplicate definition
errors for platforms defining int64_t as signed long for C++03.

We need to find a way to distinguish both cases at compile-time.
2020-08-14 17:24:37 +02:00
David Tellenbach
8ba1b0f41a bfloat16 packetmath for Arm Neon backend 2020-08-13 15:48:40 +00:00
Pedro Caldeira
704798d1df Add support for Bfloat16 to use vector instructions on Altivec
architecture
2020-08-10 13:22:01 -05:00
Deven Desai
46f8a18567 Adding an explicit launch_bounds(1024) attribute for GPU kernels.
Starting with ROCm 3.5, the HIP compiler will change from HCC to hip-clang.

This compiler change introduces a change in the default value of the `__launch_bounds__` attribute associated with a GPU kernel. (Default value means the value assumed by the compiler for the `__launch_bounds__` attribute when it is not explicitly specified by the user.)

Currently (i.e. for HIP with ROCm 3.3 and older), the default value is 1024. That changes to 256 with ROCm 3.5 (i.e. the hip-clang compiler). As a consequence of this change, if a GPU kernel with a `__launch_bounds__` attribute of 256 is launched at runtime with a threads_per_block value > 256, it leads to a runtime error. This is leading to a couple of Eigen unit test failures with ROCm 3.5.

This commit adds an explicit `__launch_bounds__(1024)` attribute to every GPU kernel that currently does not have it explicitly specified (and hence would end up getting the default value of 256 with the change to hip-clang)
2020-08-05 01:46:34 +00:00
Zachary Garrett
21122498ec Temporarily turn off the NEON implementation of pfloor as it does not work for large values.
The NEON implementation mimics the SSE implementation, but didn't mention the caveat that, due to the unsigned/signed integer conversions involved, not all values representable in the original floating-point type are supported.
2020-08-04 16:28:23 +00:00
David Tellenbach
23b7f0572b Disable CI buildstage again 2020-08-03 15:41:43 +02:00
Gael Guennebaud
d0f5d4bc50 add a banner to advertise the survey 2020-07-29 19:01:38 +02:00
David Tellenbach
5e484fa11d Fix StlDeque for GCC 10
StlDeque extends std::deque by accessing some of its internal members.
Since GCC 10 these are not accessible anymore.
2020-07-29 12:31:13 +00:00
Teng Lu
3ec4f0b641 Fix undefined BF16 union behavior in AVX512. 2020-07-29 02:20:21 +00:00
Rasmus Munk Larsen
b92206676c Inherit alignment trait from argument in TensorBroadcasting to avoid segfault when the argument is unaligned. 2020-07-28 19:19:37 +00:00
David Tellenbach
99da2e1a8d Fix clang-tidy warnings in generic bfloat16 implementation
See !172 for related discussions.
2020-07-27 16:00:24 +02:00
qxxxb
649fd1c2ae Fix CMake install command 2020-07-25 16:35:13 -04:00
David Tellenbach
e48d8e4725 Don't allow failure for CI build stage anymore 2020-07-24 21:12:15 +02:00
David Tellenbach
b8ca93842c Improve CI configuration
- Fix docker Fedora image to Fedora:31
  - Fix gcc version to gcc-9.2.1
  - Use GitLab CI dag
  - Fix usage of build cache
  - Introduce build artifacts
2020-07-24 15:58:44 +00:00
Gael Guennebaud
fb0c6868ad Add missing footer declaration 2020-07-24 10:28:44 +02:00
David Tellenbach
c1ffe452fc Fix bfloat16 casts
If we have explicit conversion operators available (C++11) we define
explicit casts from bfloat16 to other types. If not (C++03), we don't
define conversion operators but rely on implicit conversion chains from
bfloat16 over float to other types.
2020-07-23 20:55:06 +00:00
Gael Guennebaud
2ce2f51989 remove piwik tracker 2020-07-23 13:51:39 +02:00
Rasmus Munk Larsen
1b84f21e32 Revert change that made conversion from bfloat16 to {float, double} implicit.
Add roundtrip tests for casting between bfloat16 and complex types.
2020-07-22 18:09:00 -07:00
David Tellenbach
38b91f256b Fix cast of bfloat16 to std::complex<T>
This fixes https://gitlab.com/libeigen/eigen/-/issues/1951
2020-07-22 19:00:17 +00:00
Rasmus Munk Larsen
bed7fbe854 Make sure we take the little-endian path if __BYTE_ORDER__ is not defined. 2020-07-22 18:54:38 +00:00
Niels Dekker
0e1a33a461 Faster conversion from integer types to bfloat16
Specialized `bfloat16_impl::float_to_bfloat16_rtne(float)` for normal floating point numbers, infinity and zero, in order to improve the performance of `bfloat16::bfloat16(const T&)` for integer argument types.

A reduction of more than 20% of the runtime duration of conversion from int to bfloat16 was observed, using Visual C++ 2019 on Windows 10.
2020-07-22 19:25:49 +02:00
Rasmus Munk Larsen
acab22c205 Avoid division by zero in nonZerosEstimate() for empty blocks. 2020-07-22 01:38:30 +00:00
Rasmus Munk Larsen
ac2eca6b11 Update tensor reduction test to avoid undefined division of bfloat16 by int. 2020-07-22 00:35:51 +00:00
Rasmus Munk Larsen
0aeaf5f451 Make numext::as_uint a device function. 2020-07-22 00:33:41 +00:00
Alexander Turkin
60faa9f897 user-defined copy operations removed in favor of compiler-generated ones 2020-07-20 14:59:35 +03:00
Niels Dekker
b11f817bcf Avoid undefined behavior by union type punning in float_to_bfloat16_rtne
Use `numext::as_uint`, instead of union based type punning, to avoid undefined behavior.
See also C++ Core Guidelines: "Don't use a union for type punning"
https://github.com/isocpp/CppCoreGuidelines/blob/v0.8/CppCoreGuidelines.md#c183-dont-use-a-union-for-type-punning

`numext::as_uint` was suggested by David Tellenbach
2020-07-14 19:55:20 +02:00
Sheng Yang
56b3e3f3f8 AVX path for BF16 2020-07-14 01:34:03 +00:00
Niels Dekker
4ab32e2de2 Allow implicit conversion from bfloat16 to float and double
Conversion from `bfloat16` to `float` and `double` is lossless. It seems natural to allow the conversion to be implicit, as the C++ language also support implicit conversion from a smaller to a larger floating point type.

Intel's OneDLL bfloat16 implementation also has an implicit `operator float()`: https://github.com/oneapi-src/oneDNN/blob/v1.5/src/common/bfloat16.hpp
2020-07-11 13:32:28 +02:00
Rasmus Munk Larsen
dcf7655b3d Guard operator<< test by EIGEN_NO_IO. 2020-07-09 19:54:48 +00:00
Rasmus Munk Larsen
ed00df445d Guard operator<< by EIGEN_NO_IO. 2020-07-09 19:52:44 +00:00
Rasmus Munk Larsen
fb77b7288c Add operator<< to print a quaternion. 2020-07-09 12:49:58 -07:00
David Tellenbach
ee4715ff48 Fix test basic stuff
- Guard fundamental types that are not available pre C++11
- Separate subsequent angle brackets >> by spaces
- Allow casting of Eigen::half and Eigen::bfloat16 to complex types
2020-07-09 17:24:00 +00:00
Forrest Voight
8889a2c1c6 Add operator==/operator!= to Quaternion. Fixes #1876. 2020-07-07 20:16:54 +00:00
Rasmus Munk Larsen
6964ae8d52 Change the sign operator in Eigen to return NaN for NaN arguments, not zero. 2020-07-07 01:54:04 +00:00
David Tellenbach
cb63153183 Make test packetmath C++98 compliant 2020-07-01 20:41:59 +02:00
Sheng Yang
116c5235ac BF16 for scalar_cmp_with_cast_op 2020-07-01 18:33:42 +00:00
Kan Chen
8731452b97 Delete duplicate test cases in vectorization_logic.cpp 2020-07-01 00:51:15 +00:00
Antonio Sanchez
9cb8771e9c Fix tensor casts for large packets and casts to/from std::complex
The original tensor casts were only defined for
`SrcCoeffRatio`:`TgtCoeffRatio` 1:1, 1:2, 2:1, 4:1. Here we add the
missing 1:N and 8:1.

We also add casting `Eigen::half` to/from `std::complex<T>`, which
was missing to make it consistent with `Eigen:bfloat16`, and
generalize the overload to work for any complex type.

Tests were added to `basicstuff`, `packetmath`, and
`cxx11_tensor_casts` to test all cast configurations.
2020-06-30 18:53:55 +00:00
Antonio Sanchez
145e51516f Fix denormal check pre c++11.
`float_denorm_style` is an old-style `enum`, so the `denorm_present`
symbol only exists in the `std` namespace prior to c++11.
2020-06-30 17:28:30 +00:00
David Tellenbach
689b57070d Report custom C++ flags in CMake testing summary 2020-06-30 17:18:54 +00:00
David Tellenbach
f3b8d441f6 Remove CI tags to enable shared runners 2020-06-29 22:15:41 +02:00
Christoph Grüninger
dc0b81fb1d Pass CMAKE_MAKE_PROGRAM to Fortran language support test
Otherwise the Make (or Ninja) program is used, which is
installed system wide.
2020-06-27 23:52:38 +02:00
David Tellenbach
13d25f5ed8 Add initial CI configuration file.
The initial CI configuration consists of jobs to build and run tests and
to build docs.
2020-06-27 00:03:35 +00:00
Antonio Sanchez
7222f0b6b5 Fix packetmath_1 float tests for arm/aarch64.
Added missing `pmadd<Packet2f>` for NEON. This leads to a significant
improvement in precision over the previous `pmul+padd`, which was causing
the `pcos` tests to fail. Also added an approx test with
`std::sin`/`std::cos` since otherwise returning any `a^2+b^2=1` would
pass.

Modified `log(denorm)` tests.  Denorms are not always supported by all
systems (returns `::min`), are always flushed to zero on 32-bit arm,
and configurably flush to zero on sse/avx/aarch64. This leads to
inconsistent results across different systems (i.e. `-inf` vs `nan`).
Added a check for existence and exclude ARM.

Removed logistic exactness test, since scalar and vectorized versions
follow different code-paths due to differences in `pexp` and `pmadd`,
which result in slightly different values. For example, exactness always
fails on arm, aarch64, and altivec.
2020-06-24 14:03:35 -07:00
Simon Pfreundschuh
14f84978e8 Replaced call to deprecated 'load' function with appropriate call to 'on'. 2020-06-23 11:23:13 +02:00
Antonio Sanchez
ff4e7a0820 Add missing Packet2l/Packet2ul ops for NEON.
The current multiply (`pmul`) and comparison operators (`pcmp_lt`,
`pcmp_le`, `pcmp_eq`) are missing for packets `Packet2l` and
`Packet2ul`. This leads to compile errors for the `packetmath.cpp` tests
in clang. Here we add and test the missing ops.

Tested:
```
$ aarch64-linux-gnu-g++ -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"

$ arm-linux-gnueabihf-g++ -mfpu=neon -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"

$ clang++ -target aarch64-linux-android21 -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"

$ clang++ -target armv7-linux-android21 -static -mfpu=neon -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"
```
2020-06-22 11:24:43 -07:00
Antonio Sanchez
03ebdf6acb Added missing NEON pcasts, update packetmath tests.
The NEON `pcast` operators are all implemented and tested for existing
packets. This requires adding a `pcast(a,b,c,d,e,f,g,h)` for casting
between `int64_t` and `int8_t` in `GenericPacketMath.h`.

Removed incorrect `HasHalfPacket`  definition for NEON's
`Packet2l`/`Packet2ul`.

Adjustments were also made to the `packetmath` tests. These include
- minor bug fixes for cast tests (i.e. 4:1 casts, only casting for
  packets that are vectorizable)
- added 8:1 cast tests
- random number generation
  - original had uninteresting 0 to 0 casts for many casts between
    floating-point and integers, and exhibited signed overflow
    undefined behavior

Tested:
```
$ aarch64-linux-gnu-g++ -static -I./ '-DEIGEN_TEST_PART_ALL=1' test/packetmath.cpp -o packetmath
$ adb push packetmath /data/local/tmp/
$ adb shell "/data/local/tmp/packetmath"
```
2020-06-21 09:32:31 -07:00
Teng Lu
386d809bde Support BFloat16 in Eigen 2020-06-20 19:16:24 +00:00
Rasmus Munk Larsen
6b9c92fe7e Add Apache 2.0 license text in COPYING.APACHE. 2020-06-18 12:45:27 -07:00
Nicolas Mellado
cf7adf3a5d Update things you can do message using cmake commands
Print cmake commands instead of make commands, which should work for any generator.
2020-06-16 21:04:33 +00:00
Ilya Tokar
231ce21535 Run two independent chains, when reducing tensors.
Running two chains exposes more instruction level parallelism,
by allowing to execute both chains at the same time.

Results are a bit noisy, but for medium length we almost hit
theoretical upper bound of 2x.

BM_fullReduction_16T/3        [using 16 threads]       17.3ns ±11%        17.4ns ± 9%        ~           (p=0.178 n=18+19)
BM_fullReduction_16T/4        [using 16 threads]       17.6ns ±17%        17.0ns ±18%        ~           (p=0.835 n=20+19)
BM_fullReduction_16T/7        [using 16 threads]       18.9ns ±12%        18.2ns ±10%        ~           (p=0.756 n=20+18)
BM_fullReduction_16T/8        [using 16 threads]       19.8ns ±13%        19.4ns ±21%        ~           (p=0.512 n=20+20)
BM_fullReduction_16T/10       [using 16 threads]       23.5ns ±15%        20.8ns ±24%     -11.37%        (p=0.000 n=20+19)
BM_fullReduction_16T/15       [using 16 threads]       35.8ns ±21%        26.9ns ±17%     -24.76%        (p=0.000 n=20+19)
BM_fullReduction_16T/16       [using 16 threads]       38.7ns ±22%        27.7ns ±18%     -28.40%        (p=0.000 n=20+19)
BM_fullReduction_16T/31       [using 16 threads]        146ns ±17%          74ns ±11%     -49.05%        (p=0.000 n=20+18)
BM_fullReduction_16T/32       [using 16 threads]        154ns ±19%          84ns ±30%     -45.79%        (p=0.000 n=20+19)
BM_fullReduction_16T/64       [using 16 threads]        603ns ± 8%         308ns ±12%     -48.94%        (p=0.000 n=17+17)
BM_fullReduction_16T/128      [using 16 threads]       2.44µs ±13%        1.22µs ± 1%     -50.29%        (p=0.000 n=17+17)
BM_fullReduction_16T/256      [using 16 threads]       9.84µs ±14%        5.13µs ±30%     -47.82%        (p=0.000 n=19+19)
BM_fullReduction_16T/512      [using 16 threads]       78.0µs ± 9%        56.1µs ±17%     -28.02%        (p=0.000 n=18+20)
BM_fullReduction_16T/1k       [using 16 threads]        325µs ± 5%         263µs ± 4%     -19.00%        (p=0.000 n=20+16)
BM_fullReduction_16T/2k       [using 16 threads]       1.09ms ± 3%        0.99ms ± 1%      -9.04%        (p=0.000 n=20+20)
BM_fullReduction_16T/4k       [using 16 threads]       7.66ms ± 3%        7.57ms ± 3%      -1.24%        (p=0.017 n=20+20)
BM_fullReduction_16T/10k      [using 16 threads]       65.3ms ± 4%        65.0ms ± 3%        ~           (p=0.718 n=20+20)
2020-06-16 15:55:11 -04:00
Pedro Caldeira
a475bf14d4 Fix pscatter and pgather for Altivec Complex double 2020-06-16 16:41:02 -03:00
David Tellenbach
c6c84ed961 Fix unused variable warning on Arm 2020-06-15 00:14:58 +02:00
Sebastien Boisvert
6228f27234 Fix #1818: SparseLU: add methods nnzL() and nnzU()
Now this compiles without errors:

$ clang++ -I ../../ test_sparseLU.cpp -std=c++03
2020-06-11 23:49:49 +00:00
Sebastien Boisvert
39cbd6578f Fix #1911: add benchmark for move semantics with fixed-size matrix
$ clang++ -O3 bench/bench_move_semantics.cpp -I. -std=c++11 \
        -o bench_move_semantics

$ ./bench_move_semantics
float copy semantics: 1755.97 ms
float move semantics: 55.063 ms
double copy semantics: 2457.65 ms
double move semantics: 55.034 ms
2020-06-11 23:43:25 +00:00
Antonio Sanchez
a7d2552af8 Remove HasCast and fix packetmath cast tests.
The use of the `packet_traits<>::HasCast` field is currently inconsistent with
`type_casting_traits<>`, and is unused apart from within
`test/packetmath.cpp`. In addition, those packetmath cast tests do not
currently reflect how casts are performed in practice: they ignore the
`SrcCoeffRatio` and `TgtCoeffRatio` fields, assuming a 1:1 ratio.

Here we remove the unused `HasCast`, and modify the packet cast tests to
better reflect their usage.
2020-06-11 17:26:56 +00:00
Sebastien Boisvert
463ec86648 Fix #1757: remove the word 'suicide' 2020-06-11 00:56:54 +00:00
ShengYang1
b5d66b5e73 Implement scalar_cmp_with_cast_op 2020-06-09 08:12:07 +08:00
Rasmus Munk Larsen
c4059ffcb6 Fix static analyzer warning in SelfadjointProduct.h.
Fix compiler warnings in GeneralBlockPanelKernel.h.
2020-06-08 11:48:44 -07:00
Thales Sabino
1fcaaf460f Update FindComputeCpp.cmake to fix build problems on Windows
- Use standard types in SYCL/PacketMath.h to avoid compilation problems on Windows
- Add EIGEN_HAS_CONSTEXPR to cxx11_tensor_argmax_sycl.cpp to fix build problems on Windows
2020-06-05 20:51:20 +00:00
David Tellenbach
3ce18d3c8f Revert ".gitlab-ci.yml: initial commit"
This reverts commit 95177362ed to
disable GitLab CI temporarily.
2020-06-05 22:43:49 +02:00
Rasmus Munk Larsen
c2ab36f47a Fix broken packetmath test for logistic on Arm. 2020-06-04 16:24:47 -07:00
Rasmus Munk Larsen
537e2b322f Fix typo in previous update to generic predux_any. 2020-06-04 22:25:05 +00:00
Rasmus Munk Larsen
fdc1cbdce3 Avoid implicit float equality comparison in generic predux_any, but use numext::not_equal_strict to avoid breaking builds that compile with -Werror=float-equal. 2020-06-04 22:15:56 +00:00
Rasmus Munk Larsen
daf9bbeca2 Fix compilation error in logistic packet op. 2020-06-03 00:57:41 +00:00
n0mend
6d2a9a524b Update run instructions for benchCholesky 2020-06-01 18:31:46 +00:00
Gael Guennebaud
029a76e115 Bug #1777: make the scalar and packet path consistent for the logistic function + respective unit test 2020-05-31 00:53:37 +02:00
Gael Guennebaud
99b7f7cb9c Fix #556: warnings with mingw 2020-05-31 00:39:44 +02:00
Gael Guennebaud
72782d13e0 Bug #1767: increase required cmake version to 3.5.0 2020-05-31 00:31:09 +02:00
Gael Guennebaud
867a756509 Fix #1833: compilation issue of "array!=scalar" with c++20 2020-05-30 23:53:58 +02:00
Gael Guennebaud
ab615e4114 Save one extra temporary when assigning a sparse product to a row-major sparse matrix 2020-05-30 23:15:12 +02:00
Christoph Junghans
95177362ed .gitlab-ci.yml: initial commit 2020-05-29 09:23:25 -06:00
Kan Chen
8d1302f566 Add support for PacketBlock<Packet8s,4> and PacketBlock<Packet16uc,4> ptranspose on NEON 2020-05-29 00:33:45 +00:00
Antonio Sánchez
8719b9c5bc Disable test for 32-bit systems (e.g. ARM, i386)
Both i386 and 32-bit ARM do not define __uint128_t. On most systems, if
__uint128_t is defined, then so is the macro __SIZEOF_INT128__.

https://stackoverflow.com/questions/18531782/how-to-know-if-uint128-t-is-defined
2020-05-28 17:40:15 +00:00
Yong Tang
8e1df5b082 Fix incorrect usage of if defined(EIGEN_ARCH_PPC) => if EIGEN_ARCH_PPC
This PR tries to fix an incorrect usage of `if defined(EIGEN_ARCH_PPC)`
in `Eigen/Core` header.

In `Eigen/src/Core/util/Macros.h`, EIGEN_ARCH_PPC was explicitly defined
as either 0 or 1. As a result `if defined(EIGEN_ARCH_PPC)` will always be true.
This causes issues when building on non PPC platform and `MatrixProduct.h` is not
available.

This fix changes `if defined(EIGEN_ARCH_PPC)` => `if EIGEN_ARCH_PPC`.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2020-05-28 05:53:44 -07:00
Kan Chen
4e7046063b Fix #1874: it works on both MSVC 2017 and other platforms. 2020-05-21 18:42:56 +08:00
Pedro Caldeira
2d67af2d2b Add pscatter for Packet16{u}c (int8) 2020-05-20 17:29:34 -03:00
David Tellenbach
5328cd62b3 Guard usage of decltype since it's a C++11 feature
This fixes https://gitlab.com/libeigen/eigen/-/issues/1897
2020-05-20 16:04:16 +02:00
Rasmus Munk Larsen
cc86a31e20 Add guard around specialization for bool, which is only currently implemented for SSE. 2020-05-19 16:21:56 -07:00
Everton Constantino
8a7f360ec3 - Vectorizing MMA packing.
- Optimizing MMA kernel.
- Adding PacketBlock store to blas_data_mapper.
2020-05-19 19:24:11 +00:00
Rasmus Munk Larsen
a145e4adf5 Add newline at the end of StlIterators.h. 2020-05-15 20:36:00 +00:00
Gael Guennebaud
8ce9630ddb Fix #1874: workaround MSVC 2017 compilation issue. 2020-05-15 20:47:32 +02:00
Rasmus Munk Larsen
9b411757ab Add missing packet ops for bool, and make it pass the same packet op unit tests as other arithmetic types.
This change also contains a few minor cleanups:
  1. Remove packet op pnot, which is not needed for anything other than pcmp_le_or_nan,
     which can be done in other ways.
  2. Remove the "HasInsert" enum, which is no longer needed since we removed the
     corresponding packet ops.
  3. Add faster pselect op for Packet4i when SSE4.1 is supported.

Among other things, this makes the fast transposeInPlace() method available for Matrix<bool>.

Run on ************** (72 X 2994 MHz CPUs); 2020-05-09T10:51:02.372347913-07:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark                        Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------------
BM_TransposeInPlace<float>/4            9.77           9.77    71670320
BM_TransposeInPlace<float>/8           21.9           21.9     31929525
BM_TransposeInPlace<float>/16          66.6           66.6     10000000
BM_TransposeInPlace<float>/32         243            243        2879561
BM_TransposeInPlace<float>/59         844            844         829767
BM_TransposeInPlace<float>/64         933            933         750567
BM_TransposeInPlace<float>/128       3944           3945         177405
BM_TransposeInPlace<float>/256      16853          16853          41457
BM_TransposeInPlace<float>/512     204952         204968           3448
BM_TransposeInPlace<float>/1k     1053889        1053861            664
BM_TransposeInPlace<bool>/4            14.4           14.4     48637301
BM_TransposeInPlace<bool>/8            36.0           36.0     19370222
BM_TransposeInPlace<bool>/16           31.5           31.5     22178902
BM_TransposeInPlace<bool>/32          111            111        6272048
BM_TransposeInPlace<bool>/59          626            626        1000000
BM_TransposeInPlace<bool>/64          428            428        1632689
BM_TransposeInPlace<bool>/128        1677           1677         417377
BM_TransposeInPlace<bool>/256        7126           7126          96264
BM_TransposeInPlace<bool>/512       29021          29024          24165
BM_TransposeInPlace<bool>/1k       116321         116330           6068
2020-05-14 22:39:13 +00:00
Felipe Attanasio
d640276d31 Added support for reverse iterators for Vectorwise operations. 2020-05-14 22:38:20 +00:00
Christopher Moore
fa8fd4b4d5 Indexed view should have RowMajorBit when there is statically a single row 2020-05-14 22:11:19 +00:00
Christopher Moore
a187ffea28 Resolve "IndexedView of a vector should allow linear access" 2020-05-13 19:24:42 +00:00
Mark Eberlein
ba9d18b938 Add KLU support to spbenchsolver 2020-05-11 21:50:27 +00:00
Pedro Caldeira
5fdc179241 Altivec template functions to better code reusability 2020-05-11 21:04:51 +00:00
mehdi-goli
d3e81db6c5 Eigen moved the scanLauncher function inside the internal namespace.
This commit applies the following changes:
    - Moving the `scanLauncher` specialization inside the internal namespace to fix a compiler crash on TensorScan for the SYCL backend.
    - Replacing `SYCL/sycl.hpp` with `CL/sycl.hpp` in order to follow the SYCL 1.2.1 standard.
    - Minor fixes: commenting out an unused variable to avoid compiler warnings.
2020-05-11 16:10:33 +01:00
Rasmus Munk Larsen
c1d944dd91 Remove packet ops pinsertfirst and pinsertlast that are only used in a single place, and can be replaced by other ops when constructing the first/final packet in linspaced_op_impl::packetOp.
I cannot measure any performance changes for SSE, AVX, or AVX512.

name                                 old time/op             new time/op             delta
BM_LinSpace<float>/1                 1.63ns ± 0%             1.63ns ± 0%   ~             (p=0.762 n=5+5)
BM_LinSpace<float>/8                 4.92ns ± 3%             4.89ns ± 3%   ~             (p=0.421 n=5+5)
BM_LinSpace<float>/64                34.6ns ± 0%             34.6ns ± 0%   ~             (p=0.841 n=5+5)
BM_LinSpace<float>/512                217ns ± 0%              217ns ± 0%   ~             (p=0.421 n=5+5)
BM_LinSpace<float>/4k                1.68µs ± 0%             1.68µs ± 0%   ~             (p=1.000 n=5+5)
BM_LinSpace<float>/32k               13.3µs ± 0%             13.3µs ± 0%   ~             (p=0.905 n=5+4)
BM_LinSpace<float>/256k               107µs ± 0%              107µs ± 0%   ~             (p=0.841 n=5+5)
BM_LinSpace<float>/1M                 427µs ± 0%              427µs ± 0%   ~             (p=0.690 n=5+5)
2020-05-08 15:41:50 -07:00
David Tellenbach
5c4e19fbe7 Possibility to specify user-defined default cache sizes for GEBP kernel
Some architectures have no convenient way to determine cache sizes at
runtime. Eigen's GEBP kernel falls back to default cache values in this
case which might not be correct in all situations.

This patch introduces three preprocessor directives

  `EIGEN_DEFAULT_L1_CACHE_SIZE`
  `EIGEN_DEFAULT_L2_CACHE_SIZE`
  `EIGEN_DEFAULT_L3_CACHE_SIZE`

to give users the possibility to set these default values explicitly.
2020-05-08 12:54:36 +02:00
Rasmus Munk Larsen
225ab040e0 Remove unused packet op "palign".
Clean up a compiler warning in c++03 mode in AVX512/Complex.h.
2020-05-07 17:14:26 -07:00
Rasmus Munk Larsen
74ec8e6618 Make size odd for transposeInPlace test to make sure we hit the scalar path. 2020-05-07 17:29:56 +00:00
Rasmus Munk Larsen
49f1aeb60d Remove traits declaring NEON vectorized casts that do not actually have packet op implementations. 2020-05-07 09:49:22 -07:00
Rasmus Munk Larsen
2fd8a5a08f Add parallelization of TensorScanOp for types without packet ops.
Clean up the code a bit and do a few micro-optimizations to improve performance for small tensors.

Benchmark numbers for Tensor<uint32_t>:

name                                                       old time/op             new time/op             delta
BM_cumSumRowReduction_1T/8   [using 1 threads]             76.5ns ± 0%             61.3ns ± 4%    -19.80%          (p=0.008 n=5+5)
BM_cumSumRowReduction_1T/64  [using 1 threads]             2.47µs ± 1%             2.40µs ± 1%     -2.77%          (p=0.008 n=5+5)
BM_cumSumRowReduction_1T/256 [using 1 threads]             39.8µs ± 0%             39.6µs ± 0%     -0.60%          (p=0.008 n=5+5)
BM_cumSumRowReduction_1T/4k  [using 1 threads]             13.9ms ± 0%             13.4ms ± 1%     -4.19%          (p=0.008 n=5+5)
BM_cumSumRowReduction_2T/8   [using 2 threads]             76.8ns ± 0%             59.1ns ± 0%    -23.09%          (p=0.016 n=5+4)
BM_cumSumRowReduction_2T/64  [using 2 threads]             2.47µs ± 1%             2.41µs ± 1%     -2.53%          (p=0.008 n=5+5)
BM_cumSumRowReduction_2T/256 [using 2 threads]             39.8µs ± 0%             34.7µs ± 6%    -12.74%          (p=0.008 n=5+5)
BM_cumSumRowReduction_2T/4k  [using 2 threads]             13.8ms ± 1%              7.2ms ± 6%    -47.74%          (p=0.008 n=5+5)
BM_cumSumRowReduction_8T/8   [using 8 threads]             76.4ns ± 0%             61.8ns ± 3%    -19.02%          (p=0.008 n=5+5)
BM_cumSumRowReduction_8T/64  [using 8 threads]             2.47µs ± 1%             2.40µs ± 1%     -2.84%          (p=0.008 n=5+5)
BM_cumSumRowReduction_8T/256 [using 8 threads]             39.8µs ± 0%             28.3µs ±11%    -28.75%          (p=0.008 n=5+5)
BM_cumSumRowReduction_8T/4k  [using 8 threads]             13.8ms ± 0%              2.7ms ± 5%    -80.39%          (p=0.008 n=5+5)
BM_cumSumColReduction_1T/8   [using 1 threads]             59.1ns ± 0%             80.3ns ± 0%    +35.94%          (p=0.029 n=4+4)
BM_cumSumColReduction_1T/64  [using 1 threads]             3.06µs ± 0%             3.08µs ± 1%       ~             (p=0.114 n=4+4)
BM_cumSumColReduction_1T/256 [using 1 threads]              175µs ± 0%              176µs ± 0%       ~             (p=0.190 n=4+5)
BM_cumSumColReduction_1T/4k  [using 1 threads]              824ms ± 1%              844ms ± 1%     +2.37%          (p=0.008 n=5+5)
BM_cumSumColReduction_2T/8   [using 2 threads]             59.0ns ± 0%             90.7ns ± 0%    +53.74%          (p=0.029 n=4+4)
BM_cumSumColReduction_2T/64  [using 2 threads]             3.06µs ± 0%             3.10µs ± 0%     +1.08%          (p=0.016 n=4+5)
BM_cumSumColReduction_2T/256 [using 2 threads]              176µs ± 0%              189µs ±18%       ~             (p=0.151 n=5+5)
BM_cumSumColReduction_2T/4k  [using 2 threads]              836ms ± 2%              611ms ±14%    -26.92%          (p=0.008 n=5+5)
BM_cumSumColReduction_8T/8   [using 8 threads]             59.3ns ± 2%             90.6ns ± 0%    +52.79%          (p=0.008 n=5+5)
BM_cumSumColReduction_8T/64  [using 8 threads]             3.07µs ± 0%             3.10µs ± 0%     +0.99%          (p=0.016 n=5+4)
BM_cumSumColReduction_8T/256 [using 8 threads]              176µs ± 0%               80µs ±19%    -54.51%          (p=0.008 n=5+5)
BM_cumSumColReduction_8T/4k  [using 8 threads]              827ms ± 2%              180ms ±14%    -78.24%          (p=0.008 n=5+5)
2020-05-06 14:48:37 -07:00
Rasmus Munk Larsen
0e59f786e1 Fix accidental copy of loop variable. 2020-05-05 21:35:38 +00:00
Rasmus Munk Larsen
7b76c85daf Vectorize and parallelize TensorScanOp.
TensorScanOp is used in TensorFlow for a number of operations, such as cumulative logexp reduction and cumulative sum and product reductions.

The benchmarks numbers below are for cumulative row- and column reductions of NxN matrices.

name                                                         old time/op             new time/op     delta
BM_cumSumRowReduction_1T/4    [using 1 threads ]             25.1ns ± 1%             35.2ns ± 1%    +40.45%
BM_cumSumRowReduction_1T/8    [using 1 threads ]             73.4ns ± 0%             82.7ns ± 3%    +12.74%
BM_cumSumRowReduction_1T/32   [using 1 threads ]              988ns ± 0%              832ns ± 0%    -15.77%
BM_cumSumRowReduction_1T/64   [using 1 threads ]             4.07µs ± 2%             3.47µs ± 0%    -14.70%
BM_cumSumRowReduction_1T/128  [using 1 threads ]             18.0µs ± 0%             16.8µs ± 0%     -6.58%
BM_cumSumRowReduction_1T/512  [using 1 threads ]              287µs ± 0%              281µs ± 0%     -2.22%
BM_cumSumRowReduction_1T/2k   [using 1 threads ]             4.78ms ± 1%             4.78ms ± 2%       ~
BM_cumSumRowReduction_1T/10k  [using 1 threads ]              117ms ± 1%              117ms ± 1%       ~
BM_cumSumRowReduction_8T/4    [using 8 threads ]             25.0ns ± 0%             35.2ns ± 0%    +40.82%
BM_cumSumRowReduction_8T/8    [using 8 threads ]             77.2ns ±16%             81.3ns ± 0%       ~
BM_cumSumRowReduction_8T/32   [using 8 threads ]              988ns ± 0%              833ns ± 0%    -15.67%
BM_cumSumRowReduction_8T/64   [using 8 threads ]             4.08µs ± 2%             3.47µs ± 0%    -14.95%
BM_cumSumRowReduction_8T/128  [using 8 threads ]             18.0µs ± 0%             17.3µs ±10%       ~
BM_cumSumRowReduction_8T/512  [using 8 threads ]              287µs ± 0%               58µs ± 6%    -79.92%
BM_cumSumRowReduction_8T/2k   [using 8 threads ]             4.79ms ± 1%             0.64ms ± 1%    -86.58%
BM_cumSumRowReduction_8T/10k  [using 8 threads ]              117ms ± 1%               18ms ± 6%    -84.50%

BM_cumSumColReduction_1T/4    [using 1 threads ]             23.9ns ± 0%             33.4ns ± 1%    +39.68%
BM_cumSumColReduction_1T/8    [using 1 threads ]             71.6ns ± 1%             49.1ns ± 3%    -31.40%
BM_cumSumColReduction_1T/32   [using 1 threads ]              973ns ± 0%              165ns ± 2%    -83.10%
BM_cumSumColReduction_1T/64   [using 1 threads ]             4.06µs ± 1%             0.57µs ± 1%    -85.94%
BM_cumSumColReduction_1T/128  [using 1 threads ]             33.4µs ± 1%              4.1µs ± 1%    -87.67%
BM_cumSumColReduction_1T/512  [using 1 threads ]             1.72ms ± 4%             0.21ms ± 5%    -87.91%
BM_cumSumColReduction_1T/2k   [using 1 threads ]              119ms ±53%               11ms ±35%    -90.42%
BM_cumSumColReduction_1T/10k  [using 1 threads ]              1.59s ±67%              0.35s ±49%    -77.96%
BM_cumSumColReduction_8T/4    [using 8 threads ]             23.8ns ± 0%             33.3ns ± 0%    +40.06%
BM_cumSumColReduction_8T/8    [using 8 threads ]             71.6ns ± 1%             49.2ns ± 5%    -31.33%
BM_cumSumColReduction_8T/32   [using 8 threads ]             1.01µs ±12%             0.17µs ± 3%    -82.93%
BM_cumSumColReduction_8T/64   [using 8 threads ]             4.15µs ± 4%             0.58µs ± 1%    -86.09%
BM_cumSumColReduction_8T/128  [using 8 threads ]             33.5µs ± 0%              4.1µs ± 4%    -87.65%
BM_cumSumColReduction_8T/512  [using 8 threads ]             1.71ms ± 3%             0.06ms ±16%    -96.21%
BM_cumSumColReduction_8T/2k   [using 8 threads ]             97.1ms ±14%              3.0ms ±23%    -96.88%
BM_cumSumColReduction_8T/10k  [using 8 threads ]              1.97s ± 8%              0.06s ± 2%    -96.74%
2020-05-05 00:19:43 +00:00
Xiaoxiang Cao
a74a278abd Fix confusing template param name for Stride fwd decl. 2020-04-30 01:43:05 +00:00
Rasmus Munk Larsen
923ee9aba3 Fix the embarrassingly incomplete fix to the embarrassing bug in blocked transpose. 2020-04-29 17:27:36 +00:00
Rasmus Munk Larsen
a32923a439 Fix (embarrassing) bug in blocked transpose. 2020-04-29 17:02:27 +00:00
Rasmus Munk Larsen
1e41406c36 Add missing transpose in cleanup loop. Without it, we trip an assertion in debug mode. 2020-04-29 01:30:51 +00:00
Rasmus Munk Larsen
fbe7916c55 Fix compilation error with Clang on Android: _mm_extract_epi64 fails to compile. 2020-04-29 00:58:41 +00:00
Clément Grégoire
82f54ad144 Fix perf monitoring merge function 2020-04-28 17:02:59 +00:00
Rasmus Munk Larsen
ab773c7e91 Extend support for Packet16b:
* Add ptranspose<*,4> to support matmul and add unit test for Matrix<bool> * Matrix<bool>
* work around a bug in slicing of Tensor<bool>.
* Add tensor tests

This speeds up matmul for boolean matrices by about 10x

name                            old time/op             new time/op             delta
BM_MatMul<bool>/8                267ns ± 0%              479ns ± 0%  +79.25%          (p=0.008 n=5+5)
BM_MatMul<bool>/32              6.42µs ± 0%             0.87µs ± 0%  -86.50%          (p=0.008 n=5+5)
BM_MatMul<bool>/64              43.3µs ± 0%              5.9µs ± 0%  -86.42%          (p=0.008 n=5+5)
BM_MatMul<bool>/128              315µs ± 0%               44µs ± 0%  -85.98%          (p=0.008 n=5+5)
BM_MatMul<bool>/256             2.41ms ± 0%             0.34ms ± 0%  -85.68%          (p=0.008 n=5+5)
BM_MatMul<bool>/512             18.8ms ± 0%              2.7ms ± 0%  -85.53%          (p=0.008 n=5+5)
BM_MatMul<bool>/1k               149ms ± 0%               22ms ± 0%  -85.40%          (p=0.008 n=5+5)
2020-04-28 16:12:47 +00:00
Rasmus Munk Larsen
b47c777993 Block transposeInPlace() when the matrix is real and square. This yields a large speedup because we transpose in registers (or L1 if we spill), instead of one packet at a time, which in the worst case makes the code write to the same cache line PacketSize times instead of once.
rmlarsen@rmlarsen4:.../eigen_bench/google3$ benchy --benchmarks=.*TransposeInPlace.*float.* --reference=srcfs experimental/users/rmlarsen/bench:matmul_bench
 10 / 10 [====================================================================================================================================================================================================================] 100.00% 2m50s
(Generated by http://go/benchy. Settings: --runs 5 --benchtime 1s --reference "srcfs" --benchmarks ".*TransposeInPlace.*float.*" experimental/users/rmlarsen/bench:matmul_bench)

name                                       old time/op             new time/op             delta
BM_TransposeInPlace<float>/4               9.84ns ± 0%             6.51ns ± 0%  -33.80%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/8               23.6ns ± 1%             17.6ns ± 0%  -25.26%          (p=0.016 n=5+4)
BM_TransposeInPlace<float>/16              78.8ns ± 0%             60.3ns ± 0%  -23.50%          (p=0.029 n=4+4)
BM_TransposeInPlace<float>/32               302ns ± 0%              229ns ± 0%  -24.40%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/59              1.03µs ± 0%             0.84µs ± 1%  -17.87%          (p=0.016 n=5+4)
BM_TransposeInPlace<float>/64              1.20µs ± 0%             0.89µs ± 1%  -25.81%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/128             8.96µs ± 0%             3.82µs ± 2%  -57.33%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/256              152µs ± 3%               17µs ± 2%  -89.06%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/512              837µs ± 1%              208µs ± 0%  -75.15%          (p=0.008 n=5+5)
BM_TransposeInPlace<float>/1k              4.28ms ± 2%             1.08ms ± 2%  -74.72%          (p=0.008 n=5+5)
2020-04-28 16:08:16 +00:00
Pedro Caldeira
29f0917a43 Add support to vector instructions to Packet16uc and Packet16c 2020-04-27 12:48:08 -03:00
Rasmus Munk Larsen
e80ec24357 Remove unused packet op "preduxp". 2020-04-23 18:17:14 +00:00
René Wagner
0aebe19aca BooleanRedux.h: Add more EIGEN_DEVICE_FUNC qualifiers.
This enables operator== on Eigen matrices in device code.
2020-04-23 17:25:08 +02:00
Eugene Zhulenev
3c02fefec5 Add async evaluation support to TensorSlicingOp.
Device::memcpy is not async-safe and might lead to deadlocks. Always evaluate slice expression in async mode.
2020-04-22 19:55:01 +00:00
Pedro Caldeira
0c67b855d2 Add Packet8s and Packet8us to support signed/unsigned int16/short Altivec vector operations 2020-04-21 14:52:46 -03:00
Rasmus Munk Larsen
e8f40e4670 Fix bug in ptrue for Packet16b. 2020-04-20 21:45:10 +00:00
Rasmus Munk Larsen
2f6ddaa25c Add partial vectorization for matrices and tensors of bool. This speeds up boolean operations on Tensors by up to 25x.
Benchmark numbers for the logical and of two NxN tensors:

name                                               old time/op             new time/op             delta
BM_booleanAnd_1T/3   [using 1 threads]             14.6ns ± 0%             14.4ns ± 0%   -0.96%
BM_booleanAnd_1T/4   [using 1 threads]             20.5ns ±12%              9.0ns ± 0%  -56.07%
BM_booleanAnd_1T/7   [using 1 threads]             41.7ns ± 0%             10.5ns ± 0%  -74.87%
BM_booleanAnd_1T/8   [using 1 threads]             52.1ns ± 0%             10.1ns ± 0%  -80.59%
BM_booleanAnd_1T/10  [using 1 threads]             76.3ns ± 0%             13.8ns ± 0%  -81.87%
BM_booleanAnd_1T/15  [using 1 threads]              167ns ± 0%               16ns ± 0%  -90.45%
BM_booleanAnd_1T/16  [using 1 threads]              188ns ± 0%               16ns ± 0%  -91.57%
BM_booleanAnd_1T/31  [using 1 threads]              667ns ± 0%               34ns ± 0%  -94.83%
BM_booleanAnd_1T/32  [using 1 threads]              710ns ± 0%               35ns ± 0%  -95.01%
BM_booleanAnd_1T/64  [using 1 threads]             2.80µs ± 0%             0.11µs ± 0%  -95.93%
BM_booleanAnd_1T/128 [using 1 threads]             11.2µs ± 0%              0.4µs ± 0%  -96.11%
BM_booleanAnd_1T/256 [using 1 threads]             44.6µs ± 0%              2.5µs ± 0%  -94.31%
BM_booleanAnd_1T/512 [using 1 threads]              178µs ± 0%               10µs ± 0%  -94.35%
BM_booleanAnd_1T/1k  [using 1 threads]              717µs ± 0%               78µs ± 1%  -89.07%
BM_booleanAnd_1T/2k  [using 1 threads]             2.87ms ± 0%             0.31ms ± 1%  -89.08%
BM_booleanAnd_1T/4k  [using 1 threads]             11.7ms ± 0%              1.9ms ± 4%  -83.55%
BM_booleanAnd_1T/10k [using 1 threads]             70.3ms ± 0%             17.2ms ± 4%  -75.48%
2020-04-20 20:16:28 +00:00
dlazenby
00f6340153 Update PreprocessorDirectives.dox - Added line for the new VectorwiseOp plugin directive (and re-alphabetized the plugin section) 2020-04-17 21:43:37 +00:00
Rasmus Munk Larsen
5ab87d8aba Move eigen_packet_wrapper to GenericPacketMath.h and use it for SSE/AVX/AVX512 as it is already used for NEON.
This will allow us to define multiple packet types backed by the same vector type, e.g., __m128i.
Use this mechanism to define packets for half and clean up the packet op implementations.
2020-04-15 18:17:19 +00:00
Rasmus Munk Larsen
4aae8ac693 Fix typo in TypeCasting.h 2020-04-14 02:55:51 +00:00
Rasmus Munk Larsen
1d674003b2 Fix bug in vectorized casting of
{uint8, int8} -> {int16, uint16, int32, uint32, float} 
 {uint16, int16} -> {int32, uint32, int64, uint64, float} 

for NEON. These conversions were advertised as vectorized, but not actually implemented.
2020-04-14 02:11:06 +00:00
Changming Sun
b1aa07a8d3 Fix a bug in TensorIndexList.h 2020-04-13 18:22:03 +00:00
Christoph Hertzberg
d46d726e9d CommaInitializer wrongfully asserted for 0-sized blocks
The commainitializer unit test never actually called `test_block_recursion`, which was also not correctly implemented and would have caused excessively deep template recursion.
2020-04-13 16:41:20 +02:00
Antonio Sanchez
c854e189e6 Fixed commainitializer test.
The removed `finished()` call was responsible for enforcing that the
initializer was provided the correct number of values. Putting it back in
to restore previous behavior.
2020-04-10 13:53:26 -07:00
jangsoopark
39142904cc Resolve C4346 when building eigen on windows 2020-04-08 14:55:39 +09:00
Rasmus Munk Larsen
f0577a2bfd Speed up matrix multiplication for small to medium size matrices by using half- or quarter-packet vectorized loads in gemm_pack_rhs if they have size 4, instead of dropping down to the scalar path.
Benchmark measurements below are for computing ```c.noalias() = a.transpose() * b;``` for square RowMajor matrices of varying size.

Measured improvement with AVX+FMA:

name                           old time/op             new time/op             delta
BM_MatMul_ATB/8                 139ns ± 1%              129ns ± 1%   -7.49%          (p=0.008 n=5+5)
BM_MatMul_ATB/32               1.46µs ± 1%             1.22µs ± 0%  -16.72%          (p=0.008 n=5+5)
BM_MatMul_ATB/64               8.43µs ± 1%             7.41µs ± 0%  -12.04%          (p=0.008 n=5+5)
BM_MatMul_ATB/128              56.8µs ± 1%             52.9µs ± 1%   -6.83%          (p=0.008 n=5+5)
BM_MatMul_ATB/256               407µs ± 1%              395µs ± 3%   -2.94%          (p=0.032 n=5+5)
BM_MatMul_ATB/512              3.27ms ± 3%             3.18ms ± 1%     ~             (p=0.056 n=5+5)


Measured improvement for AVX512:

name                          old time/op             new time/op             delta
BM_MatMul_ATB/8                167ns ± 1%              154ns ± 1%   -7.63%          (p=0.008 n=5+5)
BM_MatMul_ATB/32              1.08µs ± 1%             0.83µs ± 3%  -23.58%          (p=0.008 n=5+5)
BM_MatMul_ATB/64              6.21µs ± 1%             5.06µs ± 1%  -18.47%          (p=0.008 n=5+5)
BM_MatMul_ATB/128             36.1µs ± 2%             31.3µs ± 1%  -13.32%          (p=0.008 n=5+5)
BM_MatMul_ATB/256              263µs ± 2%              242µs ± 2%   -7.92%          (p=0.008 n=5+5)
BM_MatMul_ATB/512             1.95ms ± 2%             1.91ms ± 2%     ~             (p=0.095 n=5+5)
BM_MatMul_ATB/1k              15.4ms ± 4%             14.8ms ± 2%     ~             (p=0.095 n=5+5)
2020-04-07 22:09:51 +00:00
Antonio Sanchez
8e875719b3 Replace norm() with squaredNorm() to address integer overflows
For random matrices with integer coefficients, many of the tests here lead to
integer overflows. When taking the norm() of a row/column, the squaredNorm()
often overflows to a negative value, leading to domain errors when taking the
sqrt(). This leads to a crash on some systems. By replacing the norm() call by
a squaredNorm(), the values still overflow, but at least there is no domain
error.

Addresses https://gitlab.com/libeigen/eigen/-/issues/1856
2020-04-07 19:48:28 +00:00
Antonio Sanchez
9dda5eb7d2 Missing struct definition in NumTraits 2020-04-07 09:01:11 -07:00
Akshay Naresh Modi
bcc0e9e15c Add numeric_limits min and max for bool
This will allow (among other things) computation of argmax and argmin of bool tensors
2020-04-06 23:38:57 +00:00
Bernardo Bahia Monteiro
54a0a9c9dd Bugfix: conjugate_gradient did not compile with lazy-evaluated RealScalar
The error generated by the compiler was:

    no matching function for call to 'maxi'
    RealScalar threshold = numext::maxi(tol*tol*rhsNorm2,considerAsZero);

The important part in the following notes was:

    candidate template ignored: deduced conflicting
    types for parameter 'T'"
    ('codi::Multiply11<...>' vs. 'codi::ActiveReal<...>')
    EIGEN_ALWAYS_INLINE T maxi(const T& x, const T& y)

I am using CoDiPack to provide the RealScalar type.
This bug was introduced in bc000deaa Fix conjugate-gradient for very small rhs
2020-03-29 19:44:12 -04:00
Rasmus Munk Larsen
4fd5d1477b Fix packetmath test build for AVX. 2020-03-27 17:05:39 +00:00
Rasmus Munk Larsen
393dbd8ee9 Fix bug in 52d54278be 2020-03-27 16:42:18 +00:00
Rasmus Munk Larsen
55c8fe8d0f Fix bug in 52d54278be 2020-03-27 16:41:15 +00:00
Joel Holdsworth
6d2dbfc453 NEON: Fixed MSVC types definitions 2020-03-26 20:19:58 +00:00
Joel Holdsworth
52d54278be Additional NEON packet-math operations 2020-03-26 20:18:19 +00:00
Everton Constantino
deb93ed1bf Adhere to recommended load/store intrinsics for pp64le 2020-03-23 15:18:15 -03:00
Aaron Franke
5c22c7a7de Make file formatting comply with POSIX and Unix standards
UTF-8, LF, no BOM, and newlines at the end of files
2020-03-23 18:09:02 +00:00
Everton Constantino
5afdaa473a Fixing float32's pround halfway criteria to match STL's criteria. 2020-03-21 22:30:54 -05:00
Alessio M
96cd1ff718 Fixed:
- access violation when initializing 0x0 matrices
- exception can be thrown during stack unwind while comma-initializing a matrix if eigen_assert is configured to throw
2020-03-21 05:11:21 +00:00
dlazenby
cc954777f2 Update VectorwiseOp.h to allow Plugins similar to MatrixBase.h or ArrayBase.h 2020-03-20 19:30:01 +00:00
Masaki Murooka
55ecd58a3c Bug https://gitlab.com/libeigen/eigen/-/issues/1415: add missing EIGEN_DEVICE_FUNC to diagonal_product_evaluator_base. 2020-03-20 13:37:37 +09:00
Rasmus Munk Larsen
4da2c6b197 Remove reference to non-existent unary_op_base class. 2020-03-19 18:23:06 +00:00
Rasmus Munk Larsen
eda90baf35 Add missing arguments to numext::absdiff(). 2020-03-19 18:16:55 +00:00
Joel Holdsworth
d5c665742b Add absolute_difference coefficient-wise binary Array function 2020-03-19 17:45:20 +00:00
Everton Constantino
6ff5a14091 Reenabling packetmath unsigned tests, adding dummy pabs for relevant unsigned
types.
2020-03-19 17:31:49 +00:00
Joel Holdsworth
232f904082 Add shift_left<N> and shift_right<N> coefficient-wise unary Array functions 2020-03-19 17:24:06 +00:00
Joel Holdsworth
54aa8fa186 Implement integer square-root for NEON 2020-03-19 17:05:13 +00:00
Allan Leal
37ccb86916 Update NullaryFunctors.h 2020-03-16 11:59:02 +00:00
Deven Desai
7158ed4e0e Fixing HIP breakage caused by the recent commit that introduces Packet4h2 as the Eigen::Half packet type 2020-03-12 01:06:24 +00:00
Joel Holdsworth
d53ae40f7b NEON: Added int64_t and uint64_t packet math 2020-03-10 22:46:19 +00:00
Joel Holdsworth
4b9ecf2924 NEON: Added int8_t and uint8_t packet math 2020-03-10 22:46:19 +00:00
Joel Holdsworth
ceaabd4e16 NEON: Added int16_t and uint16_t packet math 2020-03-10 22:46:19 +00:00
Joel Holdsworth
d5d3cf9339 NEON: Added uint32_t packet math 2020-03-10 22:46:19 +00:00
Joel Holdsworth
eacf97f727 NEON: Implemented half-size vectors 2020-03-10 22:46:19 +00:00
Joel Holdsworth
5f411b729e NEON: Set packet_traits<double> flags 2020-03-10 22:46:19 +00:00
Joel Holdsworth
88337acae2 test/packetmath: Add tests for all integer types 2020-03-10 22:46:19 +00:00
Joel Holdsworth
9e68977578 test/packetmath: Made negate non-mandatory 2020-03-10 22:46:19 +00:00
Sami Kama
b733b8b680 remove duplicate pset1 for half and add some comments about why we need expose pmul/add/div/min/max on host 2020-03-10 20:28:43 +00:00
Ram-Z
a45d28256d Don't restrict CMAKE_BUILD_TYPE
This prevents projects that add Eigen via `add_subdirectory` from using their own custom CMAKE_BUILD_TYPE and having Eigen respect the same custom flags.
2020-02-28 20:46:53 +00:00
Cédric Hubert
98bfc5aaa8 Update MarketIO.h 2020-02-28 12:41:51 +00:00
Rasmus Munk Larsen
52a2fbbb00 Revert "avoid selecting half-packets when unnecessary"
This reverts commit 5ca10480b0
2020-02-25 01:07:43 +00:00
Rasmus Munk Larsen
235bcfe08d Revert "Pick full packet unconditionally when EIGEN_UNALIGNED_VECTORIZE"
This reverts commit 44df2109c8
2020-02-25 01:07:28 +00:00
Rasmus Munk Larsen
d7a42eade6 Revert "do not pick full-packet if it'd result in more operations"
This reverts commit e9cc0cd353
2020-02-25 01:07:15 +00:00
Rasmus Munk Larsen
6ac37768a9 Revert "add some static checks for packet-picking logic"
This reverts commit 7769600245
2020-02-25 01:07:04 +00:00
Rasmus Munk Larsen
87cfa4862f Revert "Disable test in test/vectorization_logic.cpp, which is currently failing with AVX."
This reverts commit b625adffd8
2020-02-25 01:04:56 +00:00
Rasmus Munk Larsen
b625adffd8 Disable test in test/vectorization_logic.cpp, which is currently failing with AVX. 2020-02-24 23:28:25 +00:00
Tobias Bosch
f0ce88cff7 Include <sstream> explicitly, and don't rely on the implicit include via <complex>.
This implicit dependency no longer exists in a recent llvm release (sha 78be61871704).
2020-02-24 23:09:36 +00:00
Ilya Tokar
eb6cc29583 Avoid a division in NonBlockingThreadPool::Steal.
Looking at profiles we spend ~10-20% of Steal on simply computing
random % size. We can reduce a random 32-bit int into the [0, size) range with
a single multiplication and shift. This transformation is described in
https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
2020-02-14 16:02:57 -05:00
Francesco Mazzoli
7769600245 add some static checks for packet-picking logic 2020-02-07 18:16:16 +01:00
Francesco Mazzoli
e9cc0cd353 do not pick full-packet if it'd result in more operations
See comment and
<https://gitlab.com/libeigen/eigen/merge_requests/46#note_270622952>.
2020-02-07 18:16:16 +01:00
Francesco Mazzoli
44df2109c8 Pick full packet unconditionally when EIGEN_UNALIGNED_VECTORIZE
See comment for details.
2020-02-07 18:16:16 +01:00
Francesco Mazzoli
5ca10480b0 avoid selecting half-packets when unnecessary
See
<https://stackoverflow.com/questions/59709148/ensuring-that-eigen-uses-avx-vectorization-for-a-certain-operation>
for an explanation of the problem this solves.

In short, for some reason, before this commit the half-packet is
selected when the array / matrix size is not a multiple of
`unpacket_traits<PacketType>::size`, where `PacketType` starts out
being the full Packet.

For example, for some data of 100 `float`s, `Packet4f` will be
selected rather than `Packet8f`, because 100 is not a multiple of 8,
the size of `Packet8f`.

This commit switches to selecting the half-packet if the size is
less than the packet size, which seems to make more sense.

As I stated in the SO post I'm not sure that I'm understanding the
issue correctly, but this fix resolves the issue in my program. Moreover,
`make check` passes, with the exception of lines 614 and 616 in
`test/packetmath.cpp`, which however also fail on master on my machine:

    CHECK_CWISE1_IF(PacketTraits::HasBessel, numext::bessel_i0, internal::pbessel_i0);
    ...
    CHECK_CWISE1_IF(PacketTraits::HasBessel, numext::bessel_i1, internal::pbessel_i1);
2020-02-07 18:16:16 +01:00
Eugene Zhulenev
f584bd9b30 Fail at compile time if default executor tries to use non-default device 2020-02-06 22:43:24 +00:00
Eugene Zhulenev
3fda850c46 Remove dead code from TensorReduction.h 2020-01-29 18:45:31 +00:00
Jeff Daily
b5df8cabd7 fix hip-clang compilation due to new HIP scalar accessor 2020-01-20 21:08:52 +00:00
Deven Desai
6d284bb1b7 Fix for HIP breakage - 200115. Adding a missing EIGEN_DEVICE_FUNC attr 2020-01-16 00:51:43 +00:00
Srinivas Vasudevan
f6c6de5d63 Ensure Igamma does not NaN or Inf for large values. 2020-01-14 21:32:48 +00:00
Rasmus Munk Larsen
6601abce86 Remove rogue include in TypeCasting.h. Meta.h is already included by the top-level header in Eigen/Core. 2020-01-14 21:03:53 +00:00
Eugene Zhulenev
b9362fb8f7 Convert StridedLinearBufferCopy::Kind to enum class 2020-01-13 11:43:24 -08:00
Everton Constantino
5a8b97b401 Switching unpacket_traits<Packet4i> to vectorizable=true. 2020-01-13 16:08:20 -03:00
Everton Constantino
42838c28b8 Adding correct cache sizes for PPC architecture. 2020-01-13 16:58:14 +00:00
Christoph Hertzberg
1d0c45122a Removing executable bit from file mode 2020-01-11 15:02:29 +01:00
Christoph Hertzberg
35219cea68 Bug #1790: Make areApprox check numext::isnan instead of bitwise equality (NaNs don't have to be bitwise equal). 2020-01-11 14:57:22 +01:00
Srinivas Vasudevan
2e099e8d8f Added special_packetmath test and tweaked bounds on tests.
Refactor shared packetmath code to header file.
(Squashed from PR !38)
2020-01-11 10:31:21 +00:00
Rasmus Munk Larsen
e1ecfc162d Explicitly call ::rint and ::rintf for targets without c++11. Without this, the Windows build breaks when trying to compile numext::rint<double>. 2020-01-10 21:14:08 +00:00
Joel Holdsworth
da5a7afed0 Improvements to the tidiness and completeness of the NEON implementation 2020-01-10 18:31:15 +00:00
Anuj Rawat
452371cead Fix for gcc build error when using Eigen headers with AVX512 2020-01-10 18:05:42 +00:00
mehdi-goli
601f89dfd0 Adding RInt vector support for SYCL. 2020-01-10 18:00:36 +00:00
Matthew Powelson
2ea5a715cf Properly initialize b vector in SplineFitting
InterpolateWithDerivative does not initialize the b vector correctly. This issue is discussed in Stack Overflow question 48382939.
2020-01-09 21:29:04 +00:00
Rasmus Munk Larsen
9254974115 Don't add EIGEN_DEVICE_FUNC to random() since ::rand is not available in Cuda. 2020-01-09 21:23:09 +00:00
Rasmus Munk Larsen
a3ec89b5bd Add missing EIGEN_DEVICE_FUNC annotations in MathFunctions.h. 2020-01-09 21:06:34 +00:00
Christoph Hertzberg
8333e03590 Use data.data() instead of &data (since it is not obvious that Array is trivially copyable) 2020-01-09 11:38:19 +01:00
Rasmus Munk Larsen
e6fcee995b Don't use the rational approximation to the logistic function on GPUs as it appears to be slightly slower. 2020-01-09 00:04:26 +00:00
Rasmus Munk Larsen
4217a9f090 The upper limits for where to use the rational approximation to the logistic function were not set carefully enough in the original commit, and some arguments would cause the function to return values greater than 1. This change sets them to values found by scanning all floating point numbers (using std::nextafterf()). 2020-01-08 22:21:37 +00:00
Christoph Hertzberg
9623c0c4b9 Fix formatting 2020-01-08 13:58:18 +01:00
Ilya Tokar
19876ced76 Bug #1785: Introduce numext::rint.
This provides a new op that matches std::rint and previous behavior of
pround. Also adds corresponding unsupported/../Tensor op.
Performance is the same as e.g. floor (tested SSE/AVX).
2020-01-07 21:22:44 +00:00
mehdi-goli
d0ae052da4 [SYCL Backend]
* Adding missing operations for vector comparison in SYCL. Their absence caused a compiler error for vector comparison when compiling SYCL.
 * Fixing the compiler error for placement new in TensorForcedEval.h, which caused a compiler error when compiling the SYCL backend.
 * Reducing SYCL warnings by removing the abort function inside the kernel.
 * Adding strong inline to functions inside SYCL interop.
2020-01-07 15:13:37 +00:00
Everton Constantino
eedb7eeacf Protecting integer_types's long long test with a check to see if we have CXX11 support. 2020-01-07 14:35:35 +00:00
Christoph Hertzberg
bcbaad6d87 Bug #1800: Guard against misleading indentation 2020-01-03 13:47:43 +01:00
Janek Kozicki
00de570793 Fix -Werror -Wfloat-conversion warning. 2019-12-23 23:52:44 +01:00
Deven Desai
636e2bb3fa Fix for HIP breakage - 191220
The breakage was introduced by the following commit :

ae07801dd8

After the commit, HIPCC errors out on some tests with the following error

```
Building HIPCC object unsupported/test/CMakeFiles/cxx11_tensor_device_1.dir/cxx11_tensor_device_1_generated_cxx11_tensor_device.cu.o
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_device.cu:17:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:100:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:129:12: error: no matching constructor for initialization of 'Eigen::internal::TensorBlockResourceRequirements'
    return {merge(lhs.shape_type, rhs.shape_type),           // shape_type
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:75:8: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 3 were provided
struct TensorBlockResourceRequirements {
       ^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:75:8: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 3 were provided
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:75:8: note: candidate constructor (the implicit copy constructor) not viable: requires 5 arguments, but 3 were provided
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h:75:8: note: candidate constructor (the implicit default constructor) not viable: requires 0 arguments, but 3 were provided
...
...
```

The fix is to explicitly declare the (implicitly called) constructor as a device func
2019-12-20 21:28:00 +00:00
Christoph Hertzberg
1e9664b147 Bug #1796: Make matrix squareroot usable for Map and Ref types 2019-12-20 18:10:22 +01:00
Christoph Hertzberg
d86544d654 Reduce code duplication and avoid confusing Doxygen 2019-12-19 19:48:39 +01:00
Christoph Hertzberg
dde279f57d Hide recursive meta templates from Doxygen 2019-12-19 19:47:23 +01:00
Christoph Hertzberg
c21771ac04 Use double-braces initialization (as everywhere else in the test-suite). 2019-12-19 19:20:48 +01:00
Christoph Hertzberg
a3273aeff8 Fix trivial shadow warning 2019-12-19 19:13:11 +01:00
Christoph Hertzberg
870e53c0f2 Bug #1788: Fix rule-of-three violations inside the stable modules.
This fixes deprecated-copy warnings when compiling with GCC>=9
Also protect some additional Base-constructors from getting called by user code (#1587)
2019-12-19 17:30:11 +01:00
Christoph Hertzberg
6965f6de7f Fix unit-test which I broke in previous fix 2019-12-19 13:42:14 +01:00
Eugene Zhulenev
7a65219a2e Fix TensorPadding bug in squeezed reads from inner dimension 2019-12-19 05:43:57 +00:00
Eugene Zhulenev
73e55525e5 Return const data pointer from TensorRef evaluator.data() 2019-12-18 23:19:36 +00:00
Eugene Zhulenev
ae07801dd8 Tensor block evaluation cost model 2019-12-18 20:07:00 +00:00
Christoph Hertzberg
72166d0e6e Fix some maybe-uninitialized warnings 2019-12-18 18:26:20 +01:00
Christoph Hertzberg
5a3eaf88ac Workaround class-memaccess warnings on newer GCC versions 2019-12-18 16:37:26 +01:00
Jeff Daily
de07c4d1c2 fix compilation due to new HIP scalar accessor 2019-12-17 20:27:30 +00:00
Eugene Zhulenev
788bef6ab5 Reduce block evaluation overhead for small tensor expressions 2019-12-17 19:06:14 +00:00
Rasmus Munk Larsen
7252163335 Add default definition for EIGEN_PREDICT_* 2019-12-16 22:31:59 +00:00
Rasmus Munk Larsen
a566074480 Improve accuracy of fast approximate tanh and the logistic functions in Eigen, such that they preserve relative accuracy to within a few ULPs where their function values tend to zero (around x=0 for tanh, and for large negative x for the logistic function).
This change re-instates the fast rational approximation of the logistic function for float32 in Eigen (removed in 66f07efeae), but uses the more accurate approximation 1/(1+exp(-x)) ~= exp(x) below -9. The exponential is only calculated on the vectorized path if at least one element in the SIMD input vector is less than -9.

This change also contains a few improvements to speed up the original float specialization of logistic:
  - Introduce EIGEN_PREDICT_{FALSE,TRUE} for __builtin_expect and use it to predict that the logistic-only path is most likely (~2-3% speedup for the common case).
  - Carefully set the upper clipping point to the smallest x where the approximation evaluates to exactly 1. This saves the explicit clamping of the output (~7% speedup).

The increased accuracy for tanh comes at a cost of 10-20% depending on instruction set.

The benchmarks below repeated calls

   u = v.logistic()  (u = v.tanh(), respectively)

where u and v are of type Eigen::ArrayXf, have length 8k, and v contains random numbers in [-1,1].

Benchmark numbers for logistic:

Before:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_logistic_float        4467           4468         155835  model_time: 4827
AVX
BM_eigen_logistic_float        2347           2347         299135  model_time: 2926
AVX+FMA
BM_eigen_logistic_float        1467           1467         476143  model_time: 2926
AVX512
BM_eigen_logistic_float         805            805         858696  model_time: 1463

After:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_logistic_float        2589           2590         270264  model_time: 4827
AVX
BM_eigen_logistic_float        1428           1428         489265  model_time: 2926
AVX+FMA
BM_eigen_logistic_float        1059           1059         662255  model_time: 2926
AVX512
BM_eigen_logistic_float         673            673        1000000  model_time: 1463

Benchmark numbers for tanh:

Before:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_tanh_float        2391           2391         292624  model_time: 4242
AVX
BM_eigen_tanh_float        1256           1256         554662  model_time: 2633
AVX+FMA
BM_eigen_tanh_float         823            823         866267  model_time: 1609
AVX512
BM_eigen_tanh_float         443            443        1578999  model_time: 805

After:
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
SSE
BM_eigen_tanh_float        2588           2588         273531  model_time: 4242
AVX
BM_eigen_tanh_float        1536           1536         452321  model_time: 2633
AVX+FMA
BM_eigen_tanh_float        1007           1007         694681  model_time: 1609
AVX512
BM_eigen_tanh_float         471            471        1472178  model_time: 805
2019-12-16 21:33:42 +00:00
Christoph Hertzberg
8e5da71466 Resolve double-promotion warnings when compiling with clang.
`sin` was calling `sin(double)` instead of `std::sin(float)`
2019-12-13 22:46:40 +01:00
Christoph Hertzberg
9b7a2b43c2 Renamed .hgignore to .gitignore (removing hg-specific "syntax" line) 2019-12-13 19:40:57 +01:00
Ilya Tokar
06e99aaf40 Bug 1785: fix pround on x86 to use the same rounding mode as std::round.
This also adds pset1frombits helper to Packet[24]d.
Makes round ~45% slower for SSE: 1.65µs ± 1% before vs 2.45µs ± 2% after,
still an order of magnitude faster than the scalar version: 33.8µs ± 2%.
2019-12-12 17:38:53 -05:00
Rasmus Munk Larsen
73a8d572f5 Clamp tanh approximation outside [-c, c] where c is the smallest value where the approximation is exactly +/-1. Without FMA, c = 7.90531110763549805, with FMA c = 7.99881172180175781. 2019-12-12 19:34:25 +00:00
Srinivas Vasudevan
88062b7fed Fix implementation of complex expm1. Add tests that fail with previous implementation, but pass with the current one. 2019-12-12 01:56:54 +00:00
Eugene Zhulenev
381f8f3139 Initialize non-trivially constructible types when allocating a temp buffer. 2019-12-12 01:31:30 +00:00
Eugene Zhulenev
64272c7f40 Squeeze reads from two inner dimensions in TensorPadding 2019-12-11 16:54:51 -08:00
Eugene Zhulenev
963ba1015b Add back accidentally deleted default constructor to TensorExecutorTilingContext. 2019-12-11 18:47:55 +00:00
Joel Holdsworth
1b6e0395e6 Added io test 2019-12-11 18:22:57 +00:00
Joel Holdsworth
3c0ef9f394 IO: Fixed printing of char and unsigned char matrices 2019-12-11 18:22:57 +00:00
Joel Holdsworth
e87af0ed37 Added Eigen::numext typedefs for uint8_t, int8_t, uint16_t and int16_t 2019-12-11 18:22:57 +00:00
Gael Guennebaud
15b3bcfca0 Bug 1786: fix compilation with MSVC 2019-12-11 16:16:38 +01:00
Eugene Zhulenev
c9220c035f Remove block memory allocation required by removed block evaluation API 2019-12-10 17:15:55 -08:00
Eugene Zhulenev
1c879eb010 Remove V2 suffix from TensorBlock 2019-12-10 15:40:23 -08:00
Eugene Zhulenev
dbca11e880 Remove TensorBlock.h and old TensorBlock/BlockMapper 2019-12-10 14:31:44 -08:00
Deven Desai
c49f0d851a Fix for HIP breakage detected on 191210
The following commit introduces compile errors when running eigen with hipcc

2918f85ba9

hipcc errors out because it requires the device attribute on the methods within the TensorBlockV2ResourceRequirements struct introduced by the commit above. The fix is to add the device attribute to those methods
2019-12-10 22:14:05 +00:00
Eugene Zhulenev
2918f85ba9 Do not use std::vector in getResourceRequirements 2019-12-09 16:19:55 -08:00
Artem Belevich
8056a05b54 Undo the block size change.
.z *is* used by the EigenContractionKernelInternal().
2019-12-09 11:10:29 -08:00
Eugene Zhulenev
dbb703d44e Add async evaluation support to TensorSelectOp 2019-12-09 18:36:13 +00:00
Janek Kozicki
11d6465326 Fix AlignedVector3's inconsistent interface with other Vector classes: the default constructor and operator- were missing. 2019-12-06 21:07:39 +01:00
Eugene Zhulenev
bb7ccac3af Add recursive work splitting to EvalShardedByInnerDimContext 2019-12-05 14:51:49 -08:00
Artem Belevich
25230d1862 Improve performance of contraction kernels
* Force-inline implementations. They pass around pointers to shared memory
  blocks. Without inlining compiler must operate via generic pointers.
  Inlining allows compiler to detect that we're operating on shared memory
  which allows generation of substantially faster code.

* Fixed a long-standing typo which resulted in launching 8x more kernels
  than we needed (.z dimension of the block is unused by the kernel).
2019-12-05 12:48:34 -08:00
Gael Guennebaud
08eeb648ea update hg to git hashes 2019-12-05 16:33:24 +01:00
Rasmus Munk Larsen
366cf005b0 Add missing initialization in cxx11_tensor_trace.cpp. 2019-12-04 23:56:37 +00:00
Gael Guennebaud
c488b8b32f Replace calls to "hg" by calls to "git" 2019-12-04 11:24:06 +01:00
Gael Guennebaud
8fbe0e4699 Update old links to bitbucket to point to gitlab.com 2019-12-04 10:57:07 +01:00
Gael Guennebaud
114a15c66a Added tag before-git-migration for changeset a7c7d329d8 2019-12-04 10:06:00 +01:00
Rasmus Larsen
a7c7d329d8 Merged in ezhulenev/eigen-01 (pull request PR-769)
Capture TensorMap by value inside tensor expression AST
2019-12-04 00:49:10 +00:00
Rasmus Larsen
cacf433975 Merged in anshuljl/eigen-2/Anshul-Jaiswal/update-configurevectorizationh-to-not-op-1573079916090 (pull request PR-754)
Update ConfigureVectorization.h to not optimize fp16 routines when compiling with cuda.

Approved-by: Deven Desai <deven.desai.amd@gmail.com>
2019-12-04 00:45:42 +00:00
Eugene Zhulenev
8f4536e852 Capture TensorMap by value inside tensor expression AST 2019-12-03 16:39:05 -08:00
Rasmus Munk Larsen
4e696901f8 Remove __host__ annotation for device-only function. 2019-12-03 14:33:19 -08:00
Rasmus Munk Larsen
ead81559c8 Use EIGEN_DEVICE_FUNC macro instead of __device__. 2019-12-03 12:08:22 -08:00
Gael Guennebaud
6358599ecb Fix QuaternionBase::cast for quaternion map and wrapper. 2019-12-03 14:51:14 +01:00
Gael Guennebaud
7745f69013 bug #1776: fix vector-wise STL iterator's operator-> using a proxy as pointer type.
This changeset also fixes the value_type definition.
2019-12-03 14:40:15 +01:00
Rasmus Munk Larsen
66f07efeae Revert the specialization for scalar_logistic_op<float> introduced in:
77b447c24e


While providing a 50% speedup on Haswell+ processors, the large relative error outside [-18, 18] in this approximation causes problems, e.g., when computing gradients of activation functions like softplus in neural networks.
2019-12-02 17:00:58 -08:00
Rasmus Larsen
3b15373bb3 Merged in ezhulenev/eigen-02 (pull request PR-767)
Fix shadow warnings in AlignedBox and SparseBlock
2019-12-02 18:23:11 +00:00
Deven Desai
312c8e77ff Fix for the HIP build+test errors.
Recent changes have introduced the following build error when compiling with HIPCC

---------

unsupported/test/../../Eigen/src/Core/GenericPacketMath.h:254:58: error:  'ldexp':  no overloaded function has restriction specifiers that are compatible with the ambient context 'pldexp'

---------

The fix for the error is to pick the math function(s) from the global namespace (where they are declared as device functions in the HIP header files) when compiling with HIPCC.
2019-12-02 17:41:32 +00:00
Rasmus Larsen
956131d0e6 Merged in codeplaysoftware/eigen/SYCL-Backend (pull request PR-691)
SYCL Backend

Approved-by: Rasmus Larsen <rmlarsen@google.com>
2019-11-28 16:19:25 +00:00
Mehdi Goli
00f32752f7 [SYCL] Rebasing the SYCL support branch on top of the Eigen upstream master branch.
* Unifying all loadLocalTile from lhs and rhs to an extract_block function.
* Adding get_tensor operation which was missing in TensorContractionMapper.
* Adding the -D method missing from cmake for Disable_Skinny Contraction operation.
* Wrapping all the indices in TensorScanSycl into Scan parameter struct.
* Fixing typo in Device SYCL
* Unifying load to private register for tall/skinny no shared
* Unifying load to vector tile for tensor-vector/vector-tensor operation
* Removing all the LHS/RHS class for extracting data from global
* Removing Outputfunction from TensorContractionSkinnyNoshared.
* Combining the local memory version of tall/skinny and normal tensor contraction into one kernel.
* Combining the no-local memory version of tall/skinny and normal tensor contraction into one kernel.
* Combining General Tensor-Vector and VectorTensor contraction into one kernel.
* Making double buffering optional for Tensor contraction when the local memory version is used.
* Modifying benchmark to accept custom Reduction Sizes
* Disabling AVX optimization for the SYCL backend on the host to allow SSE optimization on the host
* Adding Test for SYCL
* Modifying SYCL CMake
2019-11-28 10:08:54 +00:00
Eugene Zhulenev
82a47338df Fix shadow warnings in AlignedBox and SparseBlock 2019-11-27 16:22:27 -08:00
Rasmus Munk Larsen
ea51a9eace Add missing EIGEN_DEVICE_FUNC attribute to template specializations for pexp to fix GPU build. 2019-11-27 10:17:09 -08:00
Rasmus Munk Larsen
5a3ebda36b Fix warning due to missing cast for exponent arguments for std::frexp and std::ldexp. 2019-11-26 16:18:29 -08:00
Rasmus Larsen
2df57be856 Merged in realjhol/eigen/fix-warnings (pull request PR-760)
Fix warnings
2019-11-26 23:24:23 +00:00
Eugene Zhulenev
5496d0da0b Add async evaluation support to TensorReverse 2019-11-26 15:02:24 -08:00
Eugene Zhulenev
bc66c88255 Add async evaluation support to TensorPadding/TensorImagePatch/TensorShuffling 2019-11-26 11:41:57 -08:00
Gael Guennebaud
c79b6ffe1f Add an explicit example for auto and re-evaluation 2019-11-20 17:31:23 +01:00
Hans Johnson
e78ed6e7f3 COMP: Simplify install commands for Eigen
Confirm that the install directory is identical
before and after this simplifying patch.

```bash
hg clone <<Eigen>>
mkdir eigen-bld
cd eigen-bld
cmake ../Eigen -DCMAKE_INSTALL_PREFIX:PATH=/tmp/pre_eigen_modernize
make install
find /tmp/pre_eigen_modernize >/tmp/bef

#  Apply this patch

cmake ../Eigen -DCMAKE_INSTALL_PREFIX:PATH=/tmp/post_eigen_modernize
make install
find /tmp/post_eigen_modernize |sed 's/post_e/pre_e/g' >/tmp/aft
diff /tmp/bef /tmp/aft
```
2019-11-17 15:14:25 -06:00
Hans Johnson
9d5cdc98c3 COMP: target_compile_definitions requires cmake 2.8.11
Features committed in 2016 have required CMake version 2.8.11.

`sergiu Tue Nov 22 12:25:06 2016 +0100:   target_compile_definitions`

Set the minimum cmake version to the minimum version that
is capable of compiling or installing the code base.
2019-11-17 14:59:32 -06:00
Gael Guennebaud
e5778b87b9 Fix duplicate symbol linking error. 2019-11-20 17:23:19 +01:00
Joel Holdsworth
86eb41f1cb SparseRef: Fixed alignment warning on ARM GCC 2019-11-07 14:34:06 +00:00
Anshul Jaiswal
c1a67cb5af Update ConfigureVectorization.h to not optimize fp16 routines when compiling with cuda. 2019-11-06 22:40:38 +00:00
Rasmus Munk Larsen
cc3d0e6a40 Add EIGEN_HAS_INTRINSIC_INT128 macro
Add a new EIGEN_HAS_INTRINSIC_INT128 macro, and use this instead of __SIZEOF_INT128__. This fixes related issues with TensorIntDiv.h when building with Clang for Windows, where support for 128-bit integer arithmetic is advertised but broken in practice.
2019-11-06 14:24:33 -08:00
Rasmus Munk Larsen
ee404667e2 Rollback of PR-746 and partial rollback of 668ab3fc47.

std::array is still not supported in CUDA device code on Windows.
2019-11-05 17:17:58 -08:00
Joel Holdsworth
743c925286 test/packetmath: Silence alignment warnings 2019-11-05 19:06:12 +00:00
Rasmus Larsen
0c9745903a Merged in ezhulenev/eigen-01 (pull request PR-746)
Remove internal::smart_copy and replace with std::copy
2019-11-04 20:18:38 +00:00
Hans Johnson
8c8cab1afd STYLE: Convert CMake-language commands to lower case
Ancient CMake versions required upper-case commands.  Later command names
became case-insensitive.  Now the preferred style is lower-case.
2019-10-31 11:36:37 -05:00
Hans Johnson
6fb3e5f176 STYLE: Remove CMake-language block-end command arguments
Ancient versions of CMake required else(), endif(), and similar block
termination commands to have arguments matching the command starting the block.
This is no longer the preferred style.
2019-10-31 11:36:27 -05:00
Rasmus Munk Larsen
f1e8307308 1. Fix a bug in psqrt and make it return 0 for +inf arguments.
2. Simplify handling of special cases by taking advantage of the fact that the
   builtin vrsqrt approximation handles negative, zero and +inf arguments correctly.
   This speeds up the SSE and AVX implementations by ~20%.
3. Make the Newton-Raphson formula used for rsqrt more numerically robust:

Before: y = y * (1.5 - x/2 * y^2)
After: y = y * (1.5 - y * (x/2) * y)

Forming y^2 can overflow for very large or very small (denormalized) values of x, while x*y ~= 1. For AVX512, this makes it possible to compute accurate results for denormal inputs down to ~1e-42 in single precision.

4. Add a faster double precision implementation for Knights Landing using the vrsqrt28 instruction and a single Newton-Raphson iteration.

Benchmark results: https://bitbucket.org/snippets/rmlarsen/5LBq9o
2019-11-15 17:09:46 -08:00
Gael Guennebaud
2cb2915f90 bug #1744: fix compilation with MSVC 2017 and AVX512, plog1p/pexpm1 require plog/pexp, but the later was disabled on some compilers 2019-11-15 13:39:51 +01:00
Gael Guennebaud
c3f6fcf2c0 bug #1747: one more fix for MSVC regarding the Bessel implementation. 2019-11-15 11:12:35 +01:00
Gael Guennebaud
b9837ca9ae bug #1281: fix AutoDiffScalar's make_coherent for nested expression of constant ADs. 2019-11-14 14:58:08 +01:00
Gael Guennebaud
0fb6e24408 Fix case issue with Lapack unit tests 2019-11-14 14:16:05 +01:00
Gael Guennebaud
8af045a287 bug #1774: fix VectorwiseOp::begin()/end() return types regarding constness. 2019-11-14 11:45:52 +01:00
Sakshi Goynar
75b4c0a3e0 PR 751: Fixed compilation issue when compiling using MSVC with /arch:AVX512 flag 2019-10-31 16:09:16 -07:00
Gael Guennebaud
8496f86f84 Enable CompleteOrthogonalDecomposition::pseudoInverse with non-square fixed-size matrices. 2019-11-13 21:16:53 +01:00
Gael Guennebaud
002e5b6db6 Move to my.cdash.org 2019-11-13 13:33:49 +01:00
Eugene Zhulenev
13c3327f5c Remove legacy block evaluation support 2019-11-12 10:12:28 -08:00
Gael Guennebaud
71aa53dd6d Disable AVX on broken xcode versions. See PR 748.
Patch adapted from Hans Johnson's PR 748.
2019-11-12 11:40:38 +01:00
Rasmus Munk Larsen
0ed0338593 Fix a race in async tensor evaluation: Don't run on_done() until after device.deallocate() / evaluator.cleanup() complete, since the device might be destroyed after on_done() runs. 2019-11-11 12:26:41 -08:00
Eugene Zhulenev
c952b8dfda Break loop dependence in TensorGenerator block access 2019-11-11 10:32:57 -08:00
Rasmus Munk Larsen
ebf04fb3e8 Fix data race in cxx11_tensor_notification test. 2019-11-08 17:44:50 -08:00
Eugene Zhulenev
73ecb2c57d Cleanup includes in Tensor module after switch to C++11 and above 2019-10-29 15:49:54 -07:00
Eugene Zhulenev
e7ed4bd388 Remove internal::smart_copy and replace with std::copy 2019-10-29 11:25:24 -07:00
Eugene Zhulenev
fbc0a9a3ec Fix CXX11Meta compilation with MSVC 2019-10-28 18:30:10 -07:00
Eugene Zhulenev
bd864ab42b Prevent potential ODR in TensorExecutor 2019-10-28 15:45:09 -07:00
Mehdi Goli
6332aff0b2 This PR fixes:
* The specialization of array class in the different namespace for GCC<=6.4
* The implicit call to `std::array` constructor using the initializer list for GCC <=6.1
2019-10-23 15:56:56 +01:00
Rasmus Larsen
8e4e29ae99 Merged in deven-amd/eigen-hip-fix-191018 (pull request PR-738)
Fix for the HIP build+test errors.
2019-10-22 22:18:38 +00:00
Rasmus Munk Larsen
97c0c5d485 Add block evaluation V2 to TensorAsyncExecutor.
Add async evaluation to a number of ops.
2019-10-22 12:42:44 -07:00
Deven Desai
102cf2a72d Fix for the HIP build+test errors.
The errors were introduced by this commit:

After the above-mentioned commit, some of the tests started failing with the following error


```
Built target cxx11_tensor_reduction
Building HIPCC object unsupported/test/CMakeFiles/cxx11_tensor_reduction_gpu_5.dir/cxx11_tensor_reduction_gpu_5_generated_cxx11_tensor_reduction_gpu.cu.o
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:117:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlockV2.h:155:5: error: the field type is not amp-compatible
    DestinationBufferKind m_kind;
    ^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/src/Tensor/TensorBlockV2.h:211:3: error: the field type is not amp-compatible
  DestinationBuffer m_destination;
  ^
```


For some reason HIPCC does not like device code to contain enum types that do not have the base type explicitly declared. The fix is trivial: explicitly state "int" as the base type
2019-10-22 19:21:27 +00:00
Rasmus Munk Larsen
668ab3fc47 Drop support for c++03 in Eigen tensor. Get rid of some code used to emulate c++11 functionality with older compilers. 2019-10-18 16:42:00 -07:00
Eugene Zhulenev
df0e8b8137 Propagate block evaluation preference through rvalue tensor expressions 2019-10-17 11:17:33 -07:00
Eugene Zhulenev
0d2a14ce11 Cleanup Tensor block destination and materialized block storage allocation 2019-10-16 17:14:37 -07:00
Eugene Zhulenev
02431cbe71 TensorBroadcasting support for random/uniform blocks 2019-10-16 13:26:28 -07:00
Eugene Zhulenev
d380c23b2c Block evaluation for TensorGenerator/TensorReverse/TensorShuffling 2019-10-14 14:31:59 -07:00
Gael Guennebaud
39fb9eeccf bug #1747: fix compilation with MSVC 2019-10-14 22:50:23 +02:00
Eugene Zhulenev
a411e9f344 Block evaluation for TensorGenerator + TensorReverse + fixed bug in tensor reverse op 2019-10-10 10:56:58 -07:00
Rasmus Larsen
b03eb63d7c Merged in ezhulenev/eigen-01 (pull request PR-726)
Block evaluation for TensorChipping + fixed bugs in TensorPadding and TensorSlicing
2019-10-10 16:58:11 +00:00
Gael Guennebaud
e7d8ba747c bug #1752: make is_convertible equivalent to the std c++11 equivalent and fallback to std::is_convertible when c++11 is enabled. 2019-10-10 17:41:47 +02:00
Gael Guennebaud
fb557aec5c bug #1752: disable some is_convertible tests for recent compilers. 2019-10-10 11:40:21 +02:00
Eugene Zhulenev
33e1746139 Block evaluation for TensorChipping + fixed bugs in TensorPadding and TensorSlicing 2019-10-09 12:45:31 -07:00
Gael Guennebaud
f0a4642bab Implement c++03 compatible fix for changeset 7a43af1a33 2019-10-09 16:00:57 +02:00
Gael Guennebaud
196de2efe3 Explicitly bypass resize and memmoves when there is already the exact right number of elements available. 2019-10-08 21:44:33 +02:00
Gael Guennebaud
36da231a41 Disable an expected warning in unit test 2019-10-08 16:28:14 +02:00
Gael Guennebaud
d1def335dc fix one more possible conflicts with real/imag 2019-10-08 16:19:10 +02:00
Gael Guennebaud
87427d2eaa PR 719: fix real/imag namespace conflict 2019-10-08 09:15:17 +02:00
Gael Guennebaud
7a43af1a33 Fix compilation of FFTW unit test 2019-10-08 08:58:35 +02:00
Eugene Zhulenev
f74ab8cb8d Add block evaluation to TensorEvalTo and fix few small bugs 2019-10-07 15:34:26 -07:00
Brian Zhao
3afb640b56 Fixing incorrect size in Tensor documentation. 2019-10-04 21:30:35 -07:00
Rasmus Munk Larsen
20c4a9118f Use "pdiv" rather than operator/ to support packet types. 2019-10-04 16:54:03 -07:00
Rasmus Larsen
d1dd51cb5f Merged in ezhulenev/eigen-01 (pull request PR-723)
Add block evaluation to TensorReshaping/TensorCasting/TensorPadding/TensorSelect

Approved-by: Rasmus Larsen <rmlarsen@google.com>
2019-10-04 17:19:13 +00:00
Eugene Zhulenev
98bdd7252e Fix compilation warnings and errors with clang in TensorBlockV2 code and tests 2019-10-04 10:15:33 -07:00
Rasmus Munk Larsen
fab4e3a753 Address comments on Chebyshev evaluation code:
1. Use pmadd when possible.
2. Add casts to avoid c++03 warnings.
2019-10-02 12:48:17 -07:00
Eugene Zhulenev
60ae24ee1a Add block evaluation to TensorReshaping/TensorCasting/TensorPadding/TensorSelect 2019-10-02 12:44:06 -07:00
Eugene Zhulenev
6e40454a6e Add beta to TensorContractionKernel and make memset optional 2019-10-02 11:06:02 -07:00
Rasmus Munk Larsen
bd0fac456f Prevent infinite loop in the nvcc compiler while unrolling the recurrent templates for Chebyshev polynomial evaluation. 2019-10-01 13:15:30 -07:00
Gael Guennebaud
9549ba8313 Fix perf issue in SimplicialLDLT::solve for complexes (again, m_diag is real) 2019-10-01 12:54:25 +02:00
Gael Guennebaud
c8b2c603b0 Fix speed issue with SimplicialLDLT for complexes: the diagonal is real! 2019-09-30 16:14:34 +02:00
Rasmus Munk Larsen
13ef08e5ac Move implementation of vectorized error function erf() to SpecialFunctionsImpl.h. 2019-09-27 13:56:04 -07:00
Eugene Zhulenev
7c8bc0d928 Fix cxx11_tensor_block_io test 2019-09-25 11:48:11 -07:00
Eugene Zhulenev
0c845e28c9 Fix erf in c++03 2019-09-25 11:31:45 -07:00
Eugene Zhulenev
71d5bedf72 Fix compilation warnings and errors with clang in TensorBlockV2 2019-09-25 11:25:22 -07:00
Deven Desai
5e186b1987 Fix for the HIP build+test errors.
The errors were introduced by this commit: d38e6fbc27


After the above-mentioned commit, some of the tests started failing with the following error:


```
Building HIPCC object unsupported/test/CMakeFiles/cxx11_tensor_reduction_gpu_5.dir/cxx11_tensor_reduction_gpu_5_generated_cxx11_tensor_reduction_gpu.cu.o
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:70:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsHalf.h:28:22: error: call to 'erf' is ambiguous
  return Eigen::half(Eigen::numext::erf(static_cast<float>(a)));
                     ^~~~~~~~~~~~~~~~~~
/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1600:7: note: candidate function [with T = float]
float erf(const float &x) { return ::erff(x); }
      ^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = float]
    erf(const Scalar& x) {
    ^
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:75:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/arch/GPU/GpuSpecialFunctions.h:87:23: error: call to 'erf' is ambiguous
  return make_double2(erf(a.x), erf(a.y));
                      ^~~
/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1603:8: note: candidate function [with T = double]
double erf(const double &x) { return ::erf(x); }
       ^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = double]
    erf(const Scalar& x) {
    ^
In file included from /home/rocm-user/eigen/unsupported/test/cxx11_tensor_reduction_gpu.cu:16:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/Tensor:29:
In file included from /home/rocm-user/eigen/unsupported/Eigen/CXX11/../SpecialFunctions:75:
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/arch/GPU/GpuSpecialFunctions.h:87:33: error: call to 'erf' is ambiguous
  return make_double2(erf(a.x), erf(a.y));
                                ^~~
/home/rocm-user/eigen/unsupported/test/../../Eigen/src/Core/MathFunctions.h:1603:8: note: candidate function [with T = double]
double erf(const double &x) { return ::erf(x); }
       ^
/home/rocm-user/eigen/unsupported/Eigen/CXX11/../src/SpecialFunctions/SpecialFunctionsImpl.h:1897:5: note: candidate function [with Scalar = double]
    erf(const Scalar& x) {
    ^
3 errors generated.
```


This PR fixes the compile error by removing the "old" implementation of "erf" (assuming that the "new" implementation is what we want going forward; from a GPU point of view both implementations are the same).

This PR also fixes what seems like a cut-n-paste error in the aforementioned commit
2019-09-25 15:39:13 +00:00
Eugene Zhulenev
f35b9ab510 Fix a bug in a packed block type in TensorContractionThreadPool 2019-09-24 16:54:36 -07:00
Rasmus Larsen
d38e6fbc27 Merged in rmlarsen/eigen (pull request PR-704)
Add generic PacketMath implementation of the Error Function (erf).
2019-09-24 23:40:29 +00:00
Rasmus Munk Larsen
591a554c68 Add TODO to cleanup FMA cost modelling. 2019-09-24 16:39:25 -07:00
Eugene Zhulenev
c64396b4c6 Choose TensorBlock StridedLinearCopy type statically 2019-09-24 16:04:29 -07:00
Eugene Zhulenev
c97b208468 Add new TensorBlock api implementation + tests 2019-09-24 15:17:35 -07:00
Eugene Zhulenev
ef9dfee7bd Tensor block evaluation V2 support for unary/binary/broadcasting 2019-09-24 12:52:45 -07:00
Christoph Hertzberg
efd9867ff0 bug #1746: Removed implementation of standard copy-constructor and standard copy-assign-operator from PermutationMatrix and Transpositions to allow malloc-less std::move. Added unit-test to rvalue_types 2019-09-24 11:09:58 +02:00
Christoph Hertzberg
e4c1b3c1d2 Fix implicit conversion warnings and use pnegate to negate packets 2019-09-23 16:07:43 +02:00
Christoph Hertzberg
ba0736fa8e Fix (or mask away) conversion warnings introduced in 553caeb6a3.
2019-09-23 15:58:05 +02:00
Rasmus Munk Larsen
1d5af0693c Add support for asynchronous evaluation of tensor casting expressions. 2019-09-19 13:54:49 -07:00
Rasmus Munk Larsen
6de5ed08d8 Add generic PacketMath implementation of the Error Function (erf). 2019-09-19 12:48:30 -07:00
Rasmus Munk Larsen
28b6786498 Fix build on setups without AVX512DQ. 2019-09-19 12:36:09 -07:00
Deven Desai
e02d429637 Fix for the HIP build+test errors.
The errors were introduced by this commit: 6e215cf109


The fix is switching to using ::<math_func> instead of std::<math_func> when compiling for GPU
2019-09-18 18:44:20 +00:00
Srinivas Vasudevan
df0816b71f Merging eigen/eigen. 2019-09-16 19:33:29 -04:00
Srinivas Vasudevan
6e215cf109 Add Bessel functions to SpecialFunctions.
- Split SpecialFunctions files in to a separate BesselFunctions file.

In particular add:
    - Modified bessel functions of the second kind k0, k1, k0e, k1e
    - Bessel functions of the first kind j0, j1
    - Bessel functions of the second kind y0, y1
2019-09-14 12:16:47 -04:00
Eugene Zhulenev
7c73296849 Revert accidental change to GCC diagnostics 2019-09-13 14:30:58 -07:00
Eugene Zhulenev
bf8866b466 Fix maybe-uninitialized warnings in TensorContractionThreadPool 2019-09-13 14:29:55 -07:00
Eugene Zhulenev
553caeb6a3 Use ThreadLocal container in TensorContractionThreadPool 2019-09-13 12:14:44 -07:00
Srinivas Vasudevan
facdec5aa7 Add packetized versions of i0e and i1e special functions.
- In particular refactor the i0e and i1e code so scalar and vectorized path share code.
  - Move chebevl to GenericPacketMathFunctions.


A brief benchmark with building Eigen with FMA, AVX and AVX2 flags

Before:

CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
BM_eigen_i0e_double/1            57.3           57.3     10000000
BM_eigen_i0e_double/8           398            398        1748554
BM_eigen_i0e_double/64         3184           3184         218961
BM_eigen_i0e_double/512       25579          25579          27330
BM_eigen_i0e_double/4k       205043         205042           3418
BM_eigen_i0e_double/32k     1646038        1646176            422
BM_eigen_i0e_double/256k   13180959       13182613             53
BM_eigen_i0e_double/1M     52684617       52706132             10
BM_eigen_i0e_float/1             28.4           28.4     24636711
BM_eigen_i0e_float/8             75.7           75.7      9207634
BM_eigen_i0e_float/64           512            512        1000000
BM_eigen_i0e_float/512         4194           4194         166359
BM_eigen_i0e_float/4k         32756          32761          21373
BM_eigen_i0e_float/32k       261133         261153           2678
BM_eigen_i0e_float/256k     2087938        2088231            333
BM_eigen_i0e_float/1M       8380409        8381234             84
BM_eigen_i1e_double/1            56.3           56.3     10000000
BM_eigen_i1e_double/8           397            397        1772376
BM_eigen_i1e_double/64         3114           3115         223881
BM_eigen_i1e_double/512       25358          25361          27761
BM_eigen_i1e_double/4k       203543         203593           3462
BM_eigen_i1e_double/32k     1613649        1613803            428
BM_eigen_i1e_double/256k   12910625       12910374             54
BM_eigen_i1e_double/1M     51723824       51723991             10
BM_eigen_i1e_float/1             28.3           28.3     24683049
BM_eigen_i1e_float/8             74.8           74.9      9366216
BM_eigen_i1e_float/64           505            505        1000000
BM_eigen_i1e_float/512         4068           4068         171690
BM_eigen_i1e_float/4k         31803          31806          21948
BM_eigen_i1e_float/32k       253637         253692           2763
BM_eigen_i1e_float/256k     2019711        2019918            346
BM_eigen_i1e_float/1M       8238681        8238713             86


After:

CPU: Intel Haswell with HyperThreading (6 cores)
Benchmark                  Time(ns)        CPU(ns)     Iterations
-----------------------------------------------------------------
BM_eigen_i0e_double/1            15.8           15.8     44097476
BM_eigen_i0e_double/8            99.3           99.3      7014884
BM_eigen_i0e_double/64          777            777         886612
BM_eigen_i0e_double/512        6180           6181         100000
BM_eigen_i0e_double/4k        48136          48140          14678
BM_eigen_i0e_double/32k      385936         385943           1801
BM_eigen_i0e_double/256k    3293324        3293551            228
BM_eigen_i0e_double/1M     12423600       12424458             57
BM_eigen_i0e_float/1             16.3           16.3     43038042
BM_eigen_i0e_float/8             30.1           30.1     23456931
BM_eigen_i0e_float/64           169            169        4132875
BM_eigen_i0e_float/512         1338           1339         516860
BM_eigen_i0e_float/4k         10191          10191          68513
BM_eigen_i0e_float/32k        81338          81337           8531
BM_eigen_i0e_float/256k      651807         651984           1000
BM_eigen_i0e_float/1M       2633821        2634187            268
BM_eigen_i1e_double/1            16.2           16.2     42352499
BM_eigen_i1e_double/8           110            110        6316524
BM_eigen_i1e_double/64          822            822         851065
BM_eigen_i1e_double/512        6480           6481         100000
BM_eigen_i1e_double/4k        51843          51843          10000
BM_eigen_i1e_double/32k      414854         414852           1680
BM_eigen_i1e_double/256k    3320001        3320568            212
BM_eigen_i1e_double/1M     13442795       13442391             53
BM_eigen_i1e_float/1             17.6           17.6     41025735
BM_eigen_i1e_float/8             35.5           35.5     19597891
BM_eigen_i1e_float/64           240            240        2924237
BM_eigen_i1e_float/512         1424           1424         485953
BM_eigen_i1e_float/4k         10722          10723          65162
BM_eigen_i1e_float/32k        86286          86297           8048
BM_eigen_i1e_float/256k      691821         691868           1000
BM_eigen_i1e_float/1M       2777336        2777747            256


This shows anywhere from a 50% to 75% improvement on these operations.

I've also benchmarked without any of these flags turned on, and got similar
performance to before (if not better).

Also tested packetmath.cpp + special_functions to ensure no regressions.
2019-09-11 18:34:02 -07:00
Srinivas Vasudevan
b052ec6992 Merged eigen/eigen into default 2019-09-11 18:01:54 -07:00
Deven Desai
cdb377d0cb Fix for the HIP build+test errors introduced by the ndtri support.
The fixes needed are
 * adding EIGEN_DEVICE_FUNC attribute to a couple of funcs (else HIPCC will error out when non-device funcs are called from global/device funcs)
 * switching to using ::<math_func> instead of std::<math_func> (only for HIPCC) in cases where the std::<math_func> is not recognized as a device func by HIPCC
 * removing an errant "j" from a testcase (don't know how that made it in to begin with!)
2019-09-06 16:03:49 +00:00
Gael Guennebaud
747c6a51ca bug #1736: fix compilation issue with A(all,{1,2}).col(j) by implementing true compile-time "if" for block_evaluator<>::coeff(i)/coeffRef(i) 2019-09-11 15:40:07 +02:00
Gael Guennebaud
031f17117d bug #1741: fix self-adjoint*matrix, triangular*matrix, and triangular^1*matrix with a destination having a non-trivial inner-stride 2019-09-11 15:04:25 +02:00
Gael Guennebaud
459b2bcc08 Fix compilation of BLAS backend and frontend 2019-09-11 10:02:37 +02:00
Rasmus Larsen
97f1e1d89f Merged in ezhulenev/eigen-01 (pull request PR-698)
ThreadLocal container that does not rely on thread local storage

Approved-by: Rasmus Larsen <rmlarsen@google.com>
2019-09-10 23:19:33 +00:00
Eugene Zhulenev
d918bd9a8b Update ThreadLocal to use separate Initialize/Release callables 2019-09-10 16:13:32 -07:00
Gael Guennebaud
afa8d13532 Fix some implicit literal to Scalar conversions in SparseCore 2019-09-11 00:03:07 +02:00
Gael Guennebaud
c06e6fd115 bug #1741: fix SelfAdjointView::rankUpdate and product to triangular part for destination with non-trivial inner stride 2019-09-10 23:29:52 +02:00
Gael Guennebaud
ea0d5dc956 bug #1741: fix C.noalias() = A*C; with C.innerStride()!=1 2019-09-10 16:25:24 +02:00
Eugene Zhulenev
e3dec4dcc1 ThreadLocal container that does not rely on thread local storage 2019-09-09 15:18:14 -07:00
Gael Guennebaud
17226100c5 Fix a circular dependency regarding pshift* functions and GenericPacketMathFunctions.
Another solution would have been to make pshift* fully generic template functions with
partial specialization which is always a mess in c++03.
2019-09-06 09:26:04 +02:00
Gael Guennebaud
55b63d4ea3 Fix compilation without vector engine available (e.g., x86 with SSE disabled):
-> ppolevl is required by ndtri even for the scalar path
2019-09-05 18:16:46 +02:00
Srinivas Vasudevan
a9cf823db7 Merged eigen/eigen 2019-09-04 23:50:52 -04:00
Gael Guennebaud
e6c183f8fd Fix doc issues regarding ndtri 2019-09-04 23:00:21 +02:00
Gael Guennebaud
5702a57926 Fix possible warning regarding strict equality comparisons 2019-09-04 22:57:04 +02:00
Srinivas Vasudevan
99036a3615 Merging from eigen/eigen. 2019-09-03 15:34:47 -04:00
Eugene Zhulenev
a8d264fa9c Add test for const TensorMap underlying data mutation 2019-09-03 11:38:39 -07:00
Eugene Zhulenev
f68f2bba09 TensorMap constness should not change underlying storage constness 2019-09-03 11:08:09 -07:00
Gael Guennebaud
8e7e3d9bc8 Makes Scalar/RealScalar typedefs public in Pardiso's wrappers (see PR 688) 2019-09-03 13:09:03 +02:00
Srinivas Vasudevan
e38dd48a27 PR 681: Add ndtri function, the inverse of the normal distribution function. 2019-08-12 19:26:29 -04:00
Eugene Zhulenev
f59bed7a13 Change typedefs from private to protected to fix MSVC compilation 2019-09-03 19:11:36 -07:00
Eugene Zhulenev
47fefa235f Allow move-only done callback in TensorAsyncDevice 2019-09-03 17:20:56 -07:00
Srinivas Vasudevan
18ceb3413d Add ndtri function, the inverse of the normal distribution function. 2019-08-12 19:26:29 -04:00
Rasmus Munk Larsen
d55d392e7b Fix bugs in log1p and expm1 where repeated using statements would clobber each other.
Add specializations for complex types since std::log1p and std::expm1 do not support complex.
2019-08-08 16:27:32 -07:00
Rasmus Munk Larsen
85928e5f47 Guard against repeated definition of EIGEN_MPL2_ONLY 2019-08-07 14:19:00 -07:00
Rasmus Munk Larsen
facc4e4536 Disable tests for contraction with output kernels when using libxsmm, which does not support this. 2019-08-07 14:11:15 -07:00
Rasmus Munk Larsen
eab7e52db2 [Eigen] Vectorize evaluation of coefficient-wise functions over tensor blocks if the strides are known to be 1. Provides up to 20-25% speedup of the TF cross entropy op with AVX.
A few benchmark numbers:

name                              old time/op             new time/op             delta
BM_Xent_16_10000_cpu              448µs ± 3%              389µs ± 2%  -13.21%          (p=0.008 n=5+5)
BM_Xent_32_10000_cpu              575µs ± 6%              454µs ± 3%  -21.00%          (p=0.008 n=5+5)
BM_Xent_64_10000_cpu              933µs ± 4%              712µs ± 1%  -23.71%          (p=0.008 n=5+5)
2019-08-07 12:57:42 -07:00
Rasmus Munk Larsen
0987126165 Clean up unnecessary namespace specifiers in TensorBlock.h. 2019-08-07 12:12:52 -07:00
Gael Guennebaud
0050644b23 Fix doc regarding alignment and c++17 2019-08-04 01:09:41 +02:00
Rasmus Munk Larsen
e2999d4c38 Fix performance regressions due to https://bitbucket.org/eigen/eigen/pull-requests/662.
The change caused the device struct to be copied for each expression evaluation, and caused, e.g., a 10% regression in the TensorFlow multinomial op on GPU:


Benchmark                       Time(ns)        CPU(ns)     Iterations
----------------------------------------------------------------------
BM_Multinomial_gpu_1_100000_4     128173         231326           2922  1.610G items/s

VS

Benchmark                       Time(ns)        CPU(ns)     Iterations
----------------------------------------------------------------------
BM_Multinomial_gpu_1_100000_4     146683         246914           2719  1.509G items/s
2019-08-02 11:18:13 -07:00
Alberto Luaces
c694be1214 Fixed Tensor documentation formatting. 2019-07-23 09:24:06 +00:00
Gael Guennebaud
15f3d9d272 More colamd cleanup:
- Move colamd implementation in its own namespace to avoid polluting the internal namespace with Ok, Status, etc.
- Fix signed/unsigned warning
- move some ugly free functions as member functions
2019-09-03 00:50:51 +02:00
Anshul Jaiswal
a4d1a6cd7d Eigen_Colamd.h updated to replace constexpr with consts and enums. 2019-08-17 05:29:23 +00:00
Anshul Jaiswal
283558face Ordering.h edited to fix dependencies on Eigen_Colamd.h 2019-08-15 20:21:56 +00:00
Anshul Jaiswal
39f30923c2 Eigen_Colamd.h edited replacing macros with constexprs and functions. 2019-08-15 20:15:19 +00:00
Anshul Jaiswal
0a6b553ecf Eigen_Colamd.h edited online with Bitbucket replacing constant #defines with const definitions 2019-07-21 04:53:31 +00:00
Kyle Vedder
f22b7283a3 Added leading asterisk for Doxygen to consume as it was removing asterisk intended to be part of the code. 2019-07-18 18:12:14 +00:00
Michael Grupp
6e17491f45 Fix typo in Umeyama method documentation 2019-07-17 11:20:41 +00:00
Christoph Hertzberg
e0f5a2a456 Remove {} accidentally added in previous commit 2019-07-18 20:22:17 +02:00
Christoph Hertzberg
ea6d7eb32f Move variadic constructors outside #ifndef EIGEN_PARSED_BY_DOXYGEN block, to make it actually appear in the generated documentation. 2019-07-12 19:46:37 +02:00
Christoph Hertzberg
9237883ff1 Escape \# inside doxygen docu 2019-07-12 19:45:13 +02:00
Christoph Hertzberg
c2671e5315 Build deprecated snippets with -DEIGEN_NO_DEPRECATED_WARNING
Also, document LinSpaced only where it is implemented
2019-07-12 19:43:32 +02:00
Eugene Zhulenev
3cd148f983 Fix expression evaluation heuristic for TensorSliceOp 2019-07-09 12:10:26 -07:00
Rasmus Munk Larsen
23b958818e Fix compiler for unsigned integers. 2019-07-09 11:18:25 -07:00
Eugene Zhulenev
6083014594 Add outer/inner chipping optimization for chipping dimension specified at runtime 2019-07-03 11:35:25 -07:00
Deven Desai
7eb2e0a95b adding the EIGEN_DEVICE_FUNC attribute to the constCast routine.
Not having this attribute results in the following failures in the `--config=rocm` TF build.

```
In file included from tensorflow/core/kernels/cross_op_gpu.cu.cc:20:
In file included from ./tensorflow/core/framework/register_types.h:20:
In file included from ./tensorflow/core/framework/numeric_types.h:20:
In file included from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1:
In file included from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:140:
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h:356:37: error:  'Eigen::constCast':  no overloaded function has restriction specifiers that are compatible with the ambient context 'data'
    typename Storage::Type result = constCast(m_impl.data());
                                    ^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h:356:37: error:  'Eigen::constCast':  no overloaded function has restriction specifiers that are compatible with the ambient context 'data'
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h:148:56: note: in instantiation of member function 'Eigen::TensorEvaluator<const Eigen::TensorChippingOp<1, Eigen::TensorMap<Eigen::Tensor<int, 2, 1, long>, 16, MakePointer> >, Eigen::Gpu\
Device>::data' requested here
    return m_rightImpl.evalSubExprsIfNeeded(m_leftImpl.data());

```

Adding the EIGEN_DEVICE_FUNC attribute resolves those errors
2019-07-02 20:02:46 +00:00
Gael Guennebaud
ef8aca6a89 Merged in codeplaysoftware/eigen (pull request PR-667)
[SYCL] :

Approved-by: Gael Guennebaud <g.gael@free.fr>
Approved-by: Rasmus Larsen <rmlarsen@google.com>
2019-07-02 12:45:23 +00:00
Eugene Zhulenev
4ac93f8edc Allocate non-const scalar buffer for block evaluation with DefaultDevice 2019-07-01 10:55:19 -07:00
Mehdi Goli
9ea490c82c [SYCL] :
* Modifying TensorDeviceSYCL to use `EIGEN_THROW_X`.
  * Modifying TensorMacro to use `EIGEN_TRY/CATCH(X)` macro.
  * Modifying TensorReverse.h to use `EIGEN_DEVICE_REF` instead of `&`.
  * Fixing the SYCL device macro in SpecialFunctionsImpl.h.
2019-07-01 16:27:28 +01:00
Eugene Zhulenev
81a03bec75 Fix TensorReverse on GPU with m_stride[i]==0 2019-06-28 15:50:39 -07:00
Rasmus Munk Larsen
8053eeb51e Fix CUDA compilation error for pselect<half>. 2019-06-28 12:07:29 -07:00
Rasmus Munk Larsen
74a9dd1102 Fix preprocessor condition to only generate a warning when calling eigen::GpuDevice::synchronize() from device code, but not when calling from a non-GPU compilation unit. 2019-06-28 11:56:21 -07:00
Rasmus Munk Larsen
70d4020ad9 Remove comma causing warning in c++03 mode. 2019-06-28 11:39:45 -07:00
Eugene Zhulenev
6e7c76481a Merge with Eigen head 2019-06-28 11:22:46 -07:00
Eugene Zhulenev
878845cb25 Add block access to TensorReverseOp and make sure that TensorForcedEval uses block access when preferred 2019-06-28 11:13:44 -07:00
Rasmus Munk Larsen
1f61aee5ca [SYCL] This PR adds the minimum modifications to the Eigen unsupported module required to run it on devices supporting SYCL.
* Abstracting the pointer type so that both SYCL memory and pointer can be captured.
* Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class.
* Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node.
* Adding SYCL macro for controlling loop unrolling.
* Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.
2019-06-28 10:11:56 -07:00
Mehdi Goli
7d08fa805a [SYCL] This PR adds the minimum modifications to the Eigen unsupported module required to run it on devices supporting SYCL.
* Abstracting the pointer type so that both SYCL memory and pointer can be captured.
* Converting SYCL virtual pointer to SYCL device memory in Eigen evaluator class.
* Binding SYCL placeholder accessor to command group handler by using bind method in Eigen evaluator node.
* Adding SYCL macro for controlling loop unrolling.
* Modifying the TensorDeviceSycl.h and SYCL executor method to adopt the above changes.
2019-06-28 10:08:23 +01:00
Mehdi Goli
16a56b2ddd [SYCL] This PR adds the minimum modifications to Eigen core required to run Eigen unsupported modules on devices supporting SYCL.
* Adding SYCL memory model
* Enabling/Disabling SYCL  backend in Core
*  Supporting Vectorization
2019-06-27 12:25:09 +01:00
Christoph Hertzberg
adec097c61 Remove extra comma (causes warnings in C++03) 2019-06-26 16:14:28 +02:00
Eugene Zhulenev
229db81572 Optimize evaluation strategy for TensorSlicingOp and TensorChippingOp 2019-06-25 15:41:37 -07:00
Deven Desai
ba506d5bd2 fix for a ROCm/HIP specific compile error introduced by a recent commit. 2019-06-22 00:06:05 +00:00
Rasmus Munk Larsen
c9394d7a0e Remove extra "one" in comment. 2019-06-20 16:23:19 -07:00
Rasmus Munk Larsen
b8f8dac4eb Update comment as suggested by tra@google.com. 2019-06-20 16:18:37 -07:00
Rasmus Munk Larsen
e5e63c2cad Fix grammar. 2019-06-20 16:03:59 -07:00
Rasmus Munk Larsen
302a404b7e Added comment explaining the surprising EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC clause. 2019-06-20 15:59:08 -07:00
Rasmus Munk Larsen
b5237f53b1 Fix CUDA build on Mac. 2019-06-20 15:44:14 -07:00
Rasmus Munk Larsen
988f24b730 Various fixes for packet ops.
1. Fix buggy pcmp_eq and unit test for half types.
2. Add unit test for pselect and add specializations for SSE 4.1, AVX512, and half types.
3. Get rid of FIXME: Implement faster pnegate for half by XOR'ing with a sign bit mask.
2019-06-20 11:47:49 -07:00
Christoph Hertzberg
e0be7f30e1 bug #1724: Mask buggy warnings with g++-7
(grafted from 427f2f66d6)
2019-06-14 14:57:46 +02:00
Anshul Jaiswal
fab51d133e Updated Eigen_Colamd.h, namespacing macros ALIVE & DEAD as COLAMD_ALIVE & COLAMD_DEAD
to prevent conflicts with other libraries / code.
2019-06-08 21:09:06 +00:00
Eugene Zhulenev
79c402e40e Fix shadow warnings in TensorContractionThreadPool 2019-08-30 15:38:31 -07:00
Eugene Zhulenev
edf2ec28d8 Fix block mapper type name in TensorExecutor 2019-08-30 15:29:25 -07:00
Eugene Zhulenev
f0b36fb9a4 evalSubExprsIfNeededAsync + async TensorContractionThreadPool 2019-08-30 15:13:38 -07:00
Eugene Zhulenev
619cea9491 Revert accidentally removed <memory> header from ThreadPool 2019-08-30 14:51:17 -07:00
Eugene Zhulenev
66665e7e76 Asynchronous expression evaluation with TensorAsyncDevice 2019-08-30 14:49:40 -07:00
Rasmus Munk Larsen
f6c51d9209 Fix missing header inclusion and colliding definitions for half type casting, which broke
build with -march=native on Haswell/Skylake.
2019-08-30 14:03:29 -07:00
Eugene Zhulenev
bc40d4522c Const correctness in TensorMap<const Tensor<T, ...>> expressions 2019-08-28 17:46:05 -07:00
Rasmus Munk Larsen
1187bb65ad Add more tests for corner cases of log1p and expm1. Add handling of infinite arguments to log1p such that log1p(inf) = inf. 2019-08-28 12:20:21 -07:00
Eugene Zhulenev
6e77f9bef3 Remove shadow warnings in TensorDeviceThreadPool 2019-08-28 10:32:19 -07:00
Rasmus Munk Larsen
9aba527405 Revert changes to std_fallback::log1p that broke handling of arguments less than -1. Fix packet op accordingly. 2019-08-27 15:35:29 -07:00
Rasmus Munk Larsen
b021cdea6d Clean up float16 a.k.a. Eigen::half support in Eigen. Move the definition of half to Core/arch/Default and move arch-specific packet ops to their respective sub-directories. 2019-08-27 11:30:31 -07:00
Rasmus Larsen
84fefdf321 Merged in ezhulenev/eigen-01 (pull request PR-683)
Asynchronous parallelFor in Eigen ThreadPoolDevice
2019-08-26 21:49:17 +00:00
maratek
8b5ab0e4dd Fix get_random_seed on Native Client
Newlib in the Native Client SDK does not provide the ::random function.
Implement get_random_seed for NaCl using ::rand, similarly to the Windows version.
2019-08-23 15:25:56 -07:00
Eugene Zhulenev
6901788013 Asynchronous parallelFor in Eigen ThreadPoolDevice 2019-08-22 10:50:51 -07:00
Christoph Hertzberg
2fb24384c9 Merged in jaopaulolc/eigen (pull request PR-679)
Fixes for Altivec/VSX and compilation with clang on PowerPC
2019-08-22 15:57:33 +00:00
Rasmus Larsen
57f6b62597 Merged in rmlarsen/eigen (pull request PR-680)
Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments.
2019-08-22 00:25:29 +00:00
Eugene Zhulenev
071311821e Remove XSMM support from Tensor module 2019-08-19 11:44:25 -07:00
João P. L. de Carvalho
5ac7984ffa Fix debug macros in p{load,store}u 2019-08-14 11:59:12 -06:00
João P. L. de Carvalho
db9147ae40 Add missing pcmp_XX methods for double/Packet2d
This actually fixes an issue in unit-test packetmath_2 with pcmp_eq when it is compiled with clang. When pcmp_eq(Packet4f,Packet4f) is used instead of pcmp_eq(Packet2d,Packet2d), the unit-test does not pass due to a NaN in the reference vector.
2019-08-14 10:37:39 -06:00
Rasmus Munk Larsen
a3298b22ec Implement vectorized versions of log1p and expm1 in Eigen using Kahan's formulas, and change the scalar implementations to properly handle infinite arguments.
Depending on instruction set, significant speedups are observed for the vectorized path:
log1p wall time is reduced 60-93% (2.5x - 15x speedup)
expm1 wall time is reduced 0-85% (1x - 7x speedup)

The scalar path is slower by 20-30% due to the extra branch needed to handle +infinity correctly.

Full benchmarks measured on Intel(R) Xeon(R) Gold 6154 here: https://bitbucket.org/snippets/rmlarsen/MXBkpM
2019-08-12 13:53:28 -07:00
João P. L. de Carvalho
787f6ef025 Fix packed load/store for PowerPC's VSX
The vec_vsx_ld/vec_vsx_st builtins were wrongly used for aligned load/store. In fact, they perform unaligned memory access and, even when the address is 16-byte aligned, they are much slower (at least 2x) than their aligned counterparts.

For double/Packet2d vec_xl/vec_xst should be prefered over vec_ld/vec_st, although the latter works when casted to float/Packet4f.

Silencing some weird warnings about throw with some GCC versions. Such warnings are not emitted by Clang.
2019-08-09 16:02:55 -06:00
João P. L. de Carvalho
4d29aa0294 Fix offset argument of ploadu/pstoreu for Altivec
If no offset is given, then it should be zero.

Also passes full address to vec_vsx_ld/st builtins.

Removes useless _EIGEN_ALIGNED_PTR & _EIGEN_MASK_ALIGNMENT.

Removes unnecessary casts.
2019-08-09 15:59:26 -06:00
João P. L. de Carvalho
66d073c38e bug #1718: Add cast to successfully compile with clang on PowerPC
Ignoring -Wc11-extensions warnings thrown by clang at Altivec/PacketMath.h
2019-08-09 15:56:26 -06:00
Rasmus Munk Larsen
6d432eae5d Make is_valid_index_type return false for float and double when EIGEN_HAS_TYPE_TRAITS is off. 2019-06-05 16:42:27 -07:00
Rasmus Munk Larsen
f715f6e816 Add workaround for choosing the right include files with FP16C support with clang. 2019-06-05 13:36:37 -07:00
Justin Carpentier
ffaf658ecd PR 655: Fix missing Eigen namespace in Macros 2019-06-05 09:51:59 +02:00
Mehdi Goli
0b24e1cb5c [SYCL] Adding the SYCL memory model. The SYCL memory model provides:
* an interface for SYCL buffers to behave as a non-dereferenceable pointer
* an interface for placeholder accessors to behave like a pointer on both host and device
2019-07-01 16:02:30 +01:00
Rasmus Larsen
c1b0aea653 Merged in Artem-B/eigen (pull request PR-654)
Minor build improvements

Approved-by: Rasmus Larsen <rmlarsen@google.com>
2019-05-31 22:27:04 +00:00
Rasmus Munk Larsen
b08527b0c1 Clean up CUDA/NVCC version macros and their use in Eigen, and a few other CUDA build failures. 2019-05-31 15:26:06 -07:00
tra
b4c49bf00e Minor build improvements
* Allow specifying multiple GPU architectures. E.g.:
  cmake -DEIGEN_CUDA_COMPUTE_ARCH="60;70"
* Pass CUDA SDK path to clang. Without it it will default to /usr/local/cuda
which may not be the right location, if cmake was invoked with
-DCUDA_TOOLKIT_ROOT_DIR=/some/other/CUDA/path
2019-05-31 14:08:34 -07:00
Christoph Hertzberg
5614400581 digits10() needs to return an integer
Problem reported on https://stackoverflow.com/questions/56395899
2019-05-31 15:45:41 +02:00
Rasmus Larsen
36e0a2b93f Merged in deven-amd/eigen-hip-fix-190524 (pull request PR-649)
fix for HIP build errors that were introduced by a commit earlier this week
2019-05-24 16:05:31 +00:00
Deven Desai
2c38930161 fix for HIP build errors that were introduced by a commit earlier this week 2019-05-24 14:25:32 +00:00
Gustavo Lima Chaves
56bc4974fb GEMV: remove double declaration of constant.
That was hurting users with compilers that would refuse to compile it:

"""
./Eigen/src/Core/products/GeneralMatrixVector.h:356:10: error: declaration shadows a static data member of 'general_matrix_vector_product<type-parameter-0-0, type-parameter-0-1, type-parameter-0-2, 1, ConjugateLhs, type-parameter-0-4, type-parameter-0-5, ConjugateRhs, Version>' [-Werror,-Wshadow]
         LhsPacketSize = Traits::LhsPacketSize,
         ^
./Eigen/src/Core/products/GeneralMatrixVector.h:307:22: note: previous declaration is here
  static const Index LhsPacketSize = Traits::LhsPacketSize;
"""
2019-05-23 14:50:29 -07:00
Christoph Hertzberg
ac21a08c13 Cast Index to RealScalar
This fixes compilation issues with RealScalar types that are not implicitly castable from Index (e.g. ceres Jet types).
Reported by Peter Anderson-Sprecher via eMail
2019-05-23 15:31:12 +02:00
Rasmus Munk Larsen
3eb5ad0ed0 Enable support for F16C with Clang. The required intrinsics were added here: https://reviews.llvm.org/D16177
and are part of LLVM 3.8.0.
2019-05-20 17:19:20 -07:00
Rasmus Larsen
e92486b8c3 Merged in rmlarsen/eigen (pull request PR-643)
Make Eigen build with cuda 10 and clang.

Approved-by: Justin Lebar <justin.lebar@gmail.com>
2019-05-20 17:02:39 +00:00
Rasmus Munk Larsen
fd595d42a7 Merge 2019-05-20 09:39:11 -07:00
Gael Guennebaud
cc7ecbb124 Merged in scramsby/eigen (pull request PR-646)
Eigen: Fix MSVC C++17 language standard detection logic
2019-05-20 07:19:10 +00:00
Eugene Zhulenev
01654d97fa Prevent potential division by zero in TensorExecutor 2019-05-17 14:02:25 -07:00
Rasmus Larsen
78d3015722 Merged in ezhulenev/eigen-01 (pull request PR-644)
Always evaluate Tensor expressions with broadcasting via tiled evaluation code path
2019-05-17 19:44:25 +00:00
Rasmus Larsen
bf9cbed8d0 Merged in glchaves/eigen (pull request PR-635)
Speed up GEMV on AVX-512 builds, just as done for GEBP previously.

Approved-by: Rasmus Larsen <rmlarsen@google.com>
2019-05-17 19:40:50 +00:00
Eugene Zhulenev
96a276803c Always evaluate Tensor expressions with broadcasting via tiled evaluation code path 2019-05-16 16:15:45 -07:00
Rasmus Munk Larsen
ab0a30e429 Make Eigen build with cuda 10 and clang. 2019-05-15 13:32:15 -07:00
Rasmus Munk Larsen
734a50dc60 Make Eigen build with cuda 10 and clang. 2019-05-15 13:32:15 -07:00
Rasmus Larsen
c8d8d5c0fc Merged in rmlarsen/eigen_threadpool (pull request PR-640)
Fix deadlocks in thread pool.

Approved-by: Eugene Zhulenev <ezhulenev@google.com>
2019-05-13 20:04:35 +00:00
Christoph Hertzberg
5f32b79edc Collapsed revision from PR-641
* SparseLU.h - corrected example, it didn't compile
* Changed encoding back to UTF8
2019-05-13 19:02:30 +02:00
Anuj Rawat
ad372084f5 Removing unused API to fix compile error in TensorFlow due to
AVX512VL, AVX512BW usage
2019-05-12 14:43:10 +00:00
Christoph Hertzberg
4ccd1ece92 bug #1707: Fix deprecation warnings, or disable warnings when testing deprecated functions 2019-05-10 14:57:05 +02:00
Rasmus Munk Larsen
d3ef7cf03e Fix build with clang on Windows. 2019-05-09 11:07:04 -07:00
Rasmus Munk Larsen
e5ac8cbd7a A) fix deadlocks in thread pool caused by EventCount
This fixed 2 deadlocks caused by sloppiness in the EventCount logic.
Both most likely were introduced by cl/236729920 which includes the new EventCount algorithm:
01da8caf00

bug #1 (Prewait):
Prewait must not consume existing signals.
Consider the following scenario.
There are 2 thread pool threads (1 and 2) and 1 external thread (3). RunQueue is empty.
Thread 1 checks the queue, calls Prewait, checks RunQueue again and now is going to call CommitWait.
Thread 2 checks the queue and now is going to call Prewait.
Thread 3 submits 2 tasks, EventCount signals is set to 1 (because only 1 waiter is registered, the second signal is discarded).
Now thread 2 resumes and calls Prewait and takes away the signal.
Thread 1 resumes and calls CommitWait, there are no pending signals anymore, so it blocks.
As the result we have 2 tasks, but only 1 thread is running.

bug #2 (CancelWait):
CancelWait must not take away a signal if it's not sure that the signal was meant for this thread.
When one thread blocks and another submits a new task concurrently, the EventCount protocol guarantees only the following properties (similar to Dekker's algorithm):
(a) the registered waiter notices presence of the new task and does not block
(b) the signaler notices the presence of the waiter and wakes it
(c) both the waiter notices presence of the new task and signaler notices presence of the waiter
[the only forbidden outcome is that neither of them notices the other, because that would lead to a deadlock]
CancelWait is called for cases (a) and (c). For case (c) it is OK to take the notification signal away, but it's not OK for (a) because nobody queued a signal for us and we take away a signal meant for somebody else.
Consider:
Thread 1 calls Prewait, checks RunQueue, it's empty, now it's going to call CommitWait.
Thread 3 submits 2 tasks, EventCount signals is set to 1 (because only 1 waiter is registered, the second signal is discarded).
Thread 2 calls Prewait, checks RunQueue, discovers the tasks, calls CancelWait and consumes the pending signal (meant for thread 1).
Now Thread 1 resumes and calls CommitWait, since there are no signals it blocks.
As the result we have 2 tasks, but only 1 thread is running.

Both deadlocks are only a problem if the tasks require parallelism. Most computational tasks do not require parallelism, i.e. a single thread will run task 1, finish it and then dequeue and run task 2.

This fix undoes some of the sloppiness in the EventCount that was meant to reduce CPU consumption by idle threads, because we now have more threads running in these corner cases. But we still don't have pthread_yield's and maybe the strictness introduced by this change will actually help to reduce tail latency because we will have threads running when we actually need them running.



B) fix deadlock in thread pool caused by RunQueue

This fixed a deadlock caused by sloppiness in the RunQueue logic.
Most likely this was introduced with the non-blocking thread pool.
The deadlock only affects workloads that require parallelism.
Most computational tasks don't require parallelism.

PopBack must not fail spuriously. If it does, it can effectively lead to single thread consuming several wake up signals.
Consider 2 worker threads are blocked.
External thread submits a task. One of the threads is woken.
It tries to steal the task, but fails due to a spurious failure in PopBack (external thread submits another task and holds the lock).
The thread executes blocking protocol again (it won't block because NonEmptyQueueIndex is precise and the thread will discover pending work, but it has called PrepareWait).
Now external thread submits another task and signals EventCount again.
The signal is consumed by the first thread again. But now we have 2 tasks pending but only 1 worker thread running.

It may be possible to fix this in a different way: make EventCount::CancelWait forward the wakeup signal to a blocked thread rather than consuming it. But this looks more complex and I am not 100% sure that it will fix the bug.
It's also possible to have 2 versions of PopBack: one will do try_to_lock and another won't. Then worker threads could first opportunistically check all queues with try_to_lock, and only use the blocking version before blocking. But let's first fix the bug with the simpler change.
2019-05-08 10:16:46 -07:00
Michael Tesch
c5019f722b Use pade for matrix exponential also for complex values. 2019-05-08 17:04:55 +02:00
Eugene Zhulenev
45b40d91ca Fix AVX512 & GCC 6.3 compilation 2019-05-07 16:44:55 -07:00
Christoph Hertzberg
e6667a7060 Fix stupid shadow-warnings (with old clang versions) 2019-05-07 18:32:19 +02:00
Christoph Hertzberg
e54dc24d62 Restore C++03 compatibility 2019-05-07 18:30:44 +02:00
Christoph Hertzberg
cca76c272c Restore C++03 compatibility 2019-05-06 16:18:22 +02:00
Rasmus Munk Larsen
8e33844fc7 Fix traits for scalar_logistic_op. 2019-05-03 15:49:09 -07:00
Scott Ramsby
ff06ef7584 Eigen: Fix MSVC C++17 language standard detection logic
To detect C++17 support, use _MSVC_LANG macro instead of _MSC_VER. _MSC_VER can indicate whether the current compiler version could support the C++17 language standard, but not whether that standard is actually selected (i.e. via /std:c++17).
See these web pages for more details:
https://devblogs.microsoft.com/cppblog/msvc-now-correctly-reports-__cplusplus/
https://docs.microsoft.com/en-us/cpp/preprocessor/predefined-macros
2019-05-03 14:14:09 -07:00
Eugene Zhulenev
e9f0eb8a5e Add masked_store_available to unpacket_traits 2019-05-02 14:52:58 -07:00
Eugene Zhulenev
96e30e936a Add masked pstoreu for Packet16h 2019-05-02 14:11:01 -07:00
Eugene Zhulenev
b4010f02f9 Add masked pstoreu to AVX and AVX512 PacketMath 2019-05-02 13:14:18 -07:00
Gael Guennebaud
578407f42f Fix regression in changeset ae33e866c7 2019-05-02 15:45:21 +02:00
Rasmus Larsen
ac50afaffa Merged in ezhulenev/eigen-01 (pull request PR-633)
Check if gpu_assert was overridden in TensorGpuHipCudaDefines
2019-04-29 16:29:35 +00:00
Gustavo Lima Chaves
d4dcb71bcb Speed up GEMV on AVX-512 builds, just as done for GEBP previously.
We take advantage of smaller SIMD registers as well, in that case.

Gains up to 3x for select input sizes.
2019-04-26 14:12:39 -07:00
Andy May
ae33e866c7 Fix compilation with PGI version 19 2019-04-25 21:23:19 +01:00
Gael Guennebaud
665ac22cc6 Merged in ezhulenev/eigen-01 (pull request PR-632)
Fix doxygen warnings
2019-04-25 20:02:20 +00:00
Eugene Zhulenev
01d7e6ee9b Check if gpu_assert was overridden in TensorGpuHipCudaDefines 2019-04-25 11:19:17 -07:00
Eugene Zhulenev
8ead5bb3d8 Fix doxygen warnings to enable static code analysis 2019-04-24 12:42:28 -07:00
Eugene Zhulenev
07355d47c6 Get rid of SequentialLinSpacedReturnType deprecation warnings in DenseBase.h 2019-04-24 11:01:35 -07:00
Rasmus Munk Larsen
144ca33321 Remove deprecation annotation from typedef Eigen::Index Index, as it would generate too many build warnings. 2019-04-24 08:50:07 -07:00
Eugene Zhulenev
a7b7f3ca8a Add missing EIGEN_DEPRECATED annotations to deprecated functions and fix few other doxygen warnings 2019-04-23 17:23:19 -07:00
Eugene Zhulenev
68a2a8c445 Use packet ops instead of AVX2 intrinsics 2019-04-23 11:41:02 -07:00
Anuj Rawat
8c7a6feb8e Adding lowlevel APIs for optimized RHS packet load in TensorFlow
SpatialConvolution

Low-level APIs are added in order to optimize packet load in gemm_pack_rhs
in TensorFlow SpatialConvolution. The optimization is for the scenario where a
packet is split across 2 adjacent columns. In this case we read it as two
'partial' packets and then merge these into 1. Currently this only works for
Packet16f (AVX512) and Packet8f (AVX2). We plan to add this for other
packet types (such as Packet8d) also.

This optimization shows significant speedup in SpatialConvolution with
certain parameters. Some examples are below.

Benchmark parameters are specified as:
Batch size, Input dim, Depth, Num of filters, Filter dim

Speedup numbers are specified for number of threads 1, 2, 4, 8, 16.

AVX512:

Parameters                  | Speedup (Num of threads: 1, 2, 4, 8, 16)
----------------------------|------------------------------------------
128,   24x24,  3, 64,   5x5 |2.18X, 2.13X, 1.73X, 1.64X, 1.66X
128,   24x24,  1, 64,   8x8 |2.00X, 1.98X, 1.93X, 1.91X, 1.91X
 32,   24x24,  3, 64,   5x5 |2.26X, 2.14X, 2.17X, 2.22X, 2.33X
128,   24x24,  3, 64,   3x3 |1.51X, 1.45X, 1.45X, 1.67X, 1.57X
 32,   14x14, 24, 64,   5x5 |1.21X, 1.19X, 1.16X, 1.70X, 1.17X
128, 128x128,  3, 96, 11x11 |2.17X, 2.18X, 2.19X, 2.20X, 2.18X

AVX2:

Parameters                  | Speedup (Num of threads: 1, 2, 4, 8, 16)
----------------------------|------------------------------------------
128,   24x24,  3, 64,   5x5 | 1.66X, 1.65X, 1.61X, 1.56X, 1.49X
 32,   24x24,  3, 64,   5x5 | 1.71X, 1.63X, 1.77X, 1.58X, 1.68X
128,   24x24,  1, 64,   5x5 | 1.44X, 1.40X, 1.38X, 1.37X, 1.33X
128,   24x24,  3, 64,   3x3 | 1.68X, 1.63X, 1.58X, 1.56X, 1.62X
128, 128x128,  3, 96, 11x11 | 1.36X, 1.36X, 1.37X, 1.37X, 1.37X

In the higher level benchmark cifar10, we observe a runtime improvement
of around 6% for AVX512 on Intel Skylake server (8 cores).

On lower level PackRhs micro-benchmarks specified in TensorFlow
tensorflow/core/kernels/eigen_spatial_convolutions_test.cc, we observe
the following runtime numbers:

AVX512:

Parameters                                                     | Runtime without patch (ns) | Runtime with patch (ns) | Speedup
---------------------------------------------------------------|----------------------------|-------------------------|---------
BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56)  |  41350                     | 15073                   | 2.74X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56)  |   7277                     |  7341                   | 0.99X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56)  |   8675                     |  8681                   | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56)  |  24155                     | 16079                   | 1.50X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56)  |  25052                     | 17152                   | 1.46X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56) |  18269                     | 18345                   | 1.00X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56) |  19468                     | 19872                   | 0.98X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432)   | 156060                     | 42432                   | 3.68X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432)   | 132701                     | 36944                   | 3.59X

AVX2:

Parameters                                                     | Runtime without patch (ns) | Runtime with patch (ns) | Speedup
---------------------------------------------------------------|----------------------------|-------------------------|---------
BM_RHS_NAME(PackRhs, 128, 24, 24, 3, 64, 5, 5, 1, 1, 256, 56)  | 26233                      | 12393                   | 2.12X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 1, 1, 256, 56)  |  6091                      |  6062                   | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 32, 64, 5, 5, 2, 2, 256, 56)  |  7427                      |  7408                   | 1.00X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 1, 1, 256, 56)  | 23453                      | 20826                   | 1.13X
BM_RHS_NAME(PackRhs, 32, 64, 64, 30, 64, 5, 5, 2, 2, 256, 56)  | 23167                      | 22091                   | 1.09X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 1, 1, 256, 56) | 23422                      | 23682                   | 0.99X
BM_RHS_NAME(PackRhs, 32, 256, 256, 4, 16, 8, 8, 2, 4, 256, 56) | 23165                      | 23663                   | 0.98X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 1, 1, 36, 432)   | 72689                      | 44969                   | 1.62X
BM_RHS_NAME(PackRhs, 32, 64, 64, 4, 16, 3, 3, 2, 2, 36, 432)   | 61732                      | 39779                   | 1.55X

All benchmarks on Intel Skylake server with 8 cores.
2019-04-20 06:46:43 +00:00
Christoph Hertzberg
4270c62812 Split the implementation of i?amax/min into two. Based on PR-627 by Sameer Agarwal.
Like the Netlib reference implementation, I*AMAX now uses the L1-norm instead of the L2-norm for each element. Changed I*MIN accordingly.
2019-04-15 17:18:03 +02:00
Rasmus Munk Larsen
039ee52125 Tweak cost model for tensor contraction when parallelizing over the inner dimension.
https://bitbucket.org/snippets/rmlarsen/MexxLo
2019-04-12 13:35:10 -07:00
Jonathon Koyle
9a3f06d836 Update TheadPoolDevice example to include ThreadPool creation and passing pointer into constructor. 2019-04-10 10:02:33 -06:00
Deven Desai
66a885b61e adding EIGEN_DEVICE_FUNC to the recently added TensorContractionKernel constructor. Not having the EIGEN_DEVICE_FUNC attribute on it was leading to compiler errors when compiling Eigen in the ROCm/HIP path 2019-04-08 13:45:08 +00:00
Eugene Zhulenev
629ddebd15 Add missing semicolon 2019-04-02 15:04:26 -07:00
Eugene Zhulenev
4e2f6de1a8 Add support for custom packed Lhs/Rhs blocks in tensor contractions 2019-04-01 11:47:31 -07:00
Gael Guennebaud
45e65fbb77 bug #1695: fix a numerical robustness issue. Computing the secular equation at the middle range without a shift might give a wrong sign. 2019-03-27 20:16:58 +01:00
William D. Irons
8de66719f9 Collapsed revision from PR-619
* Add support for pcmp_eq in AltiVec/Complex.h
* Fixed implementation of pcmp_eq for double

The new logic is based on the logic from NEON for double.
2019-03-26 18:14:49 +00:00
Gael Guennebaud
f11364290e ICC does not support -fno-unsafe-math-optimizations 2019-03-22 09:26:24 +01:00
David Tellenbach
3031d57200 PR 621: Fix documentation of EIGEN_COMP_EMSCRIPTEN 2019-03-21 02:21:04 +01:00
Deven Desai
51e399fc15 updates requested in the PR feedback. Also dropping code within #ifdef EIGEN_HAS_OLD_HIP_FP16 2019-03-19 21:45:25 +00:00
Deven Desai
2dbea5510f Merged eigen/eigen into default 2019-03-19 16:52:38 -04:00
Rasmus Larsen
5c93b38c5f Merged in rmlarsen/eigen (pull request PR-618)
Make clipping outside [-18:18] consistent for vectorized and non-vectorized paths of scalar_logistic_op<float>.

Approved-by: Gael Guennebaud <g.gael@free.fr>
2019-03-18 15:51:55 +00:00
Gael Guennebaud
48898a988a fix unit test in c++03: c++03 does not allow passing local or anonymous enum as template param 2019-03-18 11:38:36 +01:00
Gael Guennebaud
cf7e2e277f bug #1692: enable enum as sizes of Matrix and Array 2019-03-17 21:59:30 +01:00
Rasmus Munk Larsen
e42f9aa68a Make clipping outside [-18:18] consistent for vectorized and non-vectorized paths of scalar_logistic_<float>. 2019-03-15 17:15:14 -07:00
Rasmus Larsen
1936aac43f Merged in tellenbach/eigen/sykline_consistent_include_guards (pull request PR-617)
Fix include guard comments for Skyline module
2019-03-15 20:04:56 +00:00
David Tellenbach
bd9c2ae3fd Fix include guard comments 2019-03-15 15:29:17 +01:00
Rasmus Munk Larsen
8450a6d519 Clean up half packet traits and add a few more missing packet ops. 2019-03-14 15:18:06 -07:00
David Tellenbach
b013176e52 Remove undefined std::complex<int> 2019-03-14 11:40:28 +01:00
David Tellenbach
97f9a46cb9 PR 593: Add variadtic ctor for DiagonalMatrix with unit tests 2019-03-14 10:18:24 +01:00
Gael Guennebaud
45ab514fe2 revert debug stuff 2019-03-14 10:08:12 +01:00
Rasmus Munk Larsen
6a34003141 Remove EIGEN_MPL2_ONLY guard in IncompleteCholesky that is no longer needed after the AMD reordering code was relicensed to MPL2. 2019-03-13 11:52:41 -07:00
Gael Guennebaud
d7d2f0680e bug #1684: partially workaround clang's 6/7 bug #40815 2019-03-13 10:40:01 +01:00
Rasmus Larsen
690f0795d0 Merged in rmlarsen/eigen (pull request PR-615)
Clean up PacketMathHalf.h and add a few missing logical packet ops.
2019-03-12 16:09:48 +00:00
Thomas Capricelli
1901433674 erm.. use proper id 2019-03-12 13:53:38 +01:00
Thomas Capricelli
90302aa8c9 update tracking code 2019-03-12 13:47:01 +01:00
Rasmus Munk Larsen
77f7d4a894 Clean up PacketMathHalf.h and add a few missing logical packet ops. 2019-03-11 17:51:16 -07:00
Eugene Zhulenev
001f10e3c9 Fix segfaults with cuda compilation 2019-03-11 09:43:33 -07:00
Eugene Zhulenev
899c16fa2c Fix a bug in TensorGenerator for 1d tensors 2019-03-11 09:42:01 -07:00
Eugene Zhulenev
0f8bfff23d Fix a data race in NonBlockingThreadPool 2019-03-11 09:38:44 -07:00
Gael Guennebaud
656d9bc66b Apply SSE's pmin/pmax fix for GCC <= 5 to AVX's pmin/pmax 2019-03-10 21:19:18 +01:00
Gael Guennebaud
2df4f00246 Change license from LGPL to MPL2 with agreement from David Harmon. 2019-03-07 18:17:10 +01:00
Rasmus Munk Larsen
3c3f639fe2 Merge. 2019-03-06 11:54:30 -08:00
Rasmus Munk Larsen
f4ec8edea8 Add macro EIGEN_AVOID_THREAD_LOCAL to make it possible to manually disable the use of thread_local. 2019-03-06 11:52:04 -08:00
Rasmus Munk Larsen
41cdc370d0 Fix placement of "#if defined(EIGEN_GPUCC)" guard region.
Found with -Wundefined-func-template.

Author: tkoeppe@google.com
2019-03-06 11:42:22 -08:00
Rasmus Munk Larsen
cc407c9d4d Fix placement of "#if defined(EIGEN_GPUCC)" guard region.
Found with -Wundefined-func-template.

Author: tkoeppe@google.com
2019-03-06 11:40:06 -08:00
Eugene Zhulenev
1bc2a0a57c Add missing return to NonBlockingThreadPool::LocalSteal 2019-03-06 10:49:49 -08:00
Eugene Zhulenev
4e4dcd9026 Remove redundant steal loop 2019-03-06 10:39:07 -08:00
Rasmus Larsen
4d808e834a Merged in rmlarsen/eigen_threadpool (pull request PR-606)
Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in 2ca1e73239


Approved-by: Sameer Agarwal <sameeragarwal@google.com>
2019-03-06 17:59:03 +00:00
Rasmus Larsen
2ea18e505f Merged in ezhulenev/eigen-01 (pull request PR-610)
Block evaluation for TensorGeneratorOp
2019-03-06 16:49:38 +00:00
Eugene Zhulenev
25abaa2e41 Check that inner block dimension is continuous 2019-03-05 17:34:35 -08:00
Eugene Zhulenev
5d9a6686ed Block evaluation for TensorGeneratorOp 2019-03-05 16:35:21 -08:00
Rasmus Larsen
b4861f4778 Merged in ezhulenev/eigen-01 (pull request PR-609)
Tune tensor contraction threadpool heuristics
2019-03-05 23:54:40 +00:00
Gael Guennebaud
bfbf7da047 bug #1689 fix used-but-marked-unused warning 2019-03-05 23:46:24 +01:00
Eugene Zhulenev
a407e022e6 Tune tensor contraction threadpool heuristics 2019-03-05 14:19:59 -08:00
Eugene Zhulenev
56c6373f82 Add an extra check for the RunQueue size estimate 2019-03-05 11:51:26 -08:00
Eugene Zhulenev
b1a8627493 Do not create Tensor<const T> in cxx11_tensor_forced_eval test 2019-03-05 11:19:25 -08:00
Rasmus Munk Larsen
0318fc7f44 Remove EIGEN_MPL2_ONLY guards around code re-licensed from LGPL to MPL2 in 2ca1e73239 2019-03-05 10:24:54 -08:00
Eugene Zhulenev
efb5080d31 Do not initialize invalid fast_strides in TensorGeneratorOp 2019-03-04 16:58:49 -08:00
Eugene Zhulenev
b95941e5c2 Add tiled evaluation for TensorForcedEvalOp 2019-03-04 16:02:22 -08:00
Eugene Zhulenev
694084ecbd Use fast divisors in TensorGeneratorOp 2019-03-04 11:10:21 -08:00
Gael Guennebaud
b0d406d91c Enable construction of Ref<VectorType> from a runtime vector. 2019-03-03 15:25:25 +01:00
Sam Hasinoff
9ba81cf0ff Fully qualify Eigen::internal::aligned_free
This helps avoid a conflict on certain Windows toolchains
(potentially due to some ADL name resolution bug) in the case
where aligned_free is defined in the global namespace. In any
case, tightening this up is harmless.
2019-03-02 17:42:16 +00:00
Gael Guennebaud
22144e949d bug #1629: fix compilation of PardisoSupport (regression introduced in changeset a7842daef2)
2019-03-02 22:44:47 +01:00
Bernhard M. Wiedemann
b071672e78 Do not keep latex logs
to make package builds more reproducible.
See https://reproducible-builds.org/ for why this is good.
2019-02-27 11:09:00 +01:00
Rasmus Munk Larsen
cf4a1c81fa Fix specialization for conjugate on non-complex types in TensorBase.h. 2019-03-01 14:21:09 -08:00
Sameer Agarwal
c181dfb8ab Consistently use EIGEN_BLAS_FUNC in BLAS.
Previously, for a few functions, either BLASFUNC or EIGEN_CAT
was being used. This change uses EIGEN_BLAS_FUNC consistently
everywhere.

Also introduce EIGEN_BLAS_FUNC_SUFFIX, which by default is
equal to "_", this allows the user to inject a new suffix as
needed.
2019-02-27 11:30:58 -08:00
Rasmus Larsen
9558f4c25f Merged in rmlarsen/eigen_threadpool (pull request PR-596)
Improve EventCount used by the non-blocking threadpool.

Approved-by: Gael Guennebaud <g.gael@free.fr>
2019-02-26 20:37:26 +00:00
Rasmus Larsen
2ca1e73239 Merged in rmlarsen/eigen (pull request PR-597)
Change licensing of OrderingMethods/Amd.h and SparseCholesky/SimplicialCholesky_impl.h from LGPL to MPL2.

Approved-by: Gael Guennebaud <g.gael@free.fr>
2019-02-25 17:02:16 +00:00
Gael Guennebaud
e409dbba14 Enable SSE vectorization of Quaternion and cross3() with AVX 2019-02-23 10:45:40 +01:00
Rasmus Munk Larsen
6560692c67 Improve EventCount used by the non-blocking threadpool.
The current algorithm requires threads to commit/cancel waiting in order
they called Prewait. Spinning caused by that serialization can consume
lots of CPU time on some workloads. Restructure the algorithm to not
require that serialization and remove spin waits from Commit/CancelWait.
Note: this reduces max number of threads from 2^16 to 2^14 to leave
more space for ABA counter (which is now 22 bits).
Implementation details are explained in comments.
2019-02-22 13:56:26 -08:00
Gael Guennebaud
0b25a5c431 fix alignment in ploadquad 2019-02-22 21:39:36 +01:00
Rasmus Munk Larsen
1dc1677d52 Change licensing of OrderingMethods/Amd.h and SparseCholesky/SimplicialCholesky_impl.h from LGPL to MPL2. Google LLC executed a license agreement with the author of the code from which these files are derived to allow the Eigen project to distribute the code and derived works under MPL2. 2019-02-22 12:33:57 -08:00
Gael Guennebaud
0cb4ba98e7 update wrt recent changes 2019-02-21 17:19:36 +01:00
Gael Guennebaud
cca6c207f4 AVX512: implement faster ploadquad<Packet16f> thus speeding up GEMM 2019-02-21 17:18:28 +01:00
Gael Guennebaud
1c09ee8541 bug #1674: workaround clang fast-math aggressive optimizations 2019-02-22 15:48:53 +01:00
Gael Guennebaud
7e3084bb6f Fix compilation on ARM. 2019-02-22 14:56:12 +01:00
Gael Guennebaud
32502f3c45 bug #1684: add simplified regression test for respective clang's bug (this also reveal the same bug in Apples's clang) 2019-02-22 10:29:06 +01:00
Gael Guennebaud
42c23f14ac Speed up col/row-wise reverse for fixed size matrices by propagating compile-time sizes. 2019-02-21 22:44:40 +01:00
Rasmus Munk Larsen
4d7f317102 Add a few missing packet ops: cmp_eq for NEON. pfloor for GPU. 2019-02-21 13:32:13 -08:00
Gael Guennebaud
2a39659d79 Add fully generic Vector<Type,Size> and RowVector<Type,Size> type aliases. 2019-02-20 15:23:23 +01:00
Gael Guennebaud
302377110a Update documentation of Matrix and Array type aliases. 2019-02-20 15:18:48 +01:00
Gael Guennebaud
475295b5ff Enable documentation of Array's typedefs 2019-02-20 15:18:07 +01:00
Gael Guennebaud
44b54fa4a3 Protect c++11 type alias with Eigen's macro, and add respective unit test. 2019-02-20 14:43:05 +01:00
Gael Guennebaud
7195f008ce Merged in ra_bauke/eigen (pull request PR-180)
alias template for matrix and array classes, see also bug #864

Approved-by: Heiko Bauke <heiko.bauke@mail.de>
2019-02-20 13:22:39 +00:00
Gael Guennebaud
4e8047cdcf Fix compilation with gcc and remove TR1 stuff. 2019-02-20 13:59:34 +01:00
Gael Guennebaud
844e5447f8 Update documentation regarding alignment issue. 2019-02-20 13:54:04 +01:00
Gael Guennebaud
edd413c184 bug #1409: make EIGEN_MAKE_ALIGNED_OPERATOR_NEW* macros empty in c++17 mode:
- this helps clang 5 and 6 to support alignas in STL's containers.
 - this makes the public API of our (and users) classes cleaner
2019-02-20 13:52:11 +01:00
Gael Guennebaud
3b5deeb546 bug #899: make sparseqr unit test more stable by 1) trying with larger threshold and 2) relax rank computation for rank-deficient problems. 2019-02-19 22:57:51 +01:00
Gael Guennebaud
482c5fb321 bug #899: remove "rank-revealing" qualifier for SparseQR and warn that it is not always rank-revealing. 2019-02-19 22:52:15 +01:00
Gael Guennebaud
9ac1634fdf Fix conversion warnings 2019-02-19 21:59:53 +01:00
Gael Guennebaud
292d61970a Fix C++17 compilation 2019-02-19 21:59:41 +01:00
Rasmus Munk Larsen
071629a440 Fix incorrect value of NumDimensions in TensorContraction traits.
Reported here: #1671
2019-02-19 10:49:54 -08:00
Christoph Hertzberg
a1646fc960 Commas at the end of enumerator lists are not allowed in C++03 2019-02-19 14:32:25 +01:00
Gael Guennebaud
2cfc025bda fix unit compilation in c++17: std::ptr_fun has been removed. 2019-02-19 14:05:22 +01:00
Gael Guennebaud
ab78cabd39 Add C++17 detection macro, and make sure throw(xpr) is not used if the compiler is in c++17 mode. 2019-02-19 14:04:35 +01:00
Gael Guennebaud
115da6a1ea Fix conversion warnings 2019-02-19 14:00:15 +01:00
Gael Guennebaud
7d10c78738 bug #1046: add unit tests for correct propagation of alignment through std::alignment_of 2019-02-19 10:31:56 +01:00
Gael Guennebaud
7580112c31 Fix harmless Scalar vs RealScalar cast. 2019-02-18 22:12:28 +01:00
Gael Guennebaud
e23bf40dc2 Add unit test for LinSpaced and complex numbers. 2019-02-18 22:03:47 +01:00
Gael Guennebaud
796db94e6e bug #1194: implement slightly faster and SIMD friendly 4x4 determinant. 2019-02-18 16:21:27 +01:00
Gael Guennebaud
31b6e080a9 Fix regression: .conjugate() was popped out but not re-introduced. 2019-02-18 14:45:55 +01:00
Gael Guennebaud
c69d0d08d0 Set cost of conjugate to 0 (in practice it boils down to a no-op).
This is also important to make sure that A.conjugate() * B.conjugate() does not evaluate
its arguments into temporaries (e.g., if A and B are fixed and small, or if * falls back to lazyProduct)
2019-02-18 14:43:07 +01:00
Gael Guennebaud
512b74aaa1 GEMM: catch all scalar-multiple variants when falling-back to a coeff-based product.
Before only s*A*B was caught which was both inconsistent with GEMM, sub-optimal,
and could even lead to compilation-errors (https://stackoverflow.com/questions/54738495).
2019-02-18 11:47:54 +01:00
Christoph Hertzberg
ec032ac03b Guard C++11-style default constructor. Also, this is only needed for MSVC 2019-02-16 09:44:05 +01:00
Gael Guennebaud
902a7793f7 Add possibility to bench row-major lhs and rhs 2019-02-15 16:52:34 +01:00
Gael Guennebaud
83309068b4 bug #1680: improve MSVC inlining by declaring many trivial constructors and accessors as STRONG_INLINE. 2019-02-15 16:35:35 +01:00
Gael Guennebaud
0505248f25 bug #1680: make all "block" methods strong-inline and device-functions (some were missing EIGEN_DEVICE_FUNC) 2019-02-15 16:33:56 +01:00
Gael Guennebaud
559320745e bug #1678: Fix lack of __FMA__ macro on MSVC with AVX512 2019-02-15 10:30:28 +01:00
Gael Guennebaud
d85ae650bf bug #1678: workaround MSVC compilation issues with AVX512 2019-02-15 10:24:17 +01:00
Gael Guennebaud
f2970819a2 bug #1679: avoid possible division by 0 in complex-schur 2019-02-15 09:39:25 +01:00
Rasmus Munk Larsen
65e23ca7e9 Revert b55b5c7280.
2019-02-14 13:46:13 -08:00
Rasmus Larsen
efeabee445 Merged in ezhulenev/eigen-01 (pull request PR-590)
Do not generate no-op cast() and conjugate() expressions
2019-02-14 21:16:12 +00:00
Eugene Zhulenev
7b837559a7 Fix signed-unsigned return in RunQueue 2019-02-14 10:40:21 -08:00
Eugene Zhulenev
f0d42d2265 Fix signed-unsigned comparison warning in RunQueue 2019-02-14 10:27:28 -08:00
Eugene Zhulenev
106ba7bb1a Do not generate no-op cast() and conjugate() expressions 2019-02-14 09:51:51 -08:00
Eugene Zhulenev
8c2f30c790 Speed up Tensor ThreadPool RunQueue::Empty() 2019-02-13 10:20:53 -08:00
Gael Guennebaud
bdcb5f3304 Let's properly use Score instead of std::abs, and remove a deprecated FIXME (a /= b does a/b and not a * (1/b) as it was a long time ago...) 2019-02-11 22:56:19 +01:00
Gael Guennebaud
2edfc6807d Fix compilation of empty products of the form: Mx0 * 0xN 2019-02-11 18:24:07 +01:00
Gael Guennebaud
eb46f34a8c Speed up 2x2 LU by a factor 2, and other small fixed sizes by about 10%.
Not sure that's so critical, but this does not complicate the code base much.
2019-02-11 17:59:35 +01:00
Gael Guennebaud
dada863d23 Enable unit tests of PartialPivLU on fixed size matrices, and increase tested matrix size (blocking was not tested!) 2019-02-11 17:56:20 +01:00
Gael Guennebaud
ab6e6edc32 Speed up PartialPivLU for small matrices by passing compile-time sizes when available.
This changeset also makes better use of Map<>+OuterStride and Ref<>, yielding a surprising speed-up for small dynamic sizes as well.
The table below reports times in microseconds for 10 random matrices:
           | ------ float --------- | ------- double ------- |
     size  | before   after  ratio  |  before   after  ratio |
fixed   1  | 0.34     0.11   2.93   |  0.35     0.11   3.06  |
fixed   2  | 0.81     0.24   3.38   |  0.91     0.25   3.60  |
fixed   3  | 1.49     0.49   3.04   |  1.68     0.55   3.01  |
fixed   4  | 2.31     0.70   3.28   |  2.45     1.08   2.27  |
fixed   5  | 3.49     1.11   3.13   |  3.84     2.24   1.71  |
fixed   6  | 4.76     1.64   2.88   |  4.87     2.84   1.71  |
dyn     1  | 0.50     0.40   1.23   |  0.51     0.40   1.26  |
dyn     2  | 1.08     0.85   1.27   |  1.04     0.69   1.49  |
dyn     3  | 1.76     1.26   1.40   |  1.84     1.14   1.60  |
dyn     4  | 2.57     1.75   1.46   |  2.67     1.66   1.60  |
dyn     5  | 3.80     2.64   1.43   |  4.00     2.48   1.61  |
dyn     6  | 5.06     3.43   1.47   |  5.15     3.21   1.60  |
2019-02-11 13:58:24 +01:00
Eugene Zhulenev
21eb97d3e0 Add PacketConv implementation for non-vectorizable src expressions 2019-02-08 15:47:25 -08:00
Eugene Zhulenev
1e36166ed1 Optimize TensorConversion evaluator: do not convert same type 2019-02-08 15:13:24 -08:00
Steven Peters
953ca5ba2f Spline.h: fix spelling "spang" -> "span" 2019-02-08 06:23:24 +00:00
Eugene Zhulenev
59998117bb Don't do parallel_pack if we can use thread_local memory in tensor contractions 2019-02-07 09:21:25 -08:00
Gael Guennebaud
013cc3a6b3 Make GEMM fall back to GEMV for runtime vectors.
This is a more general and simpler version of changeset 4c0fa6ce0f
2019-02-07 16:24:09 +01:00
Gael Guennebaud
fa2fcb4895 Backed out changeset 4c0fa6ce0f 2019-02-07 16:07:08 +01:00
Gael Guennebaud
b3c4344a68 bug #1676: workaround GCC's bug in c++17 mode. 2019-02-07 15:21:35 +01:00
Rasmus Larsen
3091c03898 Merged in ezhulenev/eigen-01 (pull request PR-581)
Parallelize tensor contraction only by sharding dimension and use 'thread-local' memory for packing

Approved-by: Rasmus Larsen <rmlarsen@google.com>
Approved-by: Gael Guennebaud <g.gael@free.fr>
2019-02-05 22:45:20 +00:00
Eugene Zhulenev
8491127082 Do not reduce parallelism too much in contractions with small number of threads 2019-02-04 12:59:33 -08:00
Eugene Zhulenev
eb21bab769 Parallelize tensor contraction only by sharding dimension and use 'thread-local' memory for packing 2019-02-04 10:43:16 -08:00
Eugene Zhulenev
6d0f6265a9 Remove duplicated comment line 2019-02-04 10:30:25 -08:00
Eugene Zhulenev
690b2c45b1 Fix GeneralBlockPanelKernel Android compilation 2019-02-04 10:29:15 -08:00
Gael Guennebaud
871e2e5339 bug #1674: disable GCC's unsafe-math-optimizations in sin/cos vectorization (results are completely wrong otherwise) 2019-02-03 08:54:47 +01:00
Rasmus Larsen
e7b481ea74 Merged in rmlarsen/eigen (pull request PR-578)
Speed up Eigen matrix*vector and vector*matrix multiplication.

Approved-by: Eugene Zhulenev <ezhulenev@google.com>
2019-02-02 01:53:44 +00:00
Sameer Agarwal
b55b5c7280 Speed up row-major matrix-vector product on ARM
The row-major matrix-vector multiplication code uses a threshold to
check if processing 8 rows at a time would thrash the cache.

This change introduces two modifications to this logic.

1. A smaller threshold for ARM and ARM64 devices.

The value of this threshold was determined empirically using a Pixel2
phone, by benchmarking a large number of matrix-vector products in the
range [1..4096]x[1..4096] and measuring performance separately on the
big and little cores with frequency pinning.

On big (out-of-order) cores, this change has little to no impact. But
on the small (in-order) cores, the matrix-vector products are up to
700% faster. Especially on large matrices.

The motivation for this change was some internal code at Google which
was using hand-written NEON for implementing similar functionality,
processing the matrix one row at a time, which exhibited substantially
better performance than Eigen.

With the current change, Eigen handily beats that code.

2. Make the logic for choosing the number of simultaneous rows apply
uniformly to 8, 4 and 2 rows instead of just 8 rows.

Since the default threshold for non-ARM devices is essentially
unchanged (32000 -> 32 * 1024), this change has no impact on non-ARM
performance. This was verified by running the same set of benchmarks
on a Xeon desktop.
2019-02-01 15:23:53 -08:00
Rasmus Munk Larsen
4c0fa6ce0f Speed up Eigen matrix*vector and vector*matrix multiplication.
This change speeds up Eigen matrix * vector and vector * matrix multiplication for dynamic matrices when it is known at runtime that one of the factors is a vector.

The benchmarks below test

c.noalias()= n_by_n_matrix * n_by_1_matrix;
c.noalias()= 1_by_n_matrix * n_by_n_matrix;
respectively.

Benchmark measurements:

SSE:
Run on *** (72 X 2992 MHz CPUs); 2019-01-28T17:51:44.452697457-08:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark                          Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_MatVec/64                            1096       312    +71.5%
BM_MatVec/128                           4581      1464    +68.0%
BM_MatVec/256                          18534      5710    +69.2%
BM_MatVec/512                         118083     24162    +79.5%
BM_MatVec/1k                          704106    173346    +75.4%
BM_MatVec/2k                         3080828    742728    +75.9%
BM_MatVec/4k                        25421512   4530117    +82.2%
BM_VecMat/32                             352       130    +63.1%
BM_VecMat/64                            1213       425    +65.0%
BM_VecMat/128                           4640      1564    +66.3%
BM_VecMat/256                          17902      5884    +67.1%
BM_VecMat/512                          70466     24000    +65.9%
BM_VecMat/1k                          340150    161263    +52.6%
BM_VecMat/2k                         1420590    645576    +54.6%
BM_VecMat/4k                         8083859   4364327    +46.0%

AVX2:
Run on *** (72 X 2993 MHz CPUs); 2019-01-28T17:45:11.508545307-08:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark                          Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_MatVec/64                             619       120    +80.6%
BM_MatVec/128                           9693       752    +92.2%
BM_MatVec/256                          38356      2773    +92.8%
BM_MatVec/512                          69006     12803    +81.4%
BM_MatVec/1k                          443810    160378    +63.9%
BM_MatVec/2k                         2633553    646594    +75.4%
BM_MatVec/4k                        16211095   4327148    +73.3%
BM_VecMat/64                             925       227    +75.5%
BM_VecMat/128                           3438       830    +75.9%
BM_VecMat/256                          13427      2936    +78.1%
BM_VecMat/512                          53944     12473    +76.9%
BM_VecMat/1k                          302264    157076    +48.0%
BM_VecMat/2k                         1396811    675778    +51.6%
BM_VecMat/4k                         8962246   4459010    +50.2%

AVX512:
Run on *** (72 X 2993 MHz CPUs); 2019-01-28T17:35:17.239329863-08:00
CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark                          Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_MatVec/64                             401       111    +72.3%
BM_MatVec/128                           1846       513    +72.2%
BM_MatVec/256                          36739      1927    +94.8%
BM_MatVec/512                          54490      9227    +83.1%
BM_MatVec/1k                          487374    161457    +66.9%
BM_MatVec/2k                         2016270    643824    +68.1%
BM_MatVec/4k                        13204300   4077412    +69.1%
BM_VecMat/32                             324       106    +67.3%
BM_VecMat/64                            1034       246    +76.2%
BM_VecMat/128                           3576       802    +77.6%
BM_VecMat/256                          13411      2561    +80.9%
BM_VecMat/512                          58686     10037    +82.9%
BM_VecMat/1k                          320862    163750    +49.0%
BM_VecMat/2k                         1406719    651397    +53.7%
BM_VecMat/4k                         7785179   4124677    +47.0%
2019-01-31 14:24:08 -08:00
Gael Guennebaud
7ef879f6bf GEBP: improves pipelining in the 1pX4 path with FMA.
Prior to this change, a product with a LHS having 8 rows was faster with AVX-only than with AVX+FMA.
With AVX+FMA I measured a speed up of about x1.25 in such cases.
2019-01-30 23:45:12 +01:00
Gael Guennebaud
de77bf5d6c Fix compilation with ARM64. 2019-01-30 16:48:20 +01:00
Gael Guennebaud
d586686924 Workaround lack of support for arbitrary packet-type in Tensor by manually loading half/quarter packets in tensor contraction mapper. 2019-01-30 16:48:01 +01:00
Gael Guennebaud
eb4c6bb22d Fix conflicts and merge 2019-01-30 15:57:08 +01:00
Gael Guennebaud
e3622a0396 Slightly extend discussions on auto and move the content of the Pit falls wiki page here.
http://eigen.tuxfamily.org/index.php?title=Pit_Falls
2019-01-30 13:09:21 +01:00
Gael Guennebaud
df12fae8b8 According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89101, the previous GCC issue is fixed in GCC trunk (will be gcc 9). 2019-01-30 11:52:28 +01:00
Gael Guennebaud
3775926bba ARM64 & GEBP: add specialization for double +30% speed up 2019-01-30 11:49:06 +01:00
Gael Guennebaud
be5b0f664a ARM64 & GEBP: Make use of vfmaq_laneq_f32 and workaround GCC's issue in generating good ASM 2019-01-30 11:48:25 +01:00
Christoph Hertzberg
a7779a9b42 Hide some annoying unused variable warnings in g++8.1 2019-01-29 16:48:21 +01:00
Gael Guennebaud
efe02292a6 Add recent gemm related changesets and various cleanups in perf-monitoring 2019-01-29 11:53:47 +01:00
Gael Guennebaud
8a06c699d0 bug #1669: fix PartialPivLU/inverse with zero-sized matrices. 2019-01-29 10:27:13 +01:00
Gael Guennebaud
a2a07e62b9 Fix compilation with c++03 (local class cannot be template arguments), and make SparseMatrix::assignDiagonal truly protected. 2019-01-29 10:10:07 +01:00
Gael Guennebaud
f489f44519 bug #1574: implement "sparse_matrix =,+=,-= diagonal_matrix" with smart insertion strategies of missing diagonal coeffs. 2019-01-28 17:29:50 +01:00
Gael Guennebaud
803fa79767 Move evaluator<SparseCompressedBase>::find(i,j) to a more general and reusable SparseCompressedBase::lower_bound(i,j) function 2019-01-28 17:24:44 +01:00
Gael Guennebaud
53560f9186 bug #1672: fix unit test compilation with MSVC by adding overloads of test_is* for long long (and factorize copy/paste code through a macro) 2019-01-28 13:47:28 +01:00
Christoph Hertzberg
c9825b967e Renaming even more I identifiers 2019-01-26 13:22:13 +01:00
Christoph Hertzberg
5a52e35f9a Renaming some more I identifiers 2019-01-26 13:18:21 +01:00
Rasmus Munk Larsen
71429883ee Fix compilation error in NEON GEBP specialization of madd. 2019-01-25 17:00:21 -08:00
Christoph Hertzberg
934b8a1304 Avoid I as an identifier, since it may clash with the C-header complex.h 2019-01-25 14:54:39 +01:00
Gael Guennebaud
ec8a387972 cleanup 2019-01-24 10:24:45 +01:00
Gael Guennebaud
6908ce2a15 More thoroughly check variadic template ctor of fixed-size vectors 2019-01-24 10:24:28 +01:00
David Tellenbach
237b03b372 PR 574: use variadic template instead of initializer_list to implement fixed-size vector ctor from coefficients. 2019-01-23 00:07:19 +01:00
Christoph Hertzberg
bd6dadcda8 Tell doxygen that cxx11 math is available 2019-01-24 00:14:02 +01:00
Gael Guennebaud
c64d5d3827 Bypass inline asm for incompatible compilers. 2019-01-23 23:43:13 +01:00
Christoph Hertzberg
e16913a45f Fix name of tutorial snippet. 2019-01-23 10:35:06 +01:00
Gael Guennebaud
80f81f9c4b Cleanup SFINAE in Array/Matrix(initializer_list) ctors and minor doc editing. 2019-01-22 17:08:47 +01:00
David Tellenbach
db152b9ee6 PR 572: Add initializer list constructors to Matrix and Array (include unit tests and doc)
- {1,2,3,4,5,...} for fixed-size vectors only
- {{1,2,3},{4,5,6}} for the general cases
- {{1,2,3,4,5,....}} is allowed for both row and column-vector
2019-01-21 16:25:57 +01:00
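As an aside for readers, here is a minimal plain-C++ sketch of the nested-initializer-list fill semantics this commit describes; `MiniMatrix` is a hypothetical illustration, not Eigen's actual implementation:

```cpp
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical sketch (not Eigen code): {{1,2,3},{4,5,6}} fills a 2x3
// matrix row by row, mirroring the constructor semantics described above.
struct MiniMatrix {
  std::size_t rows = 0, cols = 0;
  std::vector<double> data;  // row-major storage

  MiniMatrix(std::initializer_list<std::initializer_list<double>> ll) {
    rows = ll.size();
    for (const auto& row : ll) {
      cols = row.size();  // assumes all rows have the same length
      data.insert(data.end(), row.begin(), row.end());
    }
  }
  double operator()(std::size_t i, std::size_t j) const {
    return data[i * cols + j];
  }
};
```

In Eigen itself the same braces apply directly to `Matrix`/`Array`, e.g. `Eigen::Matrix<double,2,3> m{{1,2,3},{4,5,6}};`.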
Gael Guennebaud
543529da6a Add more extensive tests of Array ctors, including {} variants 2019-01-22 15:30:50 +01:00
nluehr
92774f0275 Replace host_define.h with cuda_runtime_api.h 2019-01-18 16:10:09 -06:00
Gael Guennebaud
d18f49cbb3 Fix compilation of unit tests with gcc and c++17 2019-01-18 11:12:42 +01:00
Christoph Hertzberg
da0a41b9ce Mask unused-parameter warnings, when building with NDEBUG 2019-01-18 10:41:14 +01:00
Rasmus Munk Larsen
2eccbaf3f7 Add missing logical packet ops for GPU and NEON. 2019-01-17 17:45:08 -08:00
Christoph Hertzberg
d575505d25 After fixing bug #1557, boostmultiprec_7 failed with NumericalIssue instead of NoConvergence (all that matters here is no Success) 2019-01-17 19:14:07 +01:00
Gael Guennebaud
ee3662abc5 Remove some useless const_cast 2019-01-17 18:27:49 +01:00
Gael Guennebaud
0fe6b7d687 Make nestByValue work again (broken since 3.3) and add unit tests. 2019-01-17 18:27:25 +01:00
Gael Guennebaud
4b7cf7ff82 Extend reshaped unit tests and remove useless const_cast 2019-01-17 17:35:32 +01:00
Gael Guennebaud
b57c9787b1 Cleanup useless const_cast and add missing broadcast assignment tests 2019-01-17 16:55:42 +01:00
Gael Guennebaud
be05d0030d Make FullPivLU use conjugateIf<> 2019-01-17 12:01:00 +01:00
Patrick Peltzer
bba2f05064 Boosttest only available for Boost version >= 1.53.0 2019-01-17 11:54:37 +01:00
Patrick Peltzer
15e53d5d93 PR 567: makes all dense solvers inherit SolverBase (LU, Cholesky, QR, SVD).
This changeset also includes:
 * add HouseholderSequence::conjugateIf
 * define int as the StorageIndex type for all dense solvers
 * dedicated unit tests, including assertion checking
 * _check_solve_assertion(): this method can be implemented in derived solver classes to implement custom checks
 * CompleteOrthogonalDecompositions: add applyZOnTheLeftInPlace, fix scalar type in applyZAdjointOnTheLeftInPlace(), add missing assertions
 * Cholesky: add missing assertions
 * FullPivHouseholderQR: Corrected Scalar type in _solve_impl()
 * BDCSVD: Unambiguous return type for ternary operator
 * SVDBase: Corrected Scalar type in _solve_impl()
2019-01-17 01:17:39 +01:00
Gael Guennebaud
7f32109c11 Add conjugateIf<bool> members to DenseBase, TriangularView, SelfadjointView, and make PartialPivLU use it. 2019-01-17 11:33:43 +01:00
Gael Guennebaud
7b35c26b1c Doc: remove link to porting guide 2019-01-17 10:35:50 +01:00
Gael Guennebaud
4759d9e86d Doc: add manual page on STL iterators 2019-01-17 10:35:14 +01:00
Gael Guennebaud
562985bac4 bug #1646: fix false aliasing detection for A.row(0) = A.col(0);
This changeset completely disables the detection for vectors, for which our current mechanism cannot detect any positive aliasing anyway.
2019-01-17 00:14:27 +01:00
Rasmus Munk Larsen
7401e2541d Fix compilation error for logical packet ops with older compilers. 2019-01-16 14:43:33 -08:00
Rasmus Munk Larsen
ee550a2ac3 Fix flaky test for tensor fft. 2019-01-16 14:03:12 -08:00
Gael Guennebaud
0f028f61cb GEBP: fix swapped kernel mode with AVX512 and complex scalars 2019-01-16 22:26:38 +01:00
Gael Guennebaud
e118ce86fd GEBP: clean up the logic to choose between 4 packets or 1 packet 2019-01-16 21:47:42 +01:00
Gael Guennebaud
70e133333d bug #1661: fix regression in GEBP and AVX512 2019-01-16 21:22:20 +01:00
Gael Guennebaud
ce88e297dc Add a comment stating this doc page is partly obsolete. 2019-01-16 16:29:02 +01:00
Gael Guennebaud
729d1291c2 bug #1585: update doc on lazy-evaluation 2019-01-16 16:28:17 +01:00
Gael Guennebaud
c8e40edac9 Remove Eigen2ToEigen3 migration page (obsolete since 3.3) 2019-01-16 16:27:00 +01:00
Gael Guennebaud
aeffdf909e bug #1617: add unit tests for empty triangular solve. 2019-01-16 15:24:59 +01:00
Gael Guennebaud
502f717980 bug #1646: disable aliasing detection for empty and 1x1 expression 2019-01-16 14:33:45 +01:00
Gael Guennebaud
0b466b6933 bug #1633: use proper type for madd temporaries, factorize RhsPacketx4. 2019-01-16 13:50:13 +01:00
Renjie Liu
dbfcceabf5 Bug: 1633: refactor gebp kernel and optimize for neon 2019-01-16 12:51:36 +08:00
Gael Guennebaud
2b70b2f570 Make Transform::rotation() an alias to Transform::linear() in the case of an Isometry 2019-01-15 22:50:42 +01:00
Gael Guennebaud
2c2c114995 Silent maybe-uninitialized warnings by gcc 2019-01-15 16:53:15 +01:00
Gael Guennebaud
6ec6bf0b0d Enable visitor on empty matrices (the visitor is left unchanged), and protect min/maxCoeff(Index*,Index*) on empty matrices by an assertion (+ doc & unit tests) 2019-01-15 15:21:14 +01:00
Gael Guennebaud
027e44ed24 bug #1592: makes partial min/max reductions trigger an assertion on inputs with a zero reduction length (+doc and tests) 2019-01-15 15:13:24 +01:00
Gael Guennebaud
f8bc5cb39e Fix detection of vector-at-compile-time: use Rows/Cols instead of MaxRows/MaxCols.
This fixes VectorXd(n).middleCols(0,0).outerSize() which was equal to 1.
2019-01-15 15:09:49 +01:00
Gael Guennebaud
32d7232aec fix always true warning with gcc 4.7 2019-01-15 11:18:48 +01:00
Gael Guennebaud
6cf7afa3d9 Typo 2019-01-15 11:04:37 +01:00
Gael Guennebaud
e7d4d4f192 cleanup 2019-01-15 10:51:03 +01:00
Rasmus Larsen
7b3aab0936 Merged in rmlarsen/eigen (pull request PR-570)
Add support for inverse hyperbolic functions. Fix cost of division.
2019-01-14 21:31:33 +00:00
Rasmus Munk Larsen
8bf00c2baf Remove extra <tr>. 2019-01-14 13:29:29 -08:00
Rasmus Munk Larsen
ec7fe83554 Merge. 2019-01-14 13:26:58 -08:00
Rasmus Munk Larsen
2ea4efc0c3 Merge. 2019-01-14 13:26:58 -08:00
Rasmus Munk Larsen
2c5843dbbb Update documentation. 2019-01-14 13:26:34 -08:00
Gael Guennebaud
250dcd1fdb bug #1652: fix position of EIGEN_ALIGN16 attributes in Neon and Altivec 2019-01-14 21:45:56 +01:00
Rasmus Larsen
5a59452aae Merged eigen/eigen into default 2019-01-14 10:23:23 -08:00
Gael Guennebaud
3c9e6d206d AVX512: fix pgather/pscatter for Packet4cd and unaligned pointers 2019-01-14 17:57:28 +01:00
Gael Guennebaud
61b6eb05fe AVX512 (r)sqrt(double) was mistakenly disabled with clang and others 2019-01-14 17:28:47 +01:00
Gael Guennebaud
ccddeaad90 fix warning 2019-01-14 16:51:16 +01:00
Gael Guennebaud
d4881751d3 Doc: add Isometry in the list of supported Mode of Transform<> 2019-01-14 16:38:26 +01:00
Greg Coombe
9d988a1e1a Initialize isometric transforms like affine transforms.
The isometric transform, like the affine transform, has an implicit last
row of [0, 0, 0, 1]. This was not being properly initialized, as verified
by a new test function.
2019-01-11 23:14:35 -08:00
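To illustrate the point of the fix above, a plain-C++ sketch (hypothetical `make_transform` helper, not Eigen's `Transform` API) of why the last row of a homogeneous transform must actually be initialized:

```cpp
#include <array>

// Sketch: a 4x4 homogeneous transform stores rotation R and translation t
// in the top 3x4 block; the "implicit" last row [0, 0, 0, 1] still has to
// be written before the matrix can be used as a full 4x4.
using Mat4 = std::array<std::array<double, 4>, 4>;

inline Mat4 make_transform(const std::array<std::array<double, 3>, 3>& R,
                           const std::array<double, 3>& t) {
  Mat4 m{};  // value-initialization: everything starts at zero
  for (int i = 0; i < 3; ++i) {
    for (int j = 0; j < 3; ++j) m[i][j] = R[i][j];
    m[i][3] = t[i];  // translation column
  }
  m[3][3] = 1.0;  // make the implicit [0, 0, 0, 1] row explicit
  return m;
}
```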
Gael Guennebaud
4356a55a61 PR 571: Implements an accurate argument reduction algorithm for huge inputs of sin/cos and call it instead of falling back to std::sin/std::cos.
This makes both the small and huge argument cases faster because:
- for small inputs this removes the last pselect
- for large inputs only the reduction part follows a scalar path,
the rest use the same SIMD path as the small-argument case.
2019-01-14 13:54:01 +01:00
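For context on what argument reduction means here, a scalar Cody-Waite-style sketch (illustrative constants and names; Eigen's vectorized kernel and its huge-argument path are considerably more involved):

```cpp
#include <cmath>

// Cody-Waite style range reduction (scalar sketch, not Eigen's kernel):
// pi/2 is split into a high and a low part so that subtracting k*(pi/2)
// from x cancels far fewer significant bits than a single fused step.
inline double sin_via_reduction(double x) {
  const double pio2_hi = 1.57079632679489655800e+00;  // high bits of pi/2
  const double pio2_lo = 6.12323399573676603587e-17;  // pi/2 - pio2_hi
  const double k = std::round(x * 0.6366197723675814);  // x * 2/pi: nearest quadrant
  const double r = (x - k * pio2_hi) - k * pio2_lo;     // two-step subtraction
  switch (static_cast<long long>(k) & 3) {              // quadrant mod 4
    case 0:  return std::sin(r);
    case 1:  return std::cos(r);
    case 2:  return -std::sin(r);
    default: return -std::cos(r);
  }
}
```

With the reduced `r` in hand, a real kernel evaluates polynomial approximations of sin/cos on `r` with the same SIMD path as the small-argument case.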
Gael Guennebaud
f566724023 Fix StorageIndex FIXME in dense LU solvers 2019-01-13 17:54:30 +01:00
Rasmus Munk Larsen
1c6e6e2c3f Merge. 2019-01-11 17:47:11 -08:00
Rasmus Larsen
0ba3b45419 Merged eigen/eigen into default 2019-01-11 17:46:04 -08:00
Rasmus Munk Larsen
28ba1b2c32 Add support for inverse hyperbolic functions.
Fix cost of division.
2019-01-11 17:45:37 -08:00
Rasmus Munk Larsen
89c4001d6f Fix warnings in ptrue for complex and half types. 2019-01-11 14:10:57 -08:00
Rasmus Munk Larsen
a49d01edba Fix warnings in ptrue for complex and half types. 2019-01-11 13:18:17 -08:00
Eugene Zhulenev
1e6d15b55b Fix shorten-64-to-32 warning in TensorContractionThreadPool 2019-01-11 11:41:53 -08:00
Rasmus Munk Larsen
df29511ac0 Fix merge. 2019-01-11 10:36:36 -08:00
Rasmus Munk Larsen
8e71ed4cc9 Merge. 2019-01-11 10:35:07 -08:00
Rasmus Munk Larsen
fff5a5b579 Resolve. 2019-01-11 10:28:52 -08:00
Rasmus Munk Larsen
9396ace46b Merge. 2019-01-11 10:28:52 -08:00
Rasmus Larsen
74882471d0 Merged eigen/eigen into default 2019-01-11 10:20:55 -08:00
Rasmus Munk Larsen
e9936cf2b9 Merge. 2019-01-11 09:58:33 -08:00
Gael Guennebaud
9005f0111f Replace compiler's alignas/alignof extension by respective c++11 keywords when available. This also fix a compilation issue with gcc-4.7. 2019-01-11 17:10:54 +01:00
Mark D Ryan
3c9add6598 Remove reinterpret_cast from AVX512 complex implementation
The reinterpret_casts used in ptranspose(PacketBlock<Packet8cf,4>&)
ptranspose(PacketBlock<Packet8cf,8>&) don't appear to be working
correctly.  They're used to convert the kernel parameters to
PacketBlock<Packet8d,T>& so that the complex number versions of
ptranspose can be written using the existing double implementations.
Unfortunately, they don't seem to work and are responsible for 9 unit
test failures in the AVX512 build of tensorflow master.  This commit
fixes the issue by manually initialising PacketBlock<Packet8d,T>
variables with the contents of the kernel parameter before calling
the double version of ptranspose, and then copying the resulting
values back into the kernel parameter before returning.
2019-01-11 14:02:09 +01:00
Christoph Hertzberg
0522460a0d bug #1656: Enable failtests only if BUILD_TESTING is enabled 2019-01-11 11:07:56 +01:00
Eugene Zhulenev
0abe03764c Fix shorten-64-to-32 warning in TensorContractionThreadPool 2019-01-10 10:27:55 -08:00
Rasmus Munk Larsen
fcfced13ed Rename pones -> ptrue. Use _CMP_TRUE_UQ where appropriate. 2019-01-09 17:20:33 -08:00
Rasmus Munk Larsen
ce38c342c3 merge. 2019-01-09 17:20:33 -08:00
Rasmus Munk Larsen
a05ec7993e merge 2019-01-09 17:17:30 -08:00
Rasmus Munk Larsen
e15bb785ad Collapsed revision
* Add packet up "pones". Write pnot(a) as pxor(pones(a), a).
* Collapsed revision
* Simplify a bit.
* Undo useless diffs.
* Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
f6ba6071c5 Fix typo. 2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
8f04442526 Collapsed revision
* Collapsed revision
* Add packet up "pones". Write pnot(a) as pxor(pones(a), a).
* Collapsed revision
* Simplify a bit.
* Undo useless diffs.
* Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
8f178429b9 Collapsed revision
* Collapsed revision
* Add packet up "pones". Write pnot(a) as pxor(pones(a), a).
* Collapsed revision
* Simplify a bit.
* Undo useless diffs.
* Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
1119c73d22 Collapsed revision
* Add packet up "pones". Write pnot(a) as pxor(pones(a), a).
* Collapsed revision
* Simplify a bit.
* Undo useless diffs.
* Fix typo.
2019-01-09 16:34:23 -08:00
Rasmus Munk Larsen
e00521b514 Undo useless diffs. 2019-01-09 16:32:53 -08:00
Rasmus Munk Larsen
f2767112c8 Simplify a bit. 2019-01-09 16:29:18 -08:00
Rasmus Munk Larsen
cb955df9a6 Add packet up "pones". Write pnot(a) as pxor(pones(a), a). 2019-01-09 16:17:08 -08:00
Rasmus Larsen
cb3c059fa4 Merged eigen/eigen into default 2019-01-09 15:04:17 -08:00
Gael Guennebaud
d812f411c3 bug #1654: fix compilation with cuda and no c++11 2019-01-09 18:00:05 +01:00
Gael Guennebaud
3492a1ca74 fix plog(+inf) with AVX512 2019-01-09 16:53:37 +01:00
Gael Guennebaud
47810cf5b7 Add dedicated implementations of predux_any for AVX512, NEON, and Altivec/VSX 2019-01-09 16:40:42 +01:00
Gael Guennebaud
3f14e0d19e fix warning 2019-01-09 15:45:21 +01:00
Gael Guennebaud
aeec68f77b Add missing pcmp_lt and others for AVX512 2019-01-09 15:36:41 +01:00
Gael Guennebaud
e6b217b8dd bug #1652: implements a much more accurate version of vectorized sin/cos. This new version achieves the same speed for SSE/AVX, and is slightly faster with FMA. Guarantees are as follows:
- no FMA: 1ULP up to 3pi, 2ULP up to sin(25966) and cos(18838), fallback to std::sin/cos for larger inputs
- FMA: 1ULP up to sin(117435.992) and cos(71476.0625), fallback to std::sin/cos for larger inputs
2019-01-09 15:25:17 +01:00
Eugene Zhulenev
e70ffef967 Optimize evalShardedByInnerDim 2019-01-08 16:26:31 -08:00
Rasmus Munk Larsen
055f0b73db Add support for pcmp_eq and pnot, including for complex types. 2019-01-07 16:53:36 -08:00
Eugene Zhulenev
190d053e41 Explicitly set fill character when printing aligned data to ostream 2019-01-03 14:55:28 -08:00
Mark D Ryan
bc5dd4cafd PR560: Fix the AVX512f only builds
Commit c53eececb0 introduced AVX512 support for complex numbers but required
avx512dq to build. Commit 1d683ae2f5 fixed some, but it would seem not all,
of the hard avx512dq dependencies. Build failures are still evident on
Eigen and TensorFlow when compiling with just avx512f and no avx512dq
using gcc 7.3. Looking at the code, there does indeed seem to be a problem:
commit c53eececb0 calls avx512dq intrinsics directly, e.g., _mm512_extractf32x8_ps
and _mm512_and_ps. This commit fixes the issue by replacing the direct
intrinsic calls with the various wrapper functions that are safe to use on
avx512f-only builds.
2019-01-03 14:33:04 +01:00
Gael Guennebaud
697fba3bb0 Fix unit test 2018-12-27 11:20:47 +01:00
Gael Guennebaud
60d3fe9a89 One more stupid AVX 512 fix (I don't have direct access to AVX512 machines) 2018-12-24 13:05:03 +01:00
Gael Guennebaud
4aa667b510 Add EIGEN_STRONG_INLINE where required 2018-12-24 10:45:01 +01:00
Gael Guennebaud
961ff567e8 Add missing pcmp_lt_or_nan for AVX512 2018-12-23 22:13:29 +01:00
Gael Guennebaud
0f6f75bd8a Implement a faster fix for sin/cos of large entries that also correctly handle INF input. 2018-12-23 17:26:21 +01:00
Gael Guennebaud
38d704def8 Make sure that psin/pcos return numbers in [-1,1] for large inputs (though sin/cos on large entries is quite useless because it's inaccurate) 2018-12-23 16:13:24 +01:00
Gael Guennebaud
5713fb7feb Fix plog(+INF): it returned ~87 instead of +INF 2018-12-23 15:40:52 +01:00
Christoph Hertzberg
6dd93f7e3b Make code compile again for older compilers.
See https://stackoverflow.com/questions/7411515/
2018-12-22 13:09:07 +01:00
Gustavo Lima Chaves
1024a70e82 gebp: Add new ½ and ¼ packet rows per (peeling) round on the lhs

The patch works by altering the gebp lhs packing routines to also
consider ½ and ¼ packet length rows when packing, besides the original
whole package and row-by-row attempts. Finally, gebp itself will try
to fit a fraction of a packet at a time if:

i) ½ and/or ¼ packets are available for the current context (e.g. AVX2
   and SSE-sized SIMD register for x86)

ii) The matrix's height is favorable to it (it may be it's too small
    in that dimension to take full advantage of the current/maximum
    packet width or it may be the case that last rows may take
    advantage of smaller packets before gebp goes row-by-row)

This helps mitigate huge slowdowns one had on AVX512 builds when
compared to AVX2 ones, for some dimensions. Gains top at an extra 1x
in throughput. This patch is a complement to changeset 4ad359237a.

Since packing is changed, Eigen users which would go for very
low-level API usage, like TensorFlow, will have to be adapted to work
fine with the changes.
2018-12-21 11:03:18 -08:00
Gustavo Lima Chaves
e763fcd09e Introducing "vectorized" byte on unpacket_traits structs
This is a preparation to a change on gebp_traits, where a new template
argument will be introduced to dictate the packet size, so it won't be
bound to the current/max packet size only anymore.

By having packet types defined early on gebp_traits, one has now to
act on packet types, not scalars anymore, for the enum values defined
on that class. One approach for reaching the vectorizable/size
properties one needs there could be getting the packet's scalar again
with unpacket_traits<>, then the size/Vectorizable enum entries from
packet_traits<>. It turns out guards like "#ifndef
EIGEN_VECTORIZE_AVX512" at AVX/PacketMath.h will hide smaller packet
variations of packet_traits<> for some types (and it makes sense to
keep that). In other words, one can't go back to the scalar and create
a new PacketType, as this will always lead to the maximum packet type
for the architecture.

The less costly/invasive solution for that, thus, is to add the
vectorizable info on every unpacket_traits struct as well.
2018-12-19 14:24:44 -08:00
Gael Guennebaud
efa4c9c40f bug #1615: slightly increase the default unrolling limit to compensate for changeset 101ea26f5e.
This solves a performance regression with clang and 3x3 matrix products.
2018-12-13 10:42:39 +01:00
Gael Guennebaud
f20c991679 add changesets related to matrix product perf. 2018-12-13 10:33:29 +01:00
Rasmus Munk Larsen
dd6d65898a Fix shorten-64-to-32 warning. Use regular memcpy if num_threads==0. 2018-12-12 14:45:31 -08:00
Gael Guennebaud
f582ea3579 Fix compilation with expression template scalar type. 2018-12-12 22:47:00 +01:00
Gael Guennebaud
cfc70dc13f Add regression test for bug #1174 2018-12-12 18:03:31 +01:00
Gael Guennebaud
2de8da70fd bug #1557: fix RealSchur and EigenSolver for matrices with only zeros on the diagonal. 2018-12-12 17:30:08 +01:00
Gael Guennebaud
72c0bbe2bd Simplify handling of tests that must fail to compile.
Each test is now a normal ctest target, and build properties (compiler+flags) are preserved (instead of starting a new build-dir from scratch).
2018-12-12 15:48:36 +01:00
Gael Guennebaud
37c91e1836 bug #1644: fix warning 2018-12-11 22:07:20 +01:00
Gael Guennebaud
f159cf3d75 Artificially increase l1-blocking size for AVX512. +10% speedup with current kernels.
With a 6pX4 kernel (not committed yet), this provides a +20% speedup.
2018-12-11 15:36:27 +01:00
Gael Guennebaud
0a7e7af6fd Properly set the number of registers for AVX512 2018-12-11 15:33:17 +01:00
Gael Guennebaud
7166496f70 bug #1643: fix compilation issue with gcc and no optimization 2018-12-11 13:24:42 +01:00
Gael Guennebaud
0d90637838 enable spilling workaround on architectures with SSE/AVX 2018-12-10 23:22:44 +01:00
Gael Guennebaud
cf697272e1 Remove debug code. 2018-12-09 23:05:46 +01:00
Gael Guennebaud
450dc97c6b Various fixes in polynomial solver and its unit tests:
- clean up noise in the imaginary part of real roots
- take into account the magnitude of the derivative to check roots
- use <= instead of < at appropriate places
2018-12-09 22:54:39 +01:00
Gael Guennebaud
348bb386d1 Enable "old" CMP0026 policy (not perfect, but better than dozens of warnings) 2018-12-08 18:59:51 +01:00
Gael Guennebaud
bff90bf270 workaround "may be used uninitialized" warning 2018-12-08 18:58:28 +01:00
Gael Guennebaud
81c27325ae bug #1641: fix testing of pandnot and fix pandnot for complex on SSE/AVX/AVX512 2018-12-08 14:27:48 +01:00
Gael Guennebaud
426bce7529 fix EIGEN_GEBP_2PX4_SPILLING_WORKAROUND for non vectorized type, and non x86/64 target 2018-12-08 09:44:21 +01:00
Gael Guennebaud
cd25b538ab Fix noise in sparse_basic_3 (numerical cancellation) 2018-12-08 00:13:37 +01:00
Gael Guennebaud
efaf03bf96 Fix noise in lu unit test 2018-12-08 00:05:03 +01:00
Gael Guennebaud
956678a4ef bug #1515: disable gebp's 3pX4 micro kernel for MSVC<=19.14 because of register spilling. 2018-12-07 18:03:36 +01:00
Gael Guennebaud
7b6d0ff1f6 Enable FMA with MSVC (through /arch:AVX2). To make this possible, I also had to turn the #warning regarding AVX512-FMA into a #error. 2018-12-07 15:14:50 +01:00
Gael Guennebaud
f233c6194d bug #1637: workaround register spilling in gebp with clang>=6.0+AVX+FMA 2018-12-07 10:01:09 +01:00
Gael Guennebaud
ae59a7652b bug #1638: add a warning if avx512 is enabled without SSE/AVX FMA 2018-12-07 09:23:28 +01:00
Gael Guennebaud
4e7746fe22 bug #1636: fix gemm performance issue with gcc>=6 and no FMA 2018-12-07 09:15:46 +01:00
Gael Guennebaud
cbf2f4b7a0 AVX512f includes FMA but GCC does not define __FMA__ with -mavx512f only 2018-12-06 18:21:56 +01:00
Gael Guennebaud
1d683ae2f5 Fix compilation with avx512f only, i.e., no AVX512DQ 2018-12-06 18:11:07 +01:00
Gael Guennebaud
aab749b1c3 fix test regarding AVX512 vectorization of complexes. 2018-12-06 16:55:00 +01:00
Gael Guennebaud
c53eececb0 Implement AVX512 vectorization of std::complex<float/double> 2018-12-06 15:58:06 +01:00
Gael Guennebaud
3fba59ea59 temporarily re-disable SSE/AVX vectorization of complex<> on AVX512 -> this needs to be fixed though! 2018-12-06 00:13:26 +01:00
Gael Guennebaud
1ac2695ef7 bug #1636: fix compilation with some ABI versions. 2018-12-06 00:05:10 +01:00
Rasmus Munk Larsen
47d8b741b2 #elif -> #else to fix GPU build. 2018-12-05 13:19:31 -08:00
Rasmus Munk Larsen
8a02883d58 Merged in markdryan/eigen/avx512-contraction-2 (pull request PR-554)
Fix tensor contraction on AVX512 builds

Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
2018-12-05 18:19:32 +00:00
Gael Guennebaud
acc3459a49 Add help messages in the quick ref/ascii docs regarding slicing, indexing, and reshaping. 2018-12-05 17:17:23 +01:00
Gael Guennebaud
e2e897298a Fix page nesting 2018-12-05 17:13:46 +01:00
Christoph Hertzberg
c1d356e8b4 bug #1635: Use infinity from NumTraits instead of creating it manually. 2018-12-05 15:01:04 +01:00
Mark D Ryan
36f8f6d0be Fix evalShardedByInnerDim for AVX512 builds
evalShardedByInnerDim ensures that the values it passes for start_k and
end_k to evalGemmPartialWithoutOutputKernel are multiples of 8 as the kernel
does not work correctly when the values of k are not multiples of the
packet_size.  While this precaution works for AVX builds, it is insufficient
for AVX512 builds where the maximum packet size is 16.  The result is slightly
incorrect float32 contractions on AVX512 builds.

This commit fixes the problem by ensuring that k is always a multiple of
the packet_size if the packet_size is > 8.
2018-12-05 12:29:03 +01:00
Rasmus Munk Larsen
b57b31cce9 Merged in ezhulenev/eigen-01 (pull request PR-553)
Do not disable alignment with EIGEN_GPUCC

Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
2018-12-04 23:47:19 +00:00
Eugene Zhulenev
0bb15bb6d6 Update checks in ConfigureVectorization.h 2018-12-03 17:10:40 -08:00
Eugene Zhulenev
fd0fbfa9b5 Do not disable alignment with EIGEN_GPUCC 2018-12-03 15:54:10 -08:00
Christoph Hertzberg
919414b9fe bug #785: Make Cholesky decomposition work for empty matrices 2018-12-03 16:18:15 +01:00
Gael Guennebaud
0ea7ae7213 Add missing padd for Packet8i (it was implicitly generated by clang and gcc) 2018-11-30 21:52:25 +01:00
Gael Guennebaud
ab4df3e6ff bug #1634: remove double copy in move-ctor of non movable Matrix/Array 2018-11-30 21:25:51 +01:00
Gael Guennebaud
c785464430 Add packet sin and cos to Altivec/VSX and NEON 2018-11-30 16:21:33 +01:00
Gael Guennebaud
69ace742be Several improvements regarding packet-bitwise operations:
- add unit tests
- optimize their AVX512f implementation
- add missing implementations (half, Packet4f, ...)
2018-11-30 15:56:08 +01:00
Gael Guennebaud
fa87f9d876 Add psin/pcos on AVX512 -> almost for free, at last! 2018-11-30 14:33:13 +01:00
Gael Guennebaud
c68bd2fa7a Cleanup 2018-11-30 14:32:31 +01:00
Gael Guennebaud
f91500d303 Fix pandnot order in AVX512 2018-11-30 14:32:06 +01:00
Gael Guennebaud
b477d60bc6 Extend the generic psin_float code to handle cosine and make SSE and AVX use it (-> this adds pcos for AVX) 2018-11-30 11:26:30 +01:00
Gael Guennebaud
e19ece822d Disable fma gcc's workaround for gcc >= 8 (based on GEMM benchmarks) 2018-11-28 17:56:24 +01:00
Gael Guennebaud
41052f63b7 same for pmax 2018-11-28 17:17:28 +01:00
Gael Guennebaud
3e95e398b6 pmin/pmax on SSE: make sure to use AVX instructions with AVX enabled, and disable gcc workaround for fixed gcc versions 2018-11-28 17:14:20 +01:00
Gael Guennebaud
aa6097395b Add missing SSE/AVX type-casting in AVX512 mode 2018-11-28 16:09:08 +01:00
Gael Guennebaud
48fe78c375 bug #1630: fix linspaced when requesting smaller packet size than default one. 2018-11-28 13:15:06 +01:00
Eugene Zhulenev
80f1651f35 Use explicit packet type in SSE/PacketMath pldexp 2018-11-27 17:25:49 -08:00
Benoit Jacob
a4159dba08 do not read buffers out of bounds -- load only the 4 bytes we know exist here. Could also have done a vld1_lane_f32 but doing so here, without the overhead of initializing the unused lane, would have triggered use-of-uninitialized-value errors in tools such as ASan. Note that this code is sub-optimal before or after this change: we should be reading either 2 or 4 float32 values per load-instruction (2 for ARM in-order cores with an affinity for 8-byte loads; 4 for ARM out-of-order cores able to dual-issue 16-byte load instructions with arithmetic instructions). Before or after this patch, we are only loading 4 bytes of useful data here (even if before this patch, we were technically loading 8, only to use the first 4). 2018-11-27 16:53:14 -05:00
Gael Guennebaud
b131a4db24 bug #1631: fix compilation with ARM NEON and clang, and cleanup the weird pshiftright_and_cast and pcast_and_shiftleft functions. 2018-11-27 23:45:00 +01:00
Gael Guennebaud
a1a5fbbd21 Update pshiftleft to pass the shift as a true compile-time integer. 2018-11-27 22:57:30 +01:00
Gael Guennebaud
fa7fd61eda Unify SSE/AVX psin functions.
It is based on the SSE version which is much more accurate, though very slightly slower.
This changeset also includes the following required changes:
 - add packet-float to packet-int type traits
 - add packet float<->int reinterpret casts
 - add faster pselect for AVX based on blendv
2018-11-27 22:41:51 +01:00
Rasmus Munk Larsen
08edbc8cfe Merged in bjacob/eigen/fixbuild (pull request PR-549)
fix the build on 64-bit ARM when NEON is disabled
2018-11-27 20:14:12 +00:00
Benoit Jacob
7b1cb8a440 fix the build on 64-bit ARM when NEON is disabled 2018-11-27 11:11:02 -05:00
Gael Guennebaud
b5695a6008 Unify Altivec/VSX pexp(double) with default implementation 2018-11-27 13:53:05 +01:00
Gael Guennebaud
7655a8af6e cleanup 2018-11-26 23:21:29 +01:00
Gael Guennebaud
502f92fa10 Unify SSE and AVX pexp for double. 2018-11-26 23:12:44 +01:00
Gael Guennebaud
4a347a0054 Unify NEON's pexp with generic implementation 2018-11-26 22:15:44 +01:00
Gael Guennebaud
5c8406babc Unify Altivec/VSX's pexp with generic implementation 2018-11-26 16:47:13 +01:00
Gael Guennebaud
cf8b85d5c5 Unify SSE and AVX implementation of pexp 2018-11-26 16:36:19 +01:00
Gael Guennebaud
c2f35b1b47 Unify Altivec/VSX's plog with generic implementation, and enable it! 2018-11-26 15:58:11 +01:00
Gael Guennebaud
c24e98e6a8 Unify NEON's plog with generic implementation 2018-11-26 15:02:16 +01:00
Gael Guennebaud
2c44c40114 First step toward a unification of packet log implementation, currently only SSE and AVX are unified.
To this end, I added the following functions: pzero, pcmp_*, pfrexp, pset1frombits functions.
2018-11-26 14:21:24 +01:00
Gael Guennebaud
5f6045077c Make SSE/AVX pandnot(A,B) consistent with generic version, i.e., "A and not B" 2018-11-26 14:14:07 +01:00
Gael Guennebaud
382279eb7f Extend unit test to recursively check half-packet types and non packet types 2018-11-26 14:10:07 +01:00
Gael Guennebaud
0836a715d6 bug #1611: fix plog(0) on NEON 2018-11-26 09:08:38 +01:00
Patrik Huber
95566eeed4 Fix typos 2018-11-23 22:22:14 +00:00
Gael Guennebaud
e3b22a6bd0 merge 2018-11-23 16:06:21 +01:00
Gael Guennebaud
ccabdd88c9 Fix reserved usage of double __ in macro names 2018-11-23 16:01:47 +01:00
Gael Guennebaud
572d62697d check two ctors 2018-11-23 15:37:09 +01:00
Gael Guennebaud
354f14293b Fix double = bool ! 2018-11-23 15:12:06 +01:00
Gael Guennebaud
a7842daef2 Fix several uninitialized member from ctor 2018-11-23 15:10:28 +01:00
Christoph Hertzberg
ea60a172cf Add default constructor to Bar to make test compile again with clang-3.8 2018-11-23 14:24:22 +01:00
Christoph Hertzberg
806352d844 Small typo found by Patrick Huber (pull request PR-547) 2018-11-23 12:34:27 +00:00
Gael Guennebaud
a476054879 bug #1624: improve matrix-matrix product on ARM 64, 20% speedup 2018-11-23 10:25:19 +01:00
Gael Guennebaud
c685fe9838 Move regression test to right unit test file 2018-11-21 15:59:47 +01:00
Gael Guennebaud
4b2cebade8 Workaround weird MSVC bug 2018-11-21 15:53:37 +01:00
Christoph Hertzberg
0ec8afde57 Fixed most conversion warnings in MatrixFunctions module 2018-11-20 16:23:28 +01:00
Deven Desai
e7e6809e6b ROCm/HIP specific fixes + updates
1. Eigen/src/Core/arch/GPU/Half.h

   Updating the HIPCC implementation of half so that it can be declared as a __shared__ variable


2. Eigen/src/Core/util/Macros.h, Eigen/src/Core/util/Memory.h

   introducing an EIGEN_USE_STD(func) macro that calls
   - std::func by default
   - ::func when Eigen is being compiled with HIPCC

   This change was requested in the previous HIP PR
   (https://bitbucket.org/eigen/eigen/pull-requests/518/pr-with-hip-specific-fixes-for-the-eigen/diff)


3. unsupported/Eigen/CXX11/src/Tensor/TensorDeviceThreadPool.h

   Removing EIGEN_DEVICE_FUNC attribute from pure virtual methods as it is not supported by HIPCC


4. unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h

   Disabling the template specializations of InnerMostDimReducer as they run into HIPCC link errors
2018-11-19 18:13:59 +00:00
Gael Guennebaud
6a510fe69c Make MaxPacketSize a true upper bound, even for fixed-size inputs 2018-11-16 11:25:32 +01:00
Gael Guennebaud
43c987b1c1 Add explicit regression test for bug #1622 2018-11-16 11:24:51 +01:00
Mark D Ryan
670d56441c PR 544: Set requestedAlignment correctly for SliceVectorizedTraversals
Commit aa110e681b
 optimised the multiplication of small dynamically
sized matrices by restricting the packet size to a maximum of 4, increasing
the chances that SIMD instructions are used in the computation.  However, it
introduced a mismatch between the packet size and the requestedAlignment.  This
mismatch can lead to crashes when the destination is not aligned.  This patch
fixes the issue by ensuring that the AssignmentTraits are correctly computed
when using a restricted packet size.
* * *
Bind LinearPacketType to MaxPacketSize

This commit applies any packet size limit specified when instantiating
copy_using_evaluator_traits to the LinearPacketType, providing that the
size of the destination is not known at compile time.
* * *
Add unit test for restricted packet assignment

A new unit test is added to check that multiplication of small dynamically
sized matrices works correctly when the packet size is restricted to 4 and
the destination is unaligned.
2018-11-13 16:15:08 +01:00
Nikolaus Demmel
3dc0845046 Fix typo in comment on EIGEN_MAX_STATIC_ALIGN_BYTES 2018-11-14 18:11:30 +01:00
Gael Guennebaud
7fddc6a51f typo 2018-11-14 14:43:18 +01:00
Gael Guennebaud
449f948b2a help doxygen linking to DenseBase::NullaryExpr 2018-11-14 14:42:59 +01:00
Gael Guennebaud
4263f23c28 Improve doc on multi-threading and warn about hyper-threading 2018-11-14 14:42:29 +01:00
Gael Guennebaud
db529ae4ec doxygen does not like \addtogroup and \ingroup in the same line 2018-11-14 14:42:06 +01:00
Rasmus Munk Larsen
72928a2c8a Merged in rmlarsen/eigen2 (pull request PR-543)
Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth.

Approved-by: Eugene Zhulenev <ezhulenev@google.com>
2018-11-13 17:10:30 +00:00
Rasmus Munk Larsen
cda479d626 Remove accidental changes. 2018-11-12 18:34:04 -08:00
Rasmus Munk Larsen
719d9aee65 Add parallel memcpy to TensorThreadPoolDevice in Eigen, but limit the number of threads to 4, beyond which we just seem to be wasting CPU cycles as the threads contend for memory bandwidth. 2018-11-12 17:46:02 -08:00
Rasmus Munk Larsen
77b447c24e Add optimized version of logistic function for float. As an example, this is about 50% faster than the existing version on Haswell using AVX. 2018-11-12 13:42:24 -08:00
Gael Guennebaud
c81bdbdadc Add manual doc on STL-compatible iterators 2018-11-12 22:06:33 +01:00
Gael Guennebaud
0105146915 Fix warning in c++03 2018-11-10 09:11:38 +01:00
Rasmus Munk Larsen
93f9988a7e A few small fixes to a) prevent throwing in ctors and dtors of the threading code, and b) support matrix exponential on platforms with 113 bits of mantissa for long doubles. 2018-11-09 14:15:32 -08:00
Gael Guennebaud
784a3f13cf bug #1619: fix mixing of const and non-const generic iterators 2018-11-09 21:45:10 +01:00
Gael Guennebaud
db9a9a12ba bug #1619: make const and non-const iterators compatible 2018-11-09 16:49:19 +01:00
Gael Guennebaud
fbd6e7b025 add missing ref to a.zeta(b) 2018-11-09 13:53:42 +01:00
Gael Guennebaud
dffd1e11de Limit the size of the toc 2018-11-09 13:52:34 +01:00
Gael Guennebaud
a88e0a0e95 Update doxy hacks wrt doxygen 1.8.13/14 2018-11-09 13:52:10 +01:00
Gael Guennebaud
bd9a00718f Let doxygen sees lastN 2018-11-09 11:35:48 +01:00
Gael Guennebaud
d7c644213c Add and update manual pages for slicing, indexing, and reshaping. 2018-11-09 11:35:27 +01:00
Gael Guennebaud
a368848473 Recent Xcode versions do support EIGEN_HAS_STATIC_ARRAY_TEMPLATE 2018-11-09 10:33:17 +01:00
Gael Guennebaud
f62a0f69c6 Fix max-size in indexed-view 2018-11-08 18:40:22 +01:00
Gael Guennebaud
bf495859ff Merged in glchaves/eigen (pull request PR-539)
Vectorize row-by-row gebp loop iterations on 16 packets as well
2018-11-07 07:21:15 +00:00
Gael Guennebaud
995730fc6c Add option to disable plot generation 2018-11-07 00:41:16 +01:00
Gustavo Lima Chaves
4ad359237a Vectorize row-by-row gebp loop iterations on 16 packets as well
Signed-off-by: Gustavo Lima Chaves <gustavo.lima.chaves@intel.com>
Signed-off-by: Mark D. Ryan <mark.d.ryan@intel.com>
2018-11-06 10:48:42 -08:00
Gael Guennebaud
9d318b92c6 add unit tests for bug #1619 2018-11-01 15:14:50 +01:00
Matthieu Vigne
8d7a73e48e bug #1617: Fix SolveTriangular.solveInPlace crashing for empty matrix.
This made FullPivLU.kernel() crash when used on the zero matrix.
Add unit test for FullPivLU.kernel() on the zero matrix.
2018-10-31 20:28:18 +01:00
Christoph Hertzberg
66b28e290d bug #1618: Use different power-of-2 check to avoid MSVC warning 2018-11-01 13:23:19 +01:00
Rasmus Munk Larsen
07fcdd1438 Merged in ezhulenev/eigen-02 (pull request PR-534)
Fix cxx11_tensor_{block_access, reduction} tests
2018-10-25 18:34:35 +00:00
Eugene Zhulenev
8a977c1f46 Fix cxx11_tensor_{block_access, reduction} tests 2018-10-25 11:31:29 -07:00
Halie Murray-Davis
fb62d6d96e Fix typo in tutorial documentation. 2018-10-25 04:55:34 +00:00
Christoph Hertzberg
b5f077d22c Document EIGEN_NO_IO preprocessor directive 2018-10-25 16:49:25 +02:00
Christian von Schultz
4a40b3785d Collapsed revision (based on pull request PR-325)
* Support compiling without IO streams

Add the preprocessor definition EIGEN_NO_IO which, if defined,
disables all use of the IO streams part of the standard library.
2018-10-22 21:14:40 +02:00
Rasmus Munk Larsen
14054e217f Do not rely on the compiler generating __device__ functions for constexpr in Cuda (via EIGEN_CONSTEXPR_ARE_DEVICE_FUNC). This breaks several targets in the TensorFlow Cuda build, e.g.,
INFO: From Compiling tensorflow/core/kernels/maxpooling_op_gpu.cu.cc:
/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: calling a __host__ function("std::equal_to<float> ::operator () const") from a __global__ function("tensorflow::_NV_ANON_NAMESPACE::MaxPoolGradBackwardNoMaskNHWC< ::Eigen::half> ") is not allowed

/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: identifier "std::equal_to<float> ::operator () const" is undefined in device code"

/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: calling a __host__ function("std::equal_to<float> ::operator () const") from a __global__ function("tensorflow::_NV_ANON_NAMESPACE::MaxPoolGradBackwardNoMaskNCHW< ::Eigen::half> ") is not allowed

/b/f/w/run/external/eigen_archive/Eigen/src/Core/arch/GPU/Half.h(197): error: identifier "std::equal_to<float> ::operator () const" is undefined in device code

4 errors detected in the compilation of "/tmp/tmpxft_00000011_00000000-6_maxpooling_op_gpu.cu.cpp1.ii".
ERROR: /tmpfs/tensor_flow/tensorflow/core/kernels/BUILD:3753:1: output 'tensorflow/core/kernels/_objs/pooling_ops_gpu/maxpooling_op_gpu.cu.pic.o' was not created
ERROR: /tmpfs/tensor_flow/tensorflow/core/kernels/BUILD:3753:1: Couldn't build file tensorflow/core/kernels/_objs/pooling_ops_gpu/maxpooling_op_gpu.cu.pic.o: not all outputs were created or valid
2018-10-22 16:18:24 -07:00
Rasmus Munk Larsen
954b4ca9d0 Suppress compiler warning about unused global variable. 2018-10-22 13:48:56 -07:00
Rasmus Munk Larsen
9caafca550 Merged in rmlarsen/eigen (pull request PR-532)
Only set EIGEN_CONSTEXPR_ARE_DEVICE_FUNC for clang++ if cxx_relaxed_constexpr is available.
2018-10-19 21:37:14 +00:00
Christoph Hertzberg
449ff74672 Fix most Doxygen warnings. Also add links to stable documentation from unsupported modules (by using the corresponding Doxytags file).
Manually grafted from d107a371c6
2018-10-19 21:10:28 +02:00
Rasmus Munk Larsen
39fec15d5c Merged eigen/eigen into default 2018-10-19 09:48:19 -07:00
Christoph Hertzberg
40fa6f98bf bug #1606: Explicitly set the standard before find_package(StandardMathLibrary). Also replace EIGEN_COMPILER_SUPPORT_CXX11 with EIGEN_COMPILER_SUPPORT_CPP11.
Grafted manually from a4afa90d16
2018-10-19 17:20:51 +02:00
Rasmus Munk Larsen
d8f285852b Only set EIGEN_CONSTEXPR_ARE_DEVICE_FUNC for clang++ if cxx_relaxed_constexpr is available. 2018-10-18 16:55:02 -07:00
Rasmus Munk Larsen
dda68f56ec Fix GPU build due to gpu_assert not always being defined. 2018-10-18 16:29:29 -07:00
Gael Guennebaud
1dcf5a6ed8 fix typo in doc 2018-10-17 09:29:36 +02:00
Eugene Zhulenev
9e96e91936 Move from rvalue arguments in ThreadPool enqueue* methods 2018-10-16 16:48:32 -07:00
Eugene Zhulenev
217d839816 Reduce thread scheduling overhead in parallelFor 2018-10-16 14:53:06 -07:00
Rasmus Munk Larsen
d52763bb4f Merged in ezhulenev/eigen-02 (pull request PR-528)
[TensorBlockIO] Check if it's allowed to squeeze inner dimensions

Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
2018-10-16 15:39:40 +00:00
Gael Guennebaud
0f780bb0b4 Fix float-to-double warning 2018-10-16 09:19:45 +02:00
Eugene Zhulenev
900c7c61bb Check if it's allowed to squeeze inner dimensions in TensorBlockIO 2018-10-15 16:52:33 -07:00
Gael Guennebaud
a39e0f7438 bug #1612: fix regression in "outer-vectorization" of partial reductions for PacketSize==1 (aka complex<double>) 2018-10-16 01:04:25 +02:00
Gael Guennebaud
e3b85771d7 Show call stack in case of failing sparse solving. 2018-10-16 00:43:44 +02:00
Gael Guennebaud
d2d570c116 Remove useless (and broken) resize 2018-10-16 00:42:48 +02:00
Gael Guennebaud
f0fb95135d Iterative solvers: unify and fix handling of multiple rhs.
m_info was not properly computed and the logic was repeated in several places.
2018-10-15 23:47:46 +02:00
Gael Guennebaud
2747b98cfc DGMRES: fix null rhs, fix restart, fix m_isDeflInitialized for multiple solve 2018-10-15 23:46:00 +02:00
Gael Guennebaud
d835a0bf53 relax number of iterations checks to avoid false negatives 2018-10-15 10:23:32 +02:00
Gael Guennebaud
3a33db4de5 merge 2018-10-15 09:22:27 +02:00
Rasmus Munk Larsen
0ed811a9c1 Suppress unused variable compiler warning in sparse subtest 3. 2018-10-12 13:41:57 -07:00
Mark D Ryan
aa110e681b PR 526: Speed up multiplication of small, dynamically sized matrices
The Packet16f, Packet8f and Packet8d types are too large to use with dynamically
sized matrices typically processed by the SliceVectorizedTraversal specialization of
the dense_assignment_loop.  Using these types is likely to lead to little or no
vectorization.  Significant slowdown in the multiplication of these small matrices can
be observed when building with AVX and AVX512 enabled.

This patch introduces a new dense_assignment_kernel that is used when
computing small products whose operands have dynamic dimensions.  It ensures that the
PacketSize used is no larger than 4, thereby increasing the chance that vectorized
instructions will be used when computing the product.

I tested all 969 possible combinations of M, K, and N that are handled by the
dense_assignment_loop on x86 builds.  Although a few combinations are slowed down
by this patch they are far outnumbered by the cases that are sped up, as the
following results demonstrate.


Disabling Packet8d on AVX512 builds:

Total Cases:             969
Better:                  511
Worse:                   85
Same:                    373
Max Improvement:         169.00% (4 8 6)
Max Degradation:         36.50% (8 5 3)
Median Improvement:      35.46%
Median Degradation:      17.41%
Total FLOPs Improvement: 19.42%


Disabling Packet16f and Packet8f on AVX512 builds:

Total Cases:             969
Better:                  658
Worse:                   5
Same:                    306
Max Improvement:         214.05% (8 6 5)
Max Degradation:         22.26% (16 2 1)
Median Improvement:      60.05%
Median Degradation:      13.32%
Total FLOPs Improvement: 59.58%


Disabling Packet8f on AVX builds:

Total Cases:             969
Better:                  663
Worse:                   96
Same:                    210
Max Improvement:         155.29% (4 10 5)
Max Degradation:         35.12% (8 3 2)
Median Improvement:      34.28%
Median Degradation:      15.05%
Total FLOPs Improvement: 26.02%
2018-10-12 15:20:21 +02:00
Eugene Zhulenev
d9392f9e55 Fix code format 2018-11-02 14:51:35 -07:00
Eugene Zhulenev
118520f04a Workaround nvcc+msvc compiler bug 2018-11-02 14:48:28 -07:00
Christoph Hertzberg
24dc076519 Explicitly convert 0 to Scalar for custom types 2018-10-12 10:22:19 +02:00
Gael Guennebaud
8214cf1896 Make sparse_basic includable from sparse_extra, but disable it since sparse_basic(DynamicSparseMatrix) does not compile at all anyway 2018-10-11 10:27:23 +02:00
Gael Guennebaud
43633fbaba Fix warning with AVX512f 2018-10-11 10:13:48 +02:00
Gael Guennebaud
97e2c808e9 Fix avx512 plog(NaN) to return NaN instead of +inf 2018-10-11 10:13:13 +02:00
Gael Guennebaud
b3f66d29a5 Enable avx512 plog with clang 2018-10-11 10:12:21 +02:00
Gael Guennebaud
2ef1b39674 Relaxed fastmath unit test: if std::foo fails, then let's only trigger a warning if numext::foo fails too.
A true error will be triggered only if std::foo works but our numext::foo fails.
2018-10-11 09:45:30 +02:00
Gael Guennebaud
1d5a6363ea relax numerical tests from equal to approx (x87) 2018-10-11 09:29:56 +02:00
Gael Guennebaud
f0aa7e40fc Fix regression in changeset 5335659c47 2018-10-10 23:47:30 +02:00
Gael Guennebaud
ce243ee45b bug #520: add diagmat +/- diagmat operators. 2018-10-10 23:38:22 +02:00
Gael Guennebaud
5335659c47 Merged in ezhulenev/eigen-02 (pull request PR-525)
Fix bug in partial reduction of expressions requiring evaluation
2018-10-10 20:59:00 +00:00
Gael Guennebaud
eec0dfd688 bug #632: add specializations for res ?= dense +/- sparse and res ?= sparse +/- dense.
They are rewritten as two compound assignment to by-pass hybrid dense-sparse iterator.
2018-10-10 22:50:15 +02:00
Eugene Zhulenev
8e6dc2c81d Fix bug in partial reduction of expressions requiring evaluation 2018-10-10 13:23:52 -07:00
Gael Guennebaud
76ceae49c1 bug #1609: add inplace transposition unit test 2018-10-10 21:48:58 +02:00
Eugene Zhulenev
2bf1a31d81 Use void type if stl-style iterators are not supported 2018-10-10 10:31:40 -07:00
Christoph Hertzberg
f3130ee1ba Avoid empty macro arguments 2018-10-10 08:23:40 +02:00
Rasmus Munk Larsen
e8918743c1 Merged in ezhulenev/eigen-01 (pull request PR-523)
Compile time detection for unimplemented stl-style iterators
2018-10-09 23:42:01 +00:00
Eugene Zhulenev
befcac883d Hide stl-container detection test under #if 2018-10-09 15:36:01 -07:00
Eugene Zhulenev
c0ca8a9fa3 Compile time detection for unimplemented stl-style iterators 2018-10-09 15:28:23 -07:00
Gael Guennebaud
1dd1f8e454 bug #65: add vectorization of partial reductions along the outer-dimension, for instance: colmajor_mat.rowwise().mean() 2018-10-09 23:36:50 +02:00
Gael Guennebaud
bfa2a81a50 Make redux_vec_unroller more flexible regarding packet-type 2018-10-09 23:30:41 +02:00
Gael Guennebaud
c0c3be26ed Extend unit tests for partial reductions 2018-10-09 22:54:54 +02:00
Christoph Hertzberg
3f2c8b7ff0 Fix a lot of Doxygen warnings in Tensor module 2018-10-09 20:22:47 +02:00
Christoph Hertzberg
f6359ad795 Small Doxygen fixes 2018-10-09 19:33:35 +02:00
Gael Guennebaud
7a882c05ab Fix compilation on CUDA 2018-10-09 17:02:16 +02:00
Gael Guennebaud
93a6192e98 fix mpreal for mpfr<4.0.0 2018-10-09 09:15:22 +02:00
Rasmus Munk Larsen
d16634c4d4 Fix out-of bounds access in TensorArgMax.h. 2018-10-08 16:41:36 -07:00
Rasmus Munk Larsen
1a737e1d6a Fix contraction test. 2018-10-08 16:37:07 -07:00
Gael Guennebaud
e00487f7d2 bug #1603: add parenthesis around ternary operator in function body as well as a harmless attempt to make MSVC happy. 2018-10-08 22:27:04 +02:00
Gael Guennebaud
2eda9783de typo 2018-10-08 21:37:46 +02:00
Gael Guennebaud
c6e2dde714 fix c++11 deprecated warning 2018-10-08 18:26:05 +02:00
Gael Guennebaud
6cc9b2c831 fix warning in mpreal.h 2018-10-08 18:25:37 +02:00
Gael Guennebaud
649d4758a6 merge 2018-10-08 17:35:18 +02:00
Gael Guennebaud
aa5820056e Unify c++11 usage in doc's examples and snippets 2018-10-08 17:32:54 +02:00
Gael Guennebaud
e29bfe8479 Update included mpreal header to 3.6.5 and fix deprecated warnings. 2018-10-08 17:09:23 +02:00
Gael Guennebaud
64b1a15318 Workaround stupid warning 2018-10-08 12:01:18 +02:00
Gael Guennebaud
c9643f4a6f Disable C++11 deprecated warning when limiting Eigen to C++98 2018-10-08 10:43:43 +02:00
Gael Guennebaud
774bb9d6f7 fix a doxygen issue 2018-10-08 09:30:15 +02:00
Gael Guennebaud
6c3f6cd52b Fix maybe-uninitialized warning 2018-10-07 23:29:51 +02:00
Gael Guennebaud
bcb7c66b53 Workaround gcc's alloc-size-larger-than= warning 2018-10-07 21:55:59 +02:00
Gael Guennebaud
16b2001ece Fix gcc 8.1 warning: "maybe use uninitialized" 2018-10-07 21:54:49 +02:00
Gael Guennebaud
6512c5e136 Implement a better workaround for GCC's bug #87544 2018-10-07 15:00:05 +02:00
Gael Guennebaud
409132bb81 Workaround gcc bug making it trigger an invalid warning 2018-10-07 09:23:15 +02:00
Gael Guennebaud
c6a1ab4036 Workaround MSVC compilation issue 2018-10-06 13:49:17 +02:00
Gael Guennebaud
e21766c6f5 Clarify doc of rowwise/colwise/vectorwise. 2018-10-05 23:12:09 +02:00
Gael Guennebaud
d92f004ab7 Simplify API by removing allCols/allRows and reusing rowwise/colwise to define iterators over rows/columns 2018-10-05 23:11:21 +02:00
Gael Guennebaud
91613bf2c2 Add support for c++11 snippets 2018-10-05 23:08:39 +02:00
Gael Guennebaud
3e64b1fc86 Move iterators to internal, improve doc, make unit test c++03 friendly 2018-10-03 15:13:15 +02:00
Gael Guennebaud
2b2b4d0580 fix unused warning 2018-10-03 14:16:21 +02:00
Gael Guennebaud
8a1e98240e add unit tests 2018-10-03 11:56:27 +02:00
Gael Guennebaud
5f26f57598 Change the logic of A.reshaped<Order>() to be a simple alias to A.reshaped<Order>(AutoSize,fix<1>).
This means that now AutoOrder is allowed, and it always returns a column-vector.
2018-10-03 11:41:47 +02:00
Gael Guennebaud
0481900e25 Add pointer-based iterator for direct-access expressions 2018-10-02 23:44:36 +02:00
Christoph Hertzberg
c5f1d0a72a Fix shadow warning 2018-10-02 19:01:08 +02:00
Christoph Hertzberg
b92c71235d Move struct outside of method for C++03 compatibility. 2018-10-02 18:59:10 +02:00
Christoph Hertzberg
051f9c1aff Make code compile in C++03 mode again 2018-10-02 18:36:30 +02:00
Christoph Hertzberg
b786ce8c72 Fix conversion warning ... again 2018-10-02 18:35:25 +02:00
Gael Guennebaud
8c38528168 Factorize RowsProxy/ColsProxy and related iterators using subVector<>(Index) 2018-10-02 14:03:26 +02:00
Gael Guennebaud
12487531ce Add templated subVector<Vertical/Horizontal>(Index) aliases to col/row(Index) methods (plus subVectors<>() to retrieve the number of rows/columns) 2018-10-02 14:02:34 +02:00
Gael Guennebaud
37e29fc893 Use Index instead of ptrdiff_t or int, fix random-accessors. 2018-10-02 13:29:32 +02:00
Gael Guennebaud
de2efbc43c bug #1605: workaround ABI issue with vector types (aka __m128) versus scalar types (aka float) 2018-10-01 23:45:55 +02:00
Gael Guennebaud
b0c66adfb1 bug #231: initial implementation of STL iterators for dense expressions 2018-10-01 23:21:37 +02:00
Christoph Hertzberg
564ca71e39 Merged in deven-amd/eigen/HIP_fixes (pull request PR-518)
PR with HIP specific fixes (for the eigen nightly regression failures in HIP mode)
2018-10-01 16:51:04 +00:00
Deven Desai
94898488a6 This commit contains the following (HIP specific) updates:
- unsupported/Eigen/CXX11/src/Tensor/TensorReductionGpu.h
  Changing "pass-by-reference" argument to be "pass-by-value" instead
  (in a  __global__ function decl).
  "pass-by-reference" arguments to __global__ functions are unwise,
  and will be explicitly flagged as errors by the newer versions of HIP.

- Eigen/src/Core/util/Memory.h
- unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h
  Changes introduced in recent commits break the HIP compile.
  Adding EIGEN_DEVICE_FUNC attribute to some functions and
  calling ::malloc/free instead of the corresponding std:: versions
  to get the HIP compile working again

- unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h
  A change introduced in a recent commit breaks the HIP compile
  (link stage errors out due to failure to inline a function).
  Disabling the recently introduced code (only for HIP compile), to get
  the eigen nightly testing going again.
  Will submit another PR once we have the proper fix.

- Eigen/src/Core/util/ConfigureVectorization.h
  Enabling GPU VECTOR support when HIP compiler is in use
  (for both the host and device compile phases)
2018-10-01 14:28:37 +00:00
Rasmus Munk Larsen
2088c0897f Merged eigen/eigen into default 2018-09-28 16:00:46 -07:00
Rasmus Munk Larsen
31629bb964 Get rid of unused variable warning. 2018-09-28 16:00:09 -07:00
Eugene Zhulenev
bb13d5d917 Fix bug in copy optimization in Tensor slicing. 2018-09-28 14:34:42 -07:00
Rasmus Munk Larsen
104e8fa074 Fix a few warnings and rename a variable to not shadow "last". 2018-09-28 12:00:08 -07:00
Rasmus Munk Larsen
7c1b47840a Merged in ezhulenev/eigen-01 (pull request PR-514)
Add tests for evalShardedByInnerDim contraction + fix bugs
2018-09-28 18:37:54 +00:00
Eugene Zhulenev
524c81f3fa Add tests for evalShardedByInnerDim contraction + fix bugs 2018-09-28 11:24:08 -07:00
Christoph Hertzberg
86ba50be39 Fix integer conversion warnings 2018-09-28 19:33:39 +02:00
Eugene Zhulenev
e95696acb3 Optimize TensorBlockCopyOp 2018-09-27 14:49:26 -07:00
Eugene Zhulenev
9f33e71e9d Revert code lost in merge 2018-09-27 12:08:17 -07:00
Eugene Zhulenev
a7a3e9f2b6 Merge with eigen/eigen default 2018-09-27 12:05:06 -07:00
Eugene Zhulenev
9f4988959f Remove explicit mkldnn support and redundant TensorContractionKernelBlocking 2018-09-27 11:49:19 -07:00
Rasmus Munk Larsen
1e5750a5b8 Merged in rmlarsen/eigen4 (pull request PR-511)
Parallelize tensor contraction over the inner dimension.
2018-09-27 17:18:32 +00:00
Gael Guennebaud
af3ad4b513 oops, I've been too fast in previous copy/paste 2018-09-27 09:28:57 +02:00
Gael Guennebaud
24b163a877 #pragma GCC diagnostic push/pop is not supported prior to gcc 4.6 2018-09-27 09:23:54 +02:00
Eugene Zhulenev
b314376f9c Test mkldnn pack for doubles 2018-09-26 18:22:24 -07:00
Eugene Zhulenev
22ed98a331 Conditionally add mkldnn test 2018-09-26 17:57:37 -07:00
Rasmus Munk Larsen
d956204ab2 Remove "false &&" left over from test. 2018-09-26 17:03:30 -07:00
Rasmus Munk Larsen
3815aeed7a Parallelize tensor contraction over the inner dimension in cases where one or both of the outer dimensions (m and n) are small but k is large. This speeds up individual matmul microbenchmarks by up to 85%.
Naming below is BM_Matmul_M_K_N_THREADS, measured on a 2-socket Intel Broadwell-based server.

Benchmark                          Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_Matmul_1_80_13522_1                  387457    396013     -2.2%
BM_Matmul_1_80_13522_2                  406487    230789    +43.2%
BM_Matmul_1_80_13522_4                  395821    123211    +68.9%
BM_Matmul_1_80_13522_6                  391625     97002    +75.2%
BM_Matmul_1_80_13522_8                  408986    113828    +72.2%
BM_Matmul_1_80_13522_16                 399988     67600    +83.1%
BM_Matmul_1_80_13522_22                 411546     60044    +85.4%
BM_Matmul_1_80_13522_32                 393528     57312    +85.4%
BM_Matmul_1_80_13522_44                 390047     63525    +83.7%
BM_Matmul_1_80_13522_88                 387876     63592    +83.6%
BM_Matmul_1_1500_500_1                  245359    248119     -1.1%
BM_Matmul_1_1500_500_2                  401833    143271    +64.3%
BM_Matmul_1_1500_500_4                  210519    100231    +52.4%
BM_Matmul_1_1500_500_6                  251582     86575    +65.6%
BM_Matmul_1_1500_500_8                  211499     80444    +62.0%
BM_Matmul_3_250_512_1                    70297     68551     +2.5%
BM_Matmul_3_250_512_2                    70141     52450    +25.2%
BM_Matmul_3_250_512_4                    67872     58204    +14.2%
BM_Matmul_3_250_512_6                    71378     63340    +11.3%
BM_Matmul_3_250_512_8                    69595     41652    +40.2%
BM_Matmul_3_250_512_16                   72055     42549    +40.9%
BM_Matmul_3_250_512_22                   70158     54023    +23.0%
BM_Matmul_3_250_512_32                   71541     56042    +21.7%
BM_Matmul_3_250_512_44                   71843     57019    +20.6%
BM_Matmul_3_250_512_88                   69951     54045    +22.7%
BM_Matmul_3_1500_512_1                  369328    374284     -1.4%
BM_Matmul_3_1500_512_2                  428656    223603    +47.8%
BM_Matmul_3_1500_512_4                  205599    139508    +32.1%
BM_Matmul_3_1500_512_6                  214278    139071    +35.1%
BM_Matmul_3_1500_512_8                  184149    142338    +22.7%
BM_Matmul_3_1500_512_16                 156462    156983     -0.3%
BM_Matmul_3_1500_512_22                 163905    158259     +3.4%
BM_Matmul_3_1500_512_32                 155314    157662     -1.5%
BM_Matmul_3_1500_512_44                 235434    158657    +32.6%
BM_Matmul_3_1500_512_88                 156779    160275     -2.2%
BM_Matmul_1500_4_512_1                  363358    349528     +3.8%
BM_Matmul_1500_4_512_2                  303134    263319    +13.1%
BM_Matmul_1500_4_512_4                  176208    130086    +26.2%
BM_Matmul_1500_4_512_6                  148026    115449    +22.0%
BM_Matmul_1500_4_512_8                  131656     98421    +25.2%
BM_Matmul_1500_4_512_16                 134011     82861    +38.2%
BM_Matmul_1500_4_512_22                 134950     85685    +36.5%
BM_Matmul_1500_4_512_32                 133165     90081    +32.4%
BM_Matmul_1500_4_512_44                 133203     90644    +32.0%
BM_Matmul_1500_4_512_88                 134106    100566    +25.0%
BM_Matmul_4_1500_512_1                  439243    435058     +1.0%
BM_Matmul_4_1500_512_2                  451830    257032    +43.1%
BM_Matmul_4_1500_512_4                  276434    164513    +40.5%
BM_Matmul_4_1500_512_6                  182542    144827    +20.7%
BM_Matmul_4_1500_512_8                  179411    166256     +7.3%
BM_Matmul_4_1500_512_16                 158101    155560     +1.6%
BM_Matmul_4_1500_512_22                 152435    155448     -1.9%
BM_Matmul_4_1500_512_32                 155150    149538     +3.6%
BM_Matmul_4_1500_512_44                 193842    149777    +22.7%
BM_Matmul_4_1500_512_88                 149544    154468     -3.3%
2018-09-26 16:47:13 -07:00
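The inner-dimension sharding described in the commit message above can be sketched as standalone C++. This is a hypothetical illustration, not Eigen's actual contraction kernel: each shard owns a contiguous slice of the inner dimension k and produces a full m x n partial result, and the partials are summed (in Eigen the shards run on thread-pool threads; here they run sequentially for clarity).

```cpp
#include <cassert>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Sketch of sharding C = A * B over the inner dimension k: each shard
// computes a partial product over its k-slice, then the partials are
// reduced into the final result.
Matrix matmulShardedByInnerDim(const Matrix& A, const Matrix& B, int numShards) {
  const int m = static_cast<int>(A.size());
  const int k = static_cast<int>(B.size());
  const int n = static_cast<int>(B[0].size());
  Matrix C(m, std::vector<double>(n, 0.0));
  for (int s = 0; s < numShards; ++s) {
    const int lo = k * s / numShards, hi = k * (s + 1) / numShards;
    Matrix partial(m, std::vector<double>(n, 0.0));  // one shard's output
    for (int i = 0; i < m; ++i)
      for (int p = lo; p < hi; ++p)
        for (int j = 0; j < n; ++j)
          partial[i][j] += A[i][p] * B[p][j];
    for (int i = 0; i < m; ++i)  // reduce this shard's partial into C
      for (int j = 0; j < n; ++j)
        C[i][j] += partial[i][j];
  }
  return C;
}
```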
Eugene Zhulenev
71cd3fbd6a Support multiple contraction kernel types in TensorContractionThreadPool 2018-09-26 11:08:47 -07:00
Christoph Hertzberg
0a3356f4ec Don't deactivate BVH test for clang (probably, this was failing for very old versions of clang) 2018-09-25 20:26:16 +02:00
Gael Guennebaud
41c3a2ffc1 Fix documentation of reshape to vectors. 2018-09-25 16:35:44 +02:00
Christoph Hertzberg
2c083ace3e Provide EIGEN_OVERRIDE and EIGEN_FINAL macros to mark virtual function overrides 2018-09-24 18:01:17 +02:00
Gael Guennebaud
626942d9dd fix alignment issue in ploaddup for AVX512 2018-09-28 16:57:32 +02:00
Gael Guennebaud
84a1101b36 Merge with default. 2018-09-23 21:52:58 +02:00
Gael Guennebaud
795e12393b Fix logic in diagonal*dense product in a corner case.
The problem was for: diag(1x1) * mat(1,n)
2018-09-22 16:44:33 +02:00
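The corner case named above, diag(1x1) * mat(1,n), is easy to state as a standalone sketch. `diagTimesDense` below is a hypothetical stand-in for the math, not Eigen's product code: row i of the dense factor is scaled by diagonal entry i, and with a 1x1 diagonal there is exactly one row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// diag(d) * mat: scale each row of mat by the matching diagonal entry.
std::vector<std::vector<double>> diagTimesDense(
    const std::vector<double>& d,
    const std::vector<std::vector<double>>& mat) {
  std::vector<std::vector<double>> out = mat;
  for (std::size_t i = 0; i < d.size(); ++i)
    for (double& x : out[i]) x *= d[i];
  return out;
}
```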
Gael Guennebaud
bac36d0996 Demangle Traversal and Unrolling in Redux 2018-09-21 23:03:45 +02:00
Gael Guennebaud
c696dbcaa6 Fix shadowing of last and all 2018-09-21 23:02:33 +02:00
Christoph Hertzberg
e3c8289047 Replace unused PREDICATE by corresponding STATIC_ASSERT 2018-09-21 21:15:51 +02:00
Gael Guennebaud
1bf12880ae Add reshaped<>() shortcuts when returning vectors and remove the reshaping version of operator()(all) 2018-09-21 16:50:04 +02:00
Gael Guennebaud
4291f167ee Add missing plugins to DynamicSparseMatrix -- fix sparse_extra_3 2018-09-21 14:53:43 +02:00
Gael Guennebaud
03a0cb2b72 fix unalignedcount for avx512 2018-09-21 14:40:26 +02:00
Gael Guennebaud
371068992a Add more debug output 2018-09-21 14:32:39 +02:00
Gael Guennebaud
91716f03a7 Fix vectorization logic unit test for AVX512 2018-09-21 14:32:24 +02:00
Gael Guennebaud
b00e48a867 Improve slice-vectorization logic for redux (significant speed-up for reduxion of blocks) 2018-09-21 13:45:56 +02:00
Gael Guennebaud
a488d59787 merge with default Eigen 2018-09-21 11:51:49 +02:00
Gael Guennebaud
47720e7970 Doc fixes 2018-09-21 11:48:22 +02:00
Gael Guennebaud
3ec2985914 Merged indexing cleanup (pull request PR-506) 2018-09-21 09:36:05 +00:00
Gael Guennebaud
651e5d4866 Fix EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE for AVX512 or AVX with malloc aligned on 8 bytes only.
This change also makes it future-proof for AVX1024
2018-09-21 10:33:22 +02:00
Eugene Zhulenev
719e438a20 Collapsed revision
* Split cxx11_tensor_executor test
* Register test parts with EIGEN_SUFFIXES
* Fix EIGEN_SUFFIXES in cxx11_tensor_executor test
2018-09-20 15:19:12 -07:00
Gael Guennebaud
f0ef3467de Fix doc 2018-09-20 22:57:28 +02:00
Gael Guennebaud
617f75f117 Add indexing namespace 2018-09-20 22:57:10 +02:00
Gael Guennebaud
0c56d22e2e Fix shadowing 2018-09-20 22:56:21 +02:00
Rasmus Munk Larsen
8e2be7777e Merged eigen/eigen into default 2018-09-20 11:41:15 -07:00
Rasmus Munk Larsen
5d2e759329 Initialize BlockIteratorState in a C++03 compatible way. 2018-09-20 11:40:43 -07:00
Gael Guennebaud
e04faca930 merge 2018-09-20 18:33:54 +02:00
Gael Guennebaud
d37188b9c1 Fix MPrealSupport 2018-09-20 18:30:10 +02:00
Gael Guennebaud
3c6dc93f99 Fix GPU support. 2018-09-20 18:29:21 +02:00
Gael Guennebaud
e0f6d352fb Rename test/array.cpp to test/array_cwise.cpp to avoid conflicts with the array header. 2018-09-20 18:07:32 +02:00
Gael Guennebaud
eeeb18814f Fix warning 2018-09-20 17:48:56 +02:00
Gael Guennebaud
9419f506d0 Fix regression introduced by the previous fix for AVX512.
It broke the complex-complex case on SSE.
2018-09-20 17:32:34 +02:00
Christoph Hertzberg
a0166ab651 Workaround for spurious "array subscript is above array bounds" warnings with g++4.x 2018-09-20 17:08:43 +02:00
Gael Guennebaud
e38d1ab4d1 Workaround increases required alignment warning 2018-09-20 17:07:33 +02:00
Christoph Hertzberg
c50250cb24 Avoid warning "suggest braces around initialization of subobject".
This test is not run in C++03 mode, so no compatibility is lost.
2018-09-20 17:03:42 +02:00
Gael Guennebaud
71496b0e25 Fix gebp kernel for real+complex in case only reals are vectorized (e.g., AVX512).
This commit also removes "half-packet" from data-mappers: it was not used and conceptually broken anyway.
2018-09-20 17:01:24 +02:00
Gael Guennebaud
5a30eed17e Fix warnings in AVX512 2018-09-20 16:58:51 +02:00
Gael Guennebaud
2cf6d3050c Disable ignoring attributes warning 2018-09-20 11:38:19 +02:00
Rasmus Munk Larsen
44d8274383 Cast to longer type. 2018-09-19 13:31:42 -07:00
Rasmus Munk Larsen
d638b62dda Silence compiler warning. 2018-09-19 13:27:55 -07:00
Rasmus Munk Larsen
db9c9df59a Silence more compiler warnings. 2018-09-19 11:50:27 -07:00
Rasmus Munk Larsen
febd09dcc0 Silence compiler warnings in ThreadPoolInterface.h. 2018-09-19 11:11:04 -07:00
Gael Guennebaud
c3a19527a2 Fix doc wrt previous change 2018-09-19 11:49:26 +02:00
Gael Guennebaud
dfa8439e4d Update reshaped API to use RowMajor/ColMajor directly as integral values instead of introducing RowOrder/ColOrder types.
The API changed from A.reshaped(rows,cols,RowOrder) to A.template reshaped<RowOrder>(rows,cols).
2018-09-19 11:49:26 +02:00
luz.paz
f67b19a884 [PATCH 1/2] Misc. typos
From 68d431b4c14ad60a778ee93c1f59ecc4b931950e Mon Sep 17 00:00:00 2001
Found via `codespell -q 3 -I ../eigen-word-whitelist.txt` where the whitelists consists of:
```
als
ans
cas
dum
lastr
lowd
nd
overfl
pres
preverse
substraction
te
uint
whch
```
---
 CMakeLists.txt                                | 26 +++++++++----------
 Eigen/src/Core/GenericPacketMath.h            |  2 +-
 Eigen/src/SparseLU/SparseLU.h                 |  2 +-
 bench/bench_norm.cpp                          |  2 +-
 doc/HiPerformance.dox                         |  2 +-
 doc/QuickStartGuide.dox                       |  2 +-
 .../Eigen/CXX11/src/Tensor/TensorChipping.h   |  6 ++---
 .../Eigen/CXX11/src/Tensor/TensorDeviceGpu.h  |  2 +-
 .../src/Tensor/TensorForwardDeclarations.h    |  4 +--
 .../src/Tensor/TensorGpuHipCudaDefines.h      |  2 +-
 .../Eigen/CXX11/src/Tensor/TensorReduction.h  |  2 +-
 .../CXX11/src/Tensor/TensorReductionGpu.h     |  2 +-
 .../test/cxx11_tensor_concatenation.cpp       |  2 +-
 unsupported/test/cxx11_tensor_executor.cpp    |  2 +-
 14 files changed, 29 insertions(+), 29 deletions(-)
2018-09-18 04:15:01 -04:00
Gael Guennebaud
297ca62319 ease transition by adding placeholders::all/last/and as deprecated 2018-09-17 16:24:52 +02:00
Gael Guennebaud
2014c7ae28 Move all, last, end from Eigen::placeholders namespace to Eigen::, and rename end to lastp1 to avoid conflicts with std::end. 2018-09-15 14:35:10 +02:00
Gael Guennebaud
82772e8d9d Rename Symbolic namespace to symbolic to be consistent with numext namespace 2018-09-15 14:16:20 +02:00
Rasmus Munk Larsen
400512bfad Merged in ezhulenev/eigen-02 (pull request PR-501)
Enable DSizes type promotion with c++03
2018-09-19 00:50:04 +00:00
Eugene Zhulenev
c4627039ac Support static dimensions (aka IndexList) in Tensor::resize(...) 2018-09-18 14:25:21 -07:00
Gael Guennebaud
3e8188fc77 bug #1600: initialize m_info to InvalidInput by default, even though m_info is not accessible until it has been initialized (assert) 2018-09-18 21:24:48 +02:00
Eugene Zhulenev
218a7b9840 Enable DSizes type promotion with c++03 compilers 2018-09-18 10:57:00 -07:00
Ravi Kiran
1f0c941c3d Collapsed revision
* Merged eigen/eigen into default
2018-09-17 18:29:12 -07:00
Rasmus Munk Larsen
03a88c57e1 Merged in ezhulenev/eigen-02 (pull request PR-498)
Add DSizes index type promotion
2018-09-17 21:58:38 +00:00
Rasmus Munk Larsen
5ca0e4a245 Merged in ezhulenev/eigen-01 (pull request PR-497)
Fix warnings in IndexList array_prod
2018-09-17 20:15:06 +00:00
Eugene Zhulenev
a5cd4e9ad1 Replace deprecated Eigen::DenseIndex with Eigen::Index in TensorIndexList 2018-09-17 10:58:07 -07:00
Gael Guennebaud
b311bfb752 bug #1596: fix inclusion of Eigen's header within unsupported modules. 2018-09-17 09:54:29 +02:00
Gael Guennebaud
72f19c827a typo 2018-09-16 22:10:34 +02:00
Eugene Zhulenev
66f056776f Add DSizes index type promotion 2018-09-15 15:17:38 -07:00
Eugene Zhulenev
f313126dab Fix warnings in IndexList array_prod 2018-09-15 13:47:54 -07:00
Christoph Hertzberg
42705ba574 Fix weird error for building with g++-4.7 in C++03 mode. 2018-09-15 12:43:41 +02:00
Rasmus Munk Larsen
c2383f95af Merged in ezhulenev/eigen/fix_dsizes (pull request PR-494)
Fix DSizes IndexList constructor
2018-09-15 02:36:19 +00:00
Rasmus Munk Larsen
30290cdd56 Merged in ezhulenev/eigen/moar_eigen_fixes_3 (pull request PR-493)
Const cast scalar pointer in TensorSlicingOp evaluator

Approved-by: Sameer Agarwal <sameeragarwal@google.com>
2018-09-15 02:35:07 +00:00
Eugene Zhulenev
f7d0053cf0 Fix DSizes IndexList constructor 2018-09-14 19:19:13 -07:00
Rasmus Munk Larsen
601e289d27 Merged in ezhulenev/eigen/moar_eigen_fixes_1 (pull request PR-492)
Explicitly construct tensor block dimensions from evaluator dimensions
2018-09-15 01:36:21 +00:00
Eugene Zhulenev
71070a1e84 Const cast scalar pointer in TensorSlicingOp evaluator 2018-09-14 17:17:50 -07:00
Eugene Zhulenev
4863375723 Explicitly construct tensor block dimensions from evaluator dimensions 2018-09-14 16:55:05 -07:00
Rasmus Munk Larsen
14e35855e1 Merged in chtz/eigen-maxsizevector (pull request PR-490)
Let MaxSizeVector respect alignment of objects

Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
2018-09-14 23:29:24 +00:00
Rasmus Munk Larsen
281e631839 Merged in ezhulenev/eigen/indexlist_to_dsize (pull request PR-491)
Support reshaping with static shapes and dimensions conversion in tensor broadcasting

Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
2018-09-14 22:45:52 +00:00
Eugene Zhulenev
1b8d70a22b Support reshaping with static shapes and dimensions conversion in tensor broadcasting 2018-09-14 15:25:27 -07:00
Christoph Hertzberg
007f165c69 bug #1598: Let MaxSizeVector respect alignment of objects and add a unit test
Also revert 8b3d9ed081
2018-09-14 20:21:56 +02:00
Christoph Hertzberg
d7378aae8e Provide EIGEN_ALIGNOF macro, and give handmade_aligned_malloc the possibility for alignments larger than the standard alignment. 2018-09-14 20:17:47 +02:00
Rasmus Munk Larsen
9b864cdb37 Merged in rmlarsen/eigen3 (pull request PR-480)
Avoid compilation error in C++11 test when EIGEN_AVOID_STL_ARRAY is set.
2018-09-14 00:05:09 +00:00
Rasmus Munk Larsen
d0eef5fe6c Don't use bracket syntax in ctor. 2018-09-13 17:04:05 -07:00
Rasmus Munk Larsen
6313dde390 Fix merge error. 2018-09-13 16:42:05 -07:00
Rasmus Munk Larsen
0db590d22d Backed out changeset 01197e4452 2018-09-13 16:20:57 -07:00
Rasmus Munk Larsen
b3f4c067d9 Merge 2018-09-13 16:18:52 -07:00
Rasmus Munk Larsen
2b07018140 Enable vectorized version on GPUs. The underlying bug has been fixed. 2018-09-13 16:12:22 -07:00
Rasmus Munk Larsen
53568e3549 Merged in ezhulenev/eigen/tiled_evalution_support (pull request PR-444)
Tiled evaluation for Tensor ops

Approved-by: Rasmus Munk Larsen <rmlarsen@google.com>
Approved-by: Gael Guennebaud <g.gael@free.fr>
2018-09-13 22:05:47 +00:00
Eugene Zhulenev
01197e4452 Fix warnings 2018-09-13 15:03:36 -07:00
Gael Guennebaud
1141bcf794 Fix conjugate-gradient for very small rhs 2018-09-13 23:53:28 +02:00
Gael Guennebaud
7f3b17e403 MSVC 2015 supports c++11 thread-local-storage 2018-09-13 18:15:07 +02:00
Eugene Zhulenev
d138fe341d Fix static_assert in test to conform to the C++11 standard 2018-09-11 17:23:18 -07:00
Rasmus Munk Larsen
e289f44c56 Don't vectorize the MeanReducer unless pdiv is available. 2018-09-11 14:09:00 -07:00
Eugene Zhulenev
55bb7e7935 Merge with upstream eigen/default 2018-09-11 13:33:06 -07:00
Eugene Zhulenev
81b38a155a Fix compilation of tiled evaluation code with c++03 2018-09-11 13:32:32 -07:00
Rasmus Munk Larsen
5da960702f Merged eigen/eigen into default 2018-09-11 10:08:46 -07:00
Rasmus Munk Larsen
46f88fc454 Use numerically stable tree reduction in TensorReduction. 2018-09-11 10:08:10 -07:00
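The tree reduction this commit adopts can be sketched as follows; `treeSum` is an illustrative stand-in, not Eigen's TensorReduction code. Splitting the range in half and adding the two partial sums keeps rounding-error growth at O(log n), versus O(n) for a left-to-right accumulation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pairwise (tree) reduction: recurse on each half, then combine.
double treeSum(const double* data, std::size_t n) {
  if (n == 0) return 0.0;
  if (n == 1) return data[0];
  const std::size_t half = n / 2;
  return treeSum(data, half) + treeSum(data + half, n - half);
}
```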
Justin Carpentier
4827bec776 LLT: correct doc and add missing reference for the return type of rankUpdate
---
 Eigen/src/Cholesky/LLT.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
2018-09-11 09:33:21 +02:00
Rasmus Munk Larsen
3d057e0453 Avoid compilation error in C++11 test when EIGEN_AVOID_STL_ARRAY is set. 2018-09-06 12:59:36 -07:00
cgs1019
c6066ac411 Make param name and docs consistent for JacobiRotation::makeGivens
Previously the rendered math in the doc string called the optional return value
'r', while the actual parameter and the doc string text referred to the
parameter as 'z'. This changeset renames all the z's to r's to match the math.
2018-09-06 11:04:17 -04:00
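The rotation makeGivens computes can be sketched for the real-valued case. This `makeGivens` is a simplified hypothetical version (Eigen's also handles complex scalars and guards against overflow): it finds (c, s) and the return value r such that applying [[c, s], [-s, c]] to (a, b) yields (r, 0).

```cpp
#include <cassert>
#include <cmath>

struct Givens { double c, s, r; };

// Real-valued Givens rotation zeroing the second entry of (a, b).
Givens makeGivens(double a, double b) {
  if (b == 0.0) return {a >= 0.0 ? 1.0 : -1.0, 0.0, std::abs(a)};
  if (a == 0.0) return {0.0, b >= 0.0 ? 1.0 : -1.0, std::abs(b)};
  double r = std::hypot(a, b);  // sqrt(a^2 + b^2) without overflow
  if (a < 0.0) r = -r;          // keep c >= 0
  return {a / r, b / r, r};
}
```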
Alexey Frunze
edeee16a16 Fix build failures in matrix_power and matrix_exponential tests.
This fixes the static assertion complaining about double being
used in place of long double. This happened on MIPS32, where
double and long double have the same type representation.
This can be simulated on x86 as well if we pass -mlong-double-64
to g++.
2018-08-31 14:11:10 -07:00
Deven Desai
c64fe9ea1f Updates to fix HIP-clang specific compile errors.
Compiling the Eigen unit tests with hip-clang (HIP with clang as the underlying compiler instead of hcc or nvcc) results in compile errors. The changes in this commit fix those compile errors. The main change is to convert a few instances of "__device__" to "EIGEN_DEVICE_FUNC"
2018-08-30 20:22:16 +00:00
Rasmus Munk Larsen
8b3d9ed081 Use padding instead of alignment attribute, which MaxSizeVector does not respect. This leads to undefined behavior and hard-to-trace bugs. 2018-09-05 11:20:06 -07:00
Gael Guennebaud
5927eef612 Enable std::result_of for msvc 2015 and later 2018-09-13 09:44:46 +02:00
Christoph Hertzberg
3adece4827 Fix misleading indentation of errorCode and make it loop-local 2018-09-12 14:41:38 +02:00
Christoph Hertzberg
7e9c9fbb2d Disable type-limits warnings for g++ < 4.8 2018-09-12 14:40:39 +02:00
Christoph Hertzberg
ba2c8efdcf EIGEN_UNUSED is not supported by g++4.7 (and not portable) 2018-09-12 11:49:10 +02:00
Christoph Hertzberg
ff4e835d6b "sparse_product.cpp" must be included before "sparse_basic.cpp", otherwise EIGEN_SPARSE_CREATE_TEMPORARY_PLUGIN has no effect 2018-08-30 20:10:11 +02:00
Christoph Hertzberg
023ed6b9a8 Product of empty array must be 1 and not 0. 2018-08-30 17:14:52 +02:00
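The convention behind this fix is that an empty product equals the multiplicative identity, 1, just as an empty sum is 0; a minimal sketch, where std::accumulate makes the identity explicit as the initial value:

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Product over a range; the initial value 1.0 is what an empty
// range reduces to.
double productOf(const std::vector<double>& xs) {
  return std::accumulate(xs.begin(), xs.end(), 1.0, std::multiplies<double>());
}
```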
Christoph Hertzberg
c2f4e8c08e Fix integer conversion warning 2018-08-30 17:12:53 +02:00
Christoph Hertzberg
ddbc564386 Fixed a few more shadowing warnings when compiling with g++ (and c++03) 2018-08-30 16:33:03 +02:00
Deven Desai
946c3e2544 adding EIGEN_DEVICE_FUNC attribute to fix some GPU unit tests that are broken in HIP mode 2018-08-27 23:04:08 +00:00
Mehdi Goli
7ec8b40ad9 Collapsed revision
* Separating SYCL math function.
* Converting function overload to function specialisation.
* Applying the suggested design.
2018-08-28 14:20:48 +01:00
Christoph Hertzberg
20ba2eee6d gcc thinks this may not be initialized 2018-08-28 18:33:24 +02:00
Christoph Hertzberg
73ca600bca Fix numerous shadow-warnings for GCC<=4.8 2018-08-28 18:32:39 +02:00
Christoph Hertzberg
ef4d79fed8 Disable/ReenableStupidWarnings did not work properly when included recursively 2018-08-28 18:26:22 +02:00
Gael Guennebaud
befaf83f5f bug #1590: fix collision with some system headers defining the macro FP32 2018-08-28 13:21:28 +02:00
Christoph Hertzberg
42f3ee4fb8 Old gcc versions have problems with recursive #pragma GCC diagnostic push/pop
Workaround: Don't include "DisableStupidWarnings.h" before including other main-headers
2018-08-28 11:44:15 +02:00
Eugene Zhulenev
c144bb355b Merge with upstream eigen/default 2018-08-27 14:34:07 -07:00
Gael Guennebaud
5747288676 Disable a bonus unit-test which is broken with gcc 4.7 2018-08-27 13:07:34 +02:00
Gael Guennebaud
d5ed64512f bug #1573: workaround gcc 4.7 and 4.8 bug 2018-08-27 10:38:20 +02:00
Christoph Hertzberg
b1653d1599 Fix some trivial C++11 vs C++03 compatibility warnings 2018-08-25 12:21:00 +02:00
Christoph Hertzberg
42123ff38b Make unit test C++03 compatible 2018-08-25 11:53:28 +02:00
Christoph Hertzberg
4b1ad086b5 Fix shadow warnings in doc-snippets 2018-08-25 10:07:17 +02:00
Christoph Hertzberg
117bc5d505 Fix some shadow warnings 2018-08-25 09:06:08 +02:00
Christoph Hertzberg
f155e97adb Previous fix broke compilation for clang 2018-08-25 00:10:46 +02:00
Christoph Hertzberg
209b4972ec Fix conversion warning 2018-08-25 00:02:46 +02:00
Christoph Hertzberg
495f6c3c3a Fix missing-braces warnings 2018-08-24 23:56:13 +02:00
Christoph Hertzberg
5aaedbeced Fixed more sign-compare and type-limits warnings 2018-08-24 23:54:12 +02:00
Christoph Hertzberg
8295f02b36 Hide "maybe uninitialized" warning on gcc 2018-08-24 23:22:20 +02:00
Christoph Hertzberg
f7675b826b Fix several integer conversion and sign-compare warnings 2018-08-24 22:58:55 +02:00
Christoph Hertzberg
949b0ad9cb Merged in rmlarsen/eigen3 (pull request PR-468)
Add support for emulating thread local.
2018-08-24 17:29:03 +00:00
Rasmus Munk Larsen
744e2fe0de Address comments about EIGEN_THREAD_LOCAL. 2018-08-24 10:24:54 -07:00
Christoph Hertzberg
ad4a08fb68 Use Intel cast intrinsics, since MSVC does not allow direct casting.
Reported by David Winkler.
2018-08-24 19:04:33 +02:00
Rasmus Munk Larsen
8d9bc5cc02 Fix g++ compilation. 2018-08-23 13:06:39 -07:00
Rasmus Munk Larsen
e9f9d70611 Don't rely on __has_feature for g++.
Don't use __thread.
Only use thread_local for gcc 4.8 or newer.
2018-08-23 12:59:46 -07:00
Rasmus Munk Larsen
668690978f Pad PerThread when we emulate thread_local to prevent false sharing. 2018-08-23 12:54:33 -07:00
Rasmus Munk Larsen
6cedc5a9b3 rename mu. 2018-08-23 12:11:58 -07:00
Rasmus Munk Larsen
6e0464004a Store std::unique_ptr instead of raw pointers in per_thread_map_. 2018-08-23 12:10:08 -07:00
Rasmus Munk Larsen
e51d9e473a Protect #undef max with #ifdef max. 2018-08-23 11:42:05 -07:00
Rasmus Munk Larsen
d35880ed91 merge 2018-08-23 11:36:49 -07:00
Christoph Hertzberg
a709c8efb4 Replace pointers by values or unique_ptr for better leak-safety 2018-08-23 19:41:59 +02:00
Christoph Hertzberg
39335cf51e Make MaxSizeVector leak-safe 2018-08-23 19:37:56 +02:00
Benoit Steiner
ff8e0ecc2f Updated one more line of code to avoid making the test dependent on cxx11 features. 2018-08-17 15:15:52 -07:00
Benoit Steiner
43d9dd9b28 Removed more dependencies on cxx11. 2018-08-17 08:49:32 -07:00
Gael Guennebaud
f76c802973 Add missing empty line 2018-08-17 17:16:12 +02:00
Christoph Hertzberg
41f1cc67b8 Assertion depended on a not yet initialized value 2018-08-17 16:42:53 +02:00
Christoph Hertzberg
4713465eef Silence double-promotion warning 2018-08-17 16:39:43 +02:00
Christoph Hertzberg
595cae9b09 Silence logical-op-parentheses warning 2018-08-17 16:30:32 +02:00
Christoph Hertzberg
c9b25fbefa Silence unused parameter warning 2018-08-17 16:28:28 +02:00
Christoph Hertzberg
dbdeceabdd Silence double-promotion warning (when converting double to complex<long double>) 2018-08-17 16:26:11 +02:00
Benoit Steiner
19df4d5752 Merged in codeplaysoftware/eigen-upstream-pure/Pointer_type_creation (pull request PR-461)
Creating a pointer type in TensorCustomOp.h
2018-08-16 18:28:33 +00:00
Benoit Steiner
f641cf1253 Adding missing at method in Eigen::array 2018-08-16 11:24:37 -07:00
Benoit Steiner
ede580ccda Avoid using the auto keyword to make the tensor block access test more portable 2018-08-16 10:49:47 -07:00
Benoit Steiner
e23c8c294e Use actual types instead of the auto keyword to make the code more portable 2018-08-16 10:41:01 -07:00
Mehdi Goli
80f1a76dec Removing the noise. 2018-08-16 13:33:24 +01:00
Mehdi Goli
d0b01ebbf6 Reverting the unintended delete from the code. 2018-08-16 13:21:36 +01:00
Mehdi Goli
161dcbae9b Using PointerType struct and specializing it per device for TensorCustomOp.h 2018-08-16 00:07:02 +01:00
Sameer Agarwal
f197c3f55b Removed an unused variable (PacketSize) from TensorExecutor 2018-08-15 11:24:57 -07:00
Benoit Steiner
4181556907 Fixed the tensor contraction code. 2018-08-15 09:34:47 -07:00
Benoit Steiner
b6f96cf7dd Removed dependencies on cxx11 language features from the tensor_block_access test 2018-08-15 08:54:31 -07:00
Benoit Steiner
fbb834144d Fixed more compilation errors 2018-08-15 08:52:58 -07:00
Benoit Steiner
6bb3f1b43e Made the tensor_block_access test compile again 2018-08-14 14:26:59 -07:00
Benoit Steiner
43ec0082a6 Made the kronecker_product test compile again 2018-08-14 14:08:36 -07:00
Benoit Steiner
ab3f481141 Cleaned up the code and make it compile with more compilers 2018-08-14 14:05:46 -07:00
Rasmus Munk Larsen
fa0bcbf230 merge 2018-08-14 12:18:31 -07:00
Rasmus Munk Larsen
15d4f515e2 Use plain_assert in destructors to avoid throwing in CXX11 tests where main.h overwrites eigen_assert with a throwing version. 2018-08-14 12:17:46 -07:00
Rasmus Munk Larsen
aebdb06424 Fix a few compiler warnings in CXX11 tests. 2018-08-14 12:06:39 -07:00
Rasmus Munk Larsen
2a98bd9c8e Merged eigen/eigen into default 2018-08-14 12:02:09 -07:00
Benoit Steiner
59bba77ead Fixed compilation errors with gcc 4.7 and 4.8 2018-08-14 10:54:48 -07:00
Mehdi Goli
a97aaa2bcf Merge with upstream. 2018-08-14 17:49:29 +01:00
Mehdi Goli
8ba799805b Merge with upstream 2018-08-14 09:43:45 +01:00
Rasmus Munk Larsen
6d6e7b7027 merge 2018-08-13 15:34:50 -07:00
Rasmus Munk Larsen
9bb75d8d31 Add Barrier.h. 2018-08-13 15:34:03 -07:00
Rasmus Munk Larsen
2e1adc0324 Merged eigen/eigen into default 2018-08-13 15:32:00 -07:00
Rasmus Munk Larsen
8278ae6313 Add support for thread local support on platforms that do not support it through emulation using a hash map. 2018-08-13 15:31:23 -07:00
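The hash-map emulation mentioned above can be sketched as follows; `EmulatedThreadLocal` is a simplified hypothetical version (Eigen's EIGEN_THREAD_LOCAL fallback avoids taking the lock on the hot path). Each thread gets its own slot, keyed by its thread id.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <unordered_map>

// Emulated thread-local storage: a mutex-guarded map from thread id
// to a per-thread value, default-constructed on first access.
template <typename T>
class EmulatedThreadLocal {
 public:
  T& local() {
    std::lock_guard<std::mutex> lock(mu_);
    return per_thread_[std::this_thread::get_id()];
  }

 private:
  std::mutex mu_;
  std::unordered_map<std::thread::id, T> per_thread_;
};
```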
Benoit Steiner
501be70b27 Code cleanup 2018-08-13 15:16:40 -07:00
Benoit Steiner
3d3711f22f Fixed compilation errors. 2018-08-13 15:16:06 -07:00
Gael Guennebaud
3ec60215df Merged in rmlarsen/eigen2 (pull request PR-466)
Move sigmoid functor to core and rename it to 'logistic'.
2018-08-13 21:28:20 +00:00
Rasmus Munk Larsen
0f1b2e08a5 Call logistic functor from Tensor::sigmoid. 2018-08-13 11:52:58 -07:00
Rasmus Munk Larsen
d6e283ba96 sigmoid -> logistic 2018-08-13 11:14:50 -07:00
Benoit Steiner
26239ee580 Use NULL instead of nullptr to avoid adding a cxx11 requirement. 2018-08-13 11:05:51 -07:00
Benoit Steiner
3810ec228f Don't use the auto keyword since it's not always supported properly. 2018-08-13 10:46:09 -07:00
Benoit Steiner
e6d5be811d Fixed syntax of nested template chevrons to make it compatible with C++98 mode. 2018-08-13 10:29:21 -07:00
Mehdi Goli
1aa86aad14 Merge with upstream. 2018-08-13 15:40:31 +01:00
Eugene Zhulenev
35d90e8960 Fix BlockAccess enum in CwiseUnaryOp evaluator 2018-08-10 17:37:58 -07:00
Eugene Zhulenev
855b68896b Merge with eigen/default 2018-08-10 17:18:42 -07:00
Eugene Zhulenev
f2209d06e4 Add block evaluation to CwiseUnaryOp and add PreferBlockAccess enum to all evaluators 2018-08-10 16:53:36 -07:00
Benoit Steiner
c8ea398675 Avoided language features that are only available in cxx11 mode. 2018-08-10 13:02:41 -07:00
Benoit Steiner
4be4286224 Made the code compile with gcc 5.4. 2018-08-10 11:32:58 -07:00
Justin Carpentier
eabc7a4031 PR 465: Fix issue in RowMajor assignment in plain_matrix_type_row_major::type
The type should be RowMajor
2018-08-10 14:30:06 +02:00
Rasmus Munk Larsen
c49e93440f SuiteSparse defines the macro SuiteSparse_long to control what type is used for 64-bit integers. The default value of this macro is long on non-MSVC platforms and __int64 on MSVC. CholmodSupport defaults to using long for the long variants of CHOLMOD functions. This creates problems when SuiteSparse_long is different from long, so the correct thing to do here is
to use SuiteSparse_long as the type instead of long.
2018-08-13 15:53:31 -07:00
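The type mismatch this commit fixes can be sketched as follows. The fallback definitions below merely mirror the defaults described in the message (the authoritative ones live in SuiteSparse's own headers); binding the index type to SuiteSparse_long rather than to plain long keeps the two in sync even where they differ, such as 64-bit Windows, where long is 32 bits.

```cpp
#include <cassert>

// Mirror of the documented SuiteSparse_long defaults, for illustration
// only; the real definitions come from SuiteSparse_config.h.
#ifndef SuiteSparse_long
#  ifdef _MSC_VER
#    define SuiteSparse_long __int64
#  else
#    define SuiteSparse_long long
#  endif
#endif

// The index type should follow SuiteSparse_long, never plain long.
typedef SuiteSparse_long StorageIndex;
```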
Mehdi Goli
3a2e1b1fc6 Merge with upstream. 2018-08-10 12:28:38 +01:00
Rasmus Munk Larsen
bfc5091dd5 Cast diagonalSize() to RealScalar instead of Scalar. 2018-08-09 14:46:17 -07:00
Rasmus Munk Larsen
8603d80029 Cast diagonalSize() to Scalar before multiplication. Without this, automatic differentiation in Ceres breaks because Scalar is a custom type that does not support multiplication by Index. 2018-08-09 11:09:10 -07:00
Eugene Zhulenev
cfaedb38cd Fix bug in a test + compilation errors 2018-08-09 09:44:07 -07:00
Mehdi Goli
ea8fa5e86f Merge with upstream 2018-08-09 14:07:56 +01:00
Mehdi Goli
8c083bfd0e Properly fix the PointerType for TensorCustomOp.h: the output type here should be based on CoeffReturnType, not the Scalar type. Therefore, similar to reduction and the evalTo function, it should have its own MakePointer class. For other devices the type defaults to CoeffReturnType, so no changes are required in users' code; in SYCL, however, we can reconstruct the device type on the device. 2018-08-09 13:57:43 +01:00
Alexey Frunze
050bcf6126 bug #1584: Improve random (avoid undefined behavior). 2018-08-08 20:19:32 -07:00
Eugene Zhulenev
1c8b9e10a7 Merged with upstream eigen 2018-08-08 16:57:58 -07:00
Benoit Steiner
131ed1191f Merged in codeplaysoftware/eigen-upstream-pure/Fixing_compiler_warning (pull request PR-462)
Fixing compiler warning in TensorBlock.h as it was creating a lot of noise at compilation.
2018-08-08 18:14:15 +00:00
Benoit Steiner
1285c080b3 Merged in codeplaysoftware/eigen-upstream-pure/disabling_assert_in_sycl (pull request PR-459)
Disabling assert inside SYCL kernel.
2018-08-08 18:12:42 +00:00
Benoit Steiner
c4b2845be9 Merged in rmlarsen/eigen3 (pull request PR-458)
Fix init order.
2018-08-08 18:11:49 +00:00
Benoit Steiner
7124172b83 Merged in codeplaysoftware/eigen-upstream-pure/EIGEN_UNROLL_LOOP (pull request PR-460)
Adding EIGEN_UNROLL_LOOP macro.
2018-08-08 18:10:54 +00:00
Mehdi Goli
532a0be05c Fixing compiler warning in TensorBlock.h as it was creating a lot of noise at compilation. 2018-08-08 12:12:26 +01:00
Mehdi Goli
67711eaa31 Fixing typo. 2018-08-08 11:38:10 +01:00
Mehdi Goli
3055e3a7c2 Creating a pointer type in TensorCustomOp.h 2018-08-08 11:19:02 +01:00
Mehdi Goli
22031ab59a Adding EIGEN_UNROLL_LOOP macro. 2018-08-08 11:07:27 +01:00
Mehdi Goli
908b906d79 Disabling assert inside SYCL kernel. 2018-08-08 10:01:10 +01:00
Rasmus Munk Larsen
693fb1d41e Fix init order. 2018-08-07 17:18:51 -07:00
Benoit Steiner
10d286f55b Silenced a couple of compilation warnings. 2018-08-06 16:00:29 -07:00
Benoit Steiner
d011d05fd6 Fixed compilation errors. 2018-08-06 13:40:51 -07:00
Rasmus Munk Larsen
36e7e7dd8f Forward declare NoOpOutputKernel as struct rather than class to be consistent with implementation. 2018-08-06 13:16:32 -07:00
Rasmus Munk Larsen
fa68342ef8 Move sigmoid functor to core. 2018-08-03 17:31:23 -07:00
Gael Guennebaud
09c81ac033 bug #1451: fix numeric_limits<AutoDiffScalar<Der>> with a reference as derivative type 2018-08-04 00:17:37 +02:00
luz.paz
43fd42a33b Fix doxy and misc. typos
Found via `codespell -q 3 -I ../eigen-word-whitelist.txt`
---
 Eigen/src/Core/ProductEvaluators.h |  4 ++--
 Eigen/src/Core/arch/GPU/Half.h     |  2 +-
 Eigen/src/Core/util/Memory.h       |  2 +-
 Eigen/src/Geometry/Hyperplane.h    |  2 +-
 Eigen/src/Geometry/Transform.h     |  2 +-
 Eigen/src/Geometry/Translation.h   | 12 ++++++------
 doc/PreprocessorDirectives.dox     |  2 +-
 doc/TutorialGeometry.dox           |  2 +-
 test/boostmultiprec.cpp            |  2 +-
 test/triangular.cpp                |  2 +-
 10 files changed, 16 insertions(+), 16 deletions(-)
2018-08-01 21:34:47 -04:00
Jean-Christophe Fillion-Robin
2cbd9dd498 [PATCH] cmake: Support source include with add_subdirectory and
find_package use
This commit allows the sources of the project to be included in a parent
project's CMakeLists.txt and supports use of "find_package(Eigen3 CONFIG REQUIRED)".

Here is an example for testing the changes. It is not particularly
useful in itself, but the change supports one of the scenarios for creating a
custom 3D Slicer application that bundles associated plugins.

/tmp/eigen-git-mirror  # Eigen sources

/tmp/test/CMakeLists.txt:

  cmake_minimum_required(VERSION 3.12)
  project(test)
  add_subdirectory("/tmp/eigen-git-mirror" "eigen-git-mirror")
  find_package(Eigen3 CONFIG REQUIRED)

and configuring it using:

  mkdir /tmp/test-build && cd $_
  cmake \
    -DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY:BOOL=1 \
    -DEigen3_DIR:PATH=/tmp/test-build/eigen-git-mirror \
    /tmp/test

Co-authored-by: Pablo Hernandez <pablo.hernandez@kitware.com>
---
 CMakeLists.txt              | 1 +
 cmake/Eigen3Config.cmake.in | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)
2018-09-07 15:50:19 -04:00
Christoph Hertzberg
a80a290079 Fix 'template argument uses local type'-warnings (when compiled in C++03 mode) 2018-09-10 18:57:28 +02:00
Jiandong Ruan
6dcd2642aa bug #1526 - CUDA compilation fails on CUDA 9.x SDK when arch is set to compute_60 and/or above 2018-09-08 12:05:33 -07:00
Christoph Hertzberg
edfb7962fd Use static const int instead of enum to avoid numerous local-type-template-args warnings in C++03 mode 2018-09-07 14:08:39 +02:00
Alexey Frunze
ec38f07b79 bug #1595: Don't use C++11's std::isnan() in MIPS/MSA packet math.
This removes reliance on C++11 and improves generated code.
2018-09-06 15:40:09 -07:00
Eugene Zhulenev
1b0373ae10 Replace all using declarations with typedefs in Tensor ops 2018-08-01 15:55:46 -07:00
Rasmus Munk Larsen
7f8b53fd0e bug #1580: Fix cuda clang build. STL is not supported, so std::equal_to and std::not_equal_to break compilation.
Update the definition of EIGEN_CONSTEXPR_ARE_DEVICE_FUNC to exclude clang.
See also PR 450.
2018-08-01 12:36:24 -07:00
Rasmus Munk Larsen
bcb29f890c Fix initialization order. 2018-08-03 10:18:53 -07:00
Benoit Steiner
cf17794ef4 Merged in codeplaysoftware/eigen-upstream-pure/SYCL-required-changes (pull request PR-454)
SYCL required changes
2018-08-03 16:17:30 +00:00
Mehdi Goli
3074b1ff9e Fixing the compilation error. 2018-08-03 17:13:44 +01:00
Mehdi Goli
225fa112aa Merge with upstream. 2018-08-03 17:04:08 +01:00
Mehdi Goli
01358300d5 Creating separate SYCL required PR for uncontroversial files. 2018-08-03 16:59:15 +01:00
Gustavo Lima Chaves
2bf1cc8cf7 Fix 256 bit packet size assumptions in unit tests.
Like in change 2606abed53, we have hit the threshold again. With
AVX512 builds we would never have Vector8f packets aligned at 64
bytes (the new value of EIGEN_MAX_ALIGN_BYTES after change 405859f18d,
for AVX512-enabled builds).

This makes test/dynalloc.cpp pass for those builds.
2018-08-02 15:55:36 -07:00
Benoit Steiner
dd5875e30d Merged in codeplaysoftware/eigen-upstream-pure/constructor_error_clang (pull request PR-451)
Fixing ambiguous constructor error for Clang compiler.
2018-08-02 20:46:03 +00:00
Benoit Steiner
113d8343d6 Merged in codeplaysoftware/eigen-upstream-pure/Fixing_visual_studio_error_For_tensor_trace (pull request PR-452)
Fixing compilation error for cxx11_tensor_trace.cpp on Microsoft Visual Studio.
2018-08-02 17:54:26 +00:00
Mehdi Goli
516d2621b9 Fixing compilation error for cxx11_tensor_trace.cpp on Microsoft Visual Studio. 2018-08-02 14:30:48 +01:00
Mehdi Goli
40d6d020a0 Fixing ambiguous constructor error for Clang compiler. 2018-08-02 13:34:53 +01:00
Gael Guennebaud
62169419ab Fix two regressions introduced in previous merges: bad usage of EIGEN_HAS_VARIADIC_TEMPLATES and linking issue. 2018-08-01 23:35:34 +02:00
Eugene Zhulenev
64abdf1d7e Fix typo + get rid of redundant member variables for block sizes 2018-08-01 12:35:19 -07:00
Benoit Steiner
93b9e36e10 Merged in paultucker/eigen (pull request PR-431)
Optional ThreadPoolDevice allocator

Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com>
2018-08-01 19:14:34 +00:00
Eugene Zhulenev
385b3ff12f Merged latest changes from upstream/eigen 2018-08-01 11:59:04 -07:00
Benoit Steiner
17221115c9 Merged in codeplaysoftware/eigen-upstream-pure/eigen_variadic_assert (pull request PR-447)
Adding variadic version of assert which can take a parameter pack as its input.
2018-08-01 16:41:54 +00:00
Benoit Steiner
0360c36170 Merged in codeplaysoftware/eigen-upstream-pure/separating_internal_memory_allocation (pull request PR-446)
Distinguishing internal memory allocation/deallocation from explicit user memory allocation/deallocation.
2018-08-01 16:13:15 +00:00
Mehdi Goli
c6a5c70712 Correcting the position of allocate_temp/deallocate_temp in TensorDeviceGpu.h 2018-08-01 16:56:26 +01:00
Benoit Steiner
9ca1c09131 Merged in codeplaysoftware/eigen-upstream-pure/new-arch-SYCL-headers (pull request PR-448)
Adding new arch/SYCL headers, used for SYCL vectorization.
2018-08-01 15:50:54 +00:00
Benoit Steiner
45f75f1ace Merged in codeplaysoftware/eigen-upstream-pure/using_PacketType_class (pull request PR-449)
Enabling per device specialisation of packetSize.
2018-08-01 15:43:03 +00:00
Benoit Steiner
90e632fd66 Merged in codeplaysoftware/eigen-upstream-pure/EIGEN_STRONG_INLINE_MACRO (pull request PR-445)
Replacing ad-hoc inline keyword with EIGEN_STRONG_INLINE MACRO.
2018-08-01 15:41:06 +00:00
Mehdi Goli
af96018b49 Using the suggested modification. 2018-08-01 16:04:44 +01:00
Mehdi Goli
b512a9536f Enabling per device specialisation of packetsize. 2018-08-01 13:39:13 +01:00
Mehdi Goli
c84509d7cc Adding new arch/SYCL headers, used for SYCL vectorization. 2018-08-01 12:40:54 +01:00
Mehdi Goli
3a197a60e6 variadic version of assert which can take a parameter pack as its input. 2018-08-01 12:19:14 +01:00
Mehdi Goli
d7a8414848 Distinguishing internal memory allocation/deallocation from explicit user memory allocation/deallocation. 2018-08-01 11:56:30 +01:00
Mehdi Goli
9e219bb3d3 Converting ad-hoc inline keyword to EIGEN_STRONG_INLINE MACRO. 2018-08-01 10:47:49 +01:00
Eugene Zhulenev
83c0a16baf Add block evaluation support to TensorOps 2018-07-31 15:56:31 -07:00
Benoit Steiner
edf46bd7a2 Merged in yuefengz/eigen (pull request PR-370)
Use device's allocate function instead of internal::aligned_malloc.
2018-07-31 22:38:28 +00:00
Paul Tucker
385f7b8d0c Change getAllocator() to allocator() in ThreadPoolDevice. 2018-07-31 13:52:18 -07:00
Mark D Ryan
6f5b126e6d Fix tensor contraction for AVX512 machines
This patch modifies the TensorContraction class to ensure that the kc_ field is
always a multiple of the packet_size, if the packet_size is > 8.  Without this
change spatial convolutions in Tensorflow do not work properly as the code that
re-arranges the input matrices can assert if kc_ is not a multiple of the
packet_size.  This leads to a unit test failure,
//tensorflow/python/kernel_tests:conv_ops_test, on AVX512 builds of tensorflow.
2018-07-31 09:33:37 +01:00
Gael Guennebaud
d6568425f8 Close branch tiling_3. 2018-07-31 08:13:43 +00:00
Gael Guennebaud
678a0dcb12 Merged in ezhulenev/eigen/tiling_3 (pull request PR-438)
Tiled tensor executor
2018-07-31 08:13:00 +00:00
Gael Guennebaud
679eece876 Speedup trivial tensor broadcasting on GPU by enforcing unaligned loads. See PR 437. 2018-07-31 10:10:14 +02:00
Gael Guennebaud
723856dec1 bug #1577: fix msvc compilation of unit test, msvc defines ptrdiff_t as long long 2018-07-30 14:52:15 +02:00
Eugene Zhulenev
966c2a7bb6 Rename Index to StorageIndex + use Eigen::Array and Eigen::Map when possible 2018-07-27 12:45:17 -07:00
Eugene Zhulenev
6913221c43 Add tiled evaluation support to TensorExecutor 2018-07-25 13:51:10 -07:00
Alexey Frunze
7b91c11207 bug #1578: Improve prefetching in matrix multiplication on MIPS. 2018-07-24 18:36:44 -07:00
Patrik Huber
f5cace5e9f Fix two small typos in the documentation 2018-07-26 19:55:19 +00:00
Gael Guennebaud
34539c4af4 Merged in rmlarsen/eigen1 (pull request PR-441)
Reduce the number of template specializations of classes related to tensor contraction to reduce binary size.
2018-07-30 11:26:24 +00:00
Mark D Ryan
bc615e4585 Re-enable FMA for fast sqrt functions 2018-07-30 13:21:00 +02:00
Mark D Ryan
96b030a8e4 Re-enable FMA for fast sqrt functions
This commit re-enables the use of FMA for the FAST sqrt functions.
Doing so improves the performance of both algorithms.  The float32
version is now 88% the speed of the original function, while the
double version is 90%.
2018-07-30 10:19:51 +01:00
Rasmus Munk Larsen
e478532625 Reduce the number of template specializations of classes related to tensor contraction to reduce binary size. 2018-07-27 12:36:34 -07:00
Rasmus Munk Larsen
2ebcb911b2 Add pcast packet op for NEON. 2018-07-26 14:28:48 -07:00
Christoph Hertzberg
397b0547e1 DIsable static assertions only when necessary and disable double-promotion warnings in that case as well 2018-07-26 00:01:24 +02:00
Christoph Hertzberg
5e79402b4a fix warnings for doc-eigen-prerequisites 2018-07-24 21:59:15 +02:00
Christoph Hertzberg
5f79b7f9a9 Removed several shadowing types and use global Index typedef everywhere 2018-07-25 21:47:45 +02:00
Christoph Hertzberg
44ee201337 Rename variable which shadows class name 2018-07-25 20:26:15 +02:00
Gustavo Lima Chaves
705f66a9ca Account for missing change on commit "Remove SimpleThreadPool and..."
"... always use {NonBlocking}ThreadPool". It seems the non-blocking
implementation was made the default/only one, but a reference to the old
name was left unmodified. Fix that.
2018-07-23 16:29:09 -07:00
Christoph Hertzberg
fd4fe7cbc5 Fixed issue which made documentation not getting built anymore 2018-07-24 22:56:15 +02:00
Christoph Hertzberg
636126ef40 Allow to filter out build-error messages 2018-07-24 20:12:49 +02:00
Eugene Zhulenev
d55efa6f0f TensorBlockIO 2018-07-23 15:50:55 -07:00
Eugene Zhulenev
34a75c3c5c Initial support of TensorBlock 2018-07-20 17:37:20 -07:00
Gael Guennebaud
2c2de9da7d Merged in glchaves/eigen (pull request PR-433)
Move cxx11_tensor_uint128 test under an EIGEN_TEST_CXX11 guarded  block
2018-07-23 19:38:55 +00:00
Gael Guennebaud
4ca3e48f42 fix typo 2018-07-23 16:51:57 +02:00
Gael Guennebaud
c747cde69a Add lastN shortcuts to seq/seqN. 2018-07-23 16:20:25 +02:00
Gustavo Lima Chaves
02eaaacbc5 Move cxx11_tensor_uint128 test under an EIGEN_TEST_CXX11 guarded
block

Builds configured without the -DEIGEN_TEST_CXX11=ON flag would fail
right away without this, as this test seems to rely on those language
features. The skip under compilation with MSVC was kept.
2018-07-20 16:08:40 -07:00
Eugene Zhulenev
2bf864f1eb Disable type traits for stdlibc++ <= 4.9.3 2018-07-20 10:11:44 -07:00
Gael Guennebaud
de70671937 Oops, EIGEN_COMP_MSVC is not available before including Eigen. 2018-07-20 17:51:17 +02:00
Gael Guennebaud
56a750b6cc Disable optimization for sparse_product unit test with MSVC 2013, otherwise it takes several hours to build. 2018-07-20 08:36:38 -07:00
Paul Tucker
d4afccde5a Add test coverage for ThreadPoolDevice optional allocator. 2018-07-19 17:43:44 -07:00
Eugene Zhulenev
c58b874727 PR430: Convert count to the reducer type in MeanReducer
Without explicit conversion Tensorflow fails to compile, pset1 template deduction fails.

cannot convert '((const Eigen::internal::MeanReducer<Eigen::half>*)this)
  ->Eigen::internal::MeanReducer<Eigen::half>::packetCount_'
(type 'const DenseIndex {aka const long int}')
to type 'const type& {aka const Eigen::half&}'
     return pdiv(vaccum, pset1<Packet>(packetCount_));
Honestly I’m not sure why it works in the Eigen tests, because the Eigen::half constructor is explicit, nor why it stopped working in TF; I didn’t find any relevant changes since the previous Eigen upgrade.

static_cast<T>(packetCount_) breaks the cxx11_tensor_reductions test for Eigen::half, which is also quite surprising.
2018-07-19 17:37:03 -07:00
Gael Guennebaud
2424e3b7ac Pass by const ref. 2018-07-19 18:48:19 +02:00
Gael Guennebaud
509a5fa77f Fix IsRelocatable without C++11 2018-07-19 18:47:38 +02:00
Gael Guennebaud
2ca2592009 Fix determination of EIGEN_HAS_TYPE_TRAITS 2018-07-19 18:47:18 +02:00
Gael Guennebaud
5e5987996f Fix stupid error in Quaternion move ctor 2018-07-19 18:33:53 +02:00
Paul Tucker
4e9848fa86 Actually add optional Allocator* arg to ThreadPoolDevice(). 2018-07-16 17:53:36 -07:00
Paul Tucker
b3e7c9132d Add optional Allocator argument to ThreadPoolDevice constructor.
When supplied, this allocator will be used in place of
internal::aligned_malloc.  This permits e.g. use of a NUMA-node specific
allocator where the thread-pool is also restricted to a single NUMA-node.
2018-07-16 17:26:05 -07:00
Gael Guennebaud
40797dbea3 bug #1572: use c++11 atomic instead of volatile if c++11 is available, and disable multi-threaded GEMM on non-x86 without c++11. 2018-07-17 00:11:20 +02:00
Gael Guennebaud
add5757488 Simplify handling of non-split tests and include split_test_helper.h instead of re-generating it. This also allows us to modify it without breaking existing build folders. 2018-07-16 18:55:40 +02:00
Gael Guennebaud
901c7d31f0 Fix usage of EIGEN_SPLIT_LARGE_TESTS=ON: some unit tests, such as indexed_view have to be split unconditionally. 2018-07-16 18:35:05 +02:00
Gael Guennebaud
f2b52f9946 Add the cmake option "EIGEN_DASHBOARD_BUILD_TARGET" to control the build target in dashboard mode (e.g., ctest -D Experimental) 2018-07-16 17:59:30 +02:00
Gael Guennebaud
23d82c1ac5 Merged in rmlarsen/eigen2 (pull request PR-422)
Optimize the case where broadcasting is a no-op.
2018-07-14 11:42:58 +00:00
Gael Guennebaud
a87cff20df Fix GeneralizedEigenSolver when requesting for eigenvalues only. 2018-07-14 09:38:49 +02:00
Rasmus Munk Larsen
3a9cf4e290 Get rid of alias for m_broadcast. 2018-07-13 16:24:48 -07:00
Rasmus Munk Larsen
4222550e17 Optimize the case where broadcasting is a no-op. 2018-07-13 16:12:38 -07:00
Rasmus Munk Larsen
4a3952fd55 Relax the condition to not only work on Android. 2018-07-13 11:24:07 -07:00
Rasmus Munk Larsen
02a9443db9 Clang produces incorrect Thumb2 assembler when using alloca.
Don't define EIGEN_ALLOCA when generating Thumb with clang.
2018-07-13 11:03:04 -07:00
Gael Guennebaud
20991c3203 bug #1571: fix is_convertible<from,to> with "from" a reference. 2018-07-13 17:47:28 +02:00
Gael Guennebaud
1920129d71 Remove clang warning 2018-07-13 16:05:35 +02:00
Gael Guennebaud
195c9c054b Print more debug info in gpu_basic 2018-07-13 16:05:07 +02:00
Gael Guennebaud
06eb24cf4d Introduce gpu_assert for assertion in device-code, and disable them with clang-cuda. 2018-07-13 16:04:27 +02:00
Gael Guennebaud
5fd03ddbfb Make EIGEN_TEST_CUDA_CLANG more friendly with OSX 2018-07-13 16:03:14 +02:00
Gael Guennebaud
86d9c0255c Forward declaring std::array does not work with all std libs, so let's just include <array> 2018-07-13 13:06:44 +02:00
David Hyde
d908afe35f bug #1558: fix a corner case in MINRES when both v_new and w_new vanish. 2018-07-08 22:06:38 -07:00
Eugene Zhulenev
6e654f3379 Reduce number of allocations in TensorContractionThreadPool. 2018-07-16 14:26:39 -07:00
Gael Guennebaud
7ccb623746 bug #1569: fix Tensor<half>::mean() on AVX with respective unit test. 2018-07-19 13:15:40 +02:00
Alexey Frunze
1f523e7304 Add MIPS changes missing from previous merge. 2018-07-18 12:27:50 -07:00
Eugene Zhulenev
e3c2d61739 Assert that no output kernel is defined for GPU contraction 2018-07-18 14:34:22 -07:00
Eugene Zhulenev
086ded5c85 Disable type traits for GCC < 5.1.0 2018-07-18 16:32:55 -07:00
Eugene Zhulenev
79d4129cce Specify default output kernel for TensorContractionOp 2018-07-18 14:21:01 -07:00
Gael Guennebaud
6e5a3b898f Add regression for bugs #1573 and #1575 2018-07-18 23:34:34 +02:00
Gael Guennebaud
863580fe88 bug #1432: fix conservativeResize for non-relocatable scalar types. For those we need to bypass realloc routines and fall back to allocate-as-new / copy / delete. The remaining problem is that we don't have any mechanism to accurately determine whether a type is relocatable or not, so for now let's be super conservative using either RequireInitialization or std::is_trivially_copyable. 2018-07-18 23:33:07 +02:00
Gael Guennebaud
053ed97c72 Generalize ScalarWithExceptions to a full non-copyable and throwing scalar type to be used in other unit tests. 2018-07-18 23:27:37 +02:00
Gael Guennebaud
a503fc8725 bug #1575: fix regression introduced in bug #1573 patch. Move ctor/assignment should not be defaulted. 2018-07-18 23:26:13 +02:00
Gael Guennebaud
308725c3c9 More clearly disable the inclusion of src/Core/arch/CUDA/Complex.h without CUDA 2018-07-18 13:51:36 +02:00
Alexey Frunze
3875fb05aa Add support for MIPS SIMD (MSA) 2018-07-06 16:04:30 -07:00
Gael Guennebaud
44ea5f7623 Add unit test for -Tensor<complex> on GPU 2018-07-12 17:19:38 +02:00
Gael Guennebaud
12e1ebb68b Remove local Index typedef from unit-tests 2018-07-12 17:16:40 +02:00
Gael Guennebaud
63185be8b2 Disable eigenvalues test for clang-cuda 2018-07-12 17:03:14 +02:00
Gael Guennebaud
bec013b2c9 fix unused warning 2018-07-12 17:02:18 +02:00
Gael Guennebaud
5c73c9223a Fix shadowing typedefs 2018-07-12 17:01:07 +02:00
Gael Guennebaud
98728312c8 Fix compilation regarding std::array 2018-07-12 17:00:37 +02:00
Gael Guennebaud
eb3d8f68bb fix unused warning 2018-07-12 16:59:47 +02:00
Gael Guennebaud
006e18e52b Cleanup the mess in Eigen/Core by moving CUDA/HIP stuff at more appropriate places (Macros.h),
and alignment/vectorization logic is now in util/ConfigureVectorization.h
2018-07-12 16:57:41 +02:00
Thales Sabino
9a6a43319f Fix cxx11_tensor_fft not building on Windows.
The type used in Eigen::DSizes needs to be at least 8 bytes long. Internally Tensor tries to convert this to an __int64 on Windows and this fails to build. On Linux, long and long long are both 8 byte integer types.
* * *
Changing from "long long" to "std::int64_t".
2018-07-12 11:20:59 +01:00
Gael Guennebaud
b347eb0b1c Fix doc 2018-07-12 11:56:18 +02:00
Mark D Ryan
e79c5149bf Fix AVX512 implementations of psqrt
This commit fixes the AVX512 implementations of psqrt in the same
way that 3ed67cb0bb fixed the AVX2 version of this function. The
AVX512 versions of psqrt incorrectly return -0.0 for negative
values, instead of NaN. Fixing the issue requires adding
some additional instructions that slow down the algorithms. A
similar test to the one used in 3ed67cb0bb shows that the
corrected Packet16f code runs at 73% of the speed of the existing code,
while the corrected Packet8d function runs at 68% of the original.
2018-06-25 05:05:02 -07:00
Yuefeng Zhou
1eff6cf8a7 Use device's allocate function instead of internal::aligned_malloc. This would make it easier to track memory usage in device instances. 2018-02-20 16:50:05 -08:00
Gael Guennebaud
adb134d47e Fix implicit conversion from 0.0 to scalar 2018-02-16 22:26:01 +04:00
Gael Guennebaud
937ad18221 add unit test for SimplicialCholesky and Boost multiprec. 2018-02-16 22:25:11 +04:00
Julian Kent
6d451cf2b6 Add missing consts for rows and cols functions in SparseLU 2018-02-10 13:44:05 +01:00
Daniele E. Domenichelli
a12b8a8c75 FindEigen3: Set Eigen3_FOUND variable 2018-07-11 16:31:50 +02:00
Gael Guennebaud
8bdb214fd0 remove double ;; 2018-07-12 11:17:53 +02:00
Gael Guennebaud
a9060378d3 bug #1570: fix warning 2018-07-12 11:07:09 +02:00
Gael Guennebaud
6cd6551b26 Add deprecated header files for TensorFlow 2018-07-12 10:50:53 +02:00
Gael Guennebaud
da0c604078 Merged in deven-amd/eigen (pull request PR-402)
Adding support for using Eigen in HIP kernels.
2018-07-12 08:07:16 +00:00
Gael Guennebaud
a4ea611ca7 Remove useless specialization thanks to is_convertible being more robust. 2018-07-12 09:59:44 +02:00
Gael Guennebaud
8a40dda5a6 Add some basic unit-tests 2018-07-12 09:59:00 +02:00
Gael Guennebaud
8ef267ccbd spellcheck 2018-07-12 09:58:29 +02:00
Gael Guennebaud
21cf4a1a8b Make is_convertible more robust and conformant to std::is_convertible 2018-07-12 09:57:19 +02:00
Gael Guennebaud
8a5955a052 Optimize the product of a householder-sequence with the identity, and optimize the evaluation of a HouseholderSequence to a dense matrix using faster blocked product. 2018-07-11 17:16:50 +02:00
Gael Guennebaud
d193cc87f4 Fix regression in 9357838f94 2018-07-11 17:09:23 +02:00
Gael Guennebaud
fb33687736 Fix double ;; 2018-07-11 17:08:30 +02:00
Deven Desai
876f392c39 Updates corresponding to the latest round of PR feedback
The major changes are

1. Moving CUDA/PacketMath.h to GPU/PacketMath.h
2. Moving CUDA/MathFunctions.h to GPU/MathFunctions.h
3. Moving CUDA/CudaSpecialFunctions.h to GPU/GpuSpecialFunctions.h
    The above three changes effectively enable the Eigen "Packet" layer for the HIP platform

4. Merging the "hip_basic" and "cuda_basic" unit tests into one ("gpu_basic")
5. Updating the "EIGEN_DEVICE_FUNC" marking in some places

The change has been tested on the HIP and CUDA platforms.
2018-07-11 10:39:54 -04:00
Deven Desai
1fe0b74904 deleting hip specific files that are no longer required 2018-07-11 09:28:44 -04:00
Deven Desai
dec47a6493 renaming CUDA* to GPU* for some header files 2018-07-11 09:26:54 -04:00
Deven Desai
471cfe5ff7 renaming CUDA* to GPU* for some header files 2018-07-11 09:22:04 -04:00
Deven Desai
38807a2575 merging updates from upstream 2018-07-11 09:17:33 -04:00
Gael Guennebaud
f00d08cc0a Optimize extraction of Q in SparseQR by exploiting the structure of the identity matrix. 2018-07-11 14:01:47 +02:00
Gael Guennebaud
1625476091 Add internall::is_identity compile-time helper 2018-07-11 14:00:24 +02:00
Gael Guennebaud
fe723d6129 Fix conversion warning 2018-07-10 09:10:32 +02:00
Gael Guennebaud
9357838f94 bug #1543: improve linear indexing for general block expressions 2018-07-10 09:10:15 +02:00
Gael Guennebaud
de9e31a06d Introduce the macro ei_declare_local_nested_eval to help allocate local temporaries on the stack via alloca, and let outer products make good use of it.
If successful, we should use it everywhere nested_eval is used to declare local dense temporaries.
2018-07-09 15:41:14 +02:00
Gael Guennebaud
6190aa5632 bug #1567: add optimized path for tensor broadcasting and 'Channel First' shape 2018-07-09 11:23:16 +02:00
Gael Guennebaud
ec323b7e66 Skip null numerators in triangular-vector-solve (as in BLAS TRSV). 2018-07-09 11:13:19 +02:00
Gael Guennebaud
359dd77ec3 Fix legitimate "declaration shadows a typedef" warning 2018-07-09 11:03:39 +02:00
Deven Desai
e2b2c61533 merging from master 2018-06-20 16:47:45 -04:00
Deven Desai
1bb6fa99a3 merging the CUDA and HIP implementation for the Tensor directory and the unit tests 2018-06-20 16:44:58 -04:00
Deven Desai
cfdabbcc8f removing the *Hip files from the unsupported/Eigen/CXX11/src/Tensor and unsupported/test directories 2018-06-20 12:57:02 -04:00
Deven Desai
7e41c8f1a9 renaming *Cuda files to *Gpu in the unsupported/Eigen/CXX11/src/Tensor and unsupported/test directories 2018-06-20 12:52:30 -04:00
Deven Desai
ee73ae0a80 Merged eigen/eigen into default 2018-06-20 12:37:11 -04:00
Mark D Ryan
90a53ca6fd Fix the Packet16h version of ptranspose
The AVX512 version of ptranspose for PacketBlock<Packet16h,16> was
reordering the PacketBlock argument incorrectly.  This led to errors in
the multiplication of matrices composed of 16 bit floats on AVX512
machines, if at least one of the matrices was using RowMajor order.  This
error is responsible for one tensorflow unit test failure on AVX512
machines:

//tensorflow/python/kernel_tests:batch_matmul_op_test
2018-06-16 15:13:06 -07:00
Gael Guennebaud
1f54164eca Fix a few issues with Packet16h 2018-07-07 00:15:07 +02:00
Gael Guennebaud
f2dc048df9 complete implementation of Packet16h (AVX512) 2018-07-06 17:43:11 +02:00
Gael Guennebaud
a937c50208 palign is not used anymore, so let's relax the unit test 2018-07-06 17:41:52 +02:00
Gael Guennebaud
56a33ae57d test product kernel with half-floats. 2018-07-06 17:14:04 +02:00
Gael Guennebaud
f4d623ffa7 Complete Packet8h implementation and test it in packetmath unit test 2018-07-06 17:13:36 +02:00
Gael Guennebaud
a8ab6060df Add unitests for inverse and selfadjoint-eigenvalues on CUDA 2018-07-06 09:58:45 +02:00
Gael Guennebaud
b8271bb368 fix md5sum of lapack_addons 2018-06-15 14:21:29 +02:00
Deven Desai
b6cc0961b1 updates based on PR feedback
There are two major changes (and a few minor ones which are not listed here...see PR discussion for details)

1. Eigen::half implementations for HIP and CUDA have been merged.
This means that
- `CUDA/Half.h` and `HIP/hcc/Half.h` got merged to a new file `GPU/Half.h`
- `CUDA/PacketMathHalf.h` and `HIP/hcc/PacketMathHalf.h` got merged to a new file `GPU/PacketMathHalf.h`
- `CUDA/TypeCasting.h` and `HIP/hcc/TypeCasting.h` got merged to a new file `GPU/TypeCasting.h`

After this change the `HIP/hcc` directory only contains one file `math_constants.h`. That will go away too once that file becomes a part of the HIP install.

2. new macros EIGEN_GPUCC, EIGEN_GPU_COMPILE_PHASE and EIGEN_HAS_GPU_FP16 have been added and the code has been updated to use them where appropriate.
- `EIGEN_GPUCC` is the same as `(EIGEN_CUDACC || EIGEN_HIPCC)`
- `EIGEN_GPU_COMPILE_PHASE` is the same as `(EIGEN_CUDA_ARCH || EIGEN_HIP_DEVICE_COMPILE)`
- `EIGEN_HAS_GPU_FP16` is the same as `(EIGEN_HAS_CUDA_FP16 or EIGEN_HAS_HIP_FP16)`
2018-06-14 10:21:54 -04:00
Deven Desai
ba972fb6b4 moving Half headers from CUDA dir to GPU dir, removing the HIP versions 2018-06-13 12:26:18 -04:00
Deven Desai
d1d22ef0f4 syncing this fork with upstream 2018-06-13 12:09:52 -04:00
Benoit Steiner
d3a380af4d Merged in mfigurnov/eigen/gamma-der-a (pull request PR-403)
Derivative of the incomplete Gamma function and the sample of a Gamma random variable

Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com>
2018-06-11 17:57:47 +00:00
Andrea Bocci
f7124b3e46 Extend CUDA support to matrix inversion and selfadjointeigensolver 2018-06-11 18:33:24 +02:00
Gael Guennebaud
0537123953 bug #1565: help MSVC to generate not-too-bad ASM in reductions. 2018-07-05 09:21:26 +02:00
Gael Guennebaud
6a241bd8ee Implement custom inplace triangular product to avoid a temporary 2018-07-03 14:02:46 +02:00
Gael Guennebaud
3ae2083e23 Make is_same_dense compatible with different scalar types. 2018-07-03 13:21:43 +02:00
Gael Guennebaud
67ec37f7b0 Activate dgmres unit test 2018-07-02 12:54:14 +02:00
Gael Guennebaud
047677a08d Fix regression in changeset f05dea6b23: computeFromHessenberg can take any expression for matrixQ, not only a HouseholderSequence.
2018-07-02 12:18:25 +02:00
Gael Guennebaud
d625564936 Simplify redux_evaluator using inheritance, and properly rename parameters in reducers. 2018-07-02 11:50:41 +02:00
Gael Guennebaud
d428a199ab bug #1562: optimize evaluation of small products of the form s*A*B by rewriting them as: s*(A.lazyProduct(B)) to save a costly temporary. Measured speedup from 2x to 5x... 2018-07-02 11:41:09 +02:00
Gael Guennebaud
a7b313a16c Fix unit test 2018-07-01 22:45:47 +02:00
Gael Guennebaud
0cdacf3fa4 update comment 2018-06-29 11:28:36 +02:00
Gael Guennebaud
54f6eeda90 Merged in net147/eigen (pull request PR-411)
Use std::complex constructor instead of assignment from scalar
2018-06-28 21:01:04 +00:00
Gael Guennebaud
9a81de1d35 Fix order of EIGEN_DEVICE_FUNC and returned type 2018-06-28 00:20:59 +02:00
Jonathan Liu
b7689bded9 Use std::complex constructor instead of assignment from scalar
Fixes GCC conversion to non-scalar type requested compile error when
using boost::multiprecision::cpp_dec_float_50 as scalar type.
2018-06-28 00:32:37 +10:00
Gael Guennebaud
f9d337780d First step towards a generic vectorised quaternion product 2018-06-25 14:26:51 +02:00
Gael Guennebaud
ee5864f72e bug #1560 fix product with a 1x1 diagonal matrix 2018-06-25 10:30:12 +02:00
Rasmus Munk Larsen
2f62cc68cd merge 2018-06-22 15:09:44 -07:00
Rasmus Munk Larsen
bda71ad394 Fix typo in pbend for AltiVec. 2018-06-22 15:04:35 -07:00
Benoit Steiner
b6ffcd22e3 Merged in rmlarsen/eigen2 (pull request PR-409)
Fix oversharding bug in parallelFor.
2018-06-21 18:34:57 +00:00
Gael Guennebaud
4cc32d80fd bug #1555: compilation fix with XLC 2018-06-21 10:28:38 +02:00
Rasmus Munk Larsen
5418154a45 Fix oversharding bug in parallelFor. 2018-06-20 17:51:48 -07:00
Gael Guennebaud
cb4c9a6a94 bug #1531: make dedicated unit testing for NumDimensions 2018-06-08 17:11:45 +02:00
Gael Guennebaud
d6813fb1c5 bug #1531: expose NumDimensions for solve and sparse expressions. 2018-06-08 16:55:10 +02:00
Gael Guennebaud
89d65bb9d6 bug #1531: expose NumDimensions for compatibility with Tensor 2018-06-08 16:50:17 +02:00
Gael Guennebaud
f05dea6b23 bug #1550: prevent avoidable memory allocation in RealSchur 2018-06-08 10:14:57 +02:00
Gael Guennebaud
7933267c67 fix prototype 2018-06-08 09:56:01 +02:00
Gael Guennebaud
f4d1461874 Fix the way matrix folder is passed to the tests. 2018-06-08 09:55:46 +02:00
Benoit Steiner
522d3ca54d Don't use std::equal_to inside cuda kernels since it's not supported. 2018-06-07 13:02:07 -07:00
Christoph Hertzberg
7d7bb91537 Missing line during manual rebase of PR-374 2018-06-07 20:30:09 +02:00
Michael Figurnov
30fa3d0454 Merge from eigen/eigen 2018-06-07 17:57:56 +01:00
Benoit Steiner
d2b0a4a59b Merged in mfigurnov/eigen/fix-bessel (pull request PR-404)
Fix compilation of special functions without C99 math.
2018-06-07 16:12:42 +00:00
Michael Figurnov
6c71c7d360 Merge from eigen/eigen. 2018-06-07 15:54:18 +01:00
Gael Guennebaud
c25034710e Fix some warnings in dox examples 2018-06-07 16:09:22 +02:00
Gael Guennebaud
37348d03ae Fix int versus Index 2018-06-07 15:56:43 +02:00
Gael Guennebaud
c723ffd763 Fix warning 2018-06-07 15:56:20 +02:00
Gael Guennebaud
af7c83b9a2 Fix warning 2018-06-07 15:45:24 +02:00
Gael Guennebaud
7fe29aceeb Fix MSVC warning C4290: C++ exception specification ignored except to indicate a function is not __declspec(nothrow) 2018-06-07 15:36:20 +02:00
Michael Figurnov
aa813d417b Fix compilation of special functions without C99 math.
The commit with Bessel functions i0e and i1e placed the ifdef/endif incorrectly,
causing i0e/i1e to be undefined when EIGEN_HAS_C99_MATH=0. These functions do not
actually require C99 math, so now they are always available.
2018-06-07 14:35:07 +01:00
Gael Guennebaud
55774b48e4 Fix short vs long 2018-06-07 15:26:25 +02:00
Christoph Hertzberg
e5f9f4768f Avoid unnecessary C++11 dependency 2018-06-07 15:03:50 +02:00
Gael Guennebaud
b3fd93207b Fix typos found using codespell 2018-06-07 14:43:02 +02:00
Michael Figurnov
5172a32849 Updated the stopping criteria in igammac_cf_impl.
Previously, when computing the derivative, it used a relative error threshold. Now it uses an absolute error threshold. The behavior for computing the value is unchanged. This makes more sense since we do not expect the derivative to often be close to zero. This change makes the derivatives about 30% faster across the board. The error for the igamma_der_a is almost unchanged, while for gamma_sample_der_alpha it is a bit worse for float32 and unchanged for float64.
2018-06-07 12:03:58 +01:00
Michael Figurnov
4bd158fa37 Derivative of the incomplete Gamma function and the sample of a Gamma random variable.
In addition to igamma(a, x), this code implements:
* igamma_der_a(a, x) = d igamma(a, x) / da -- derivative of igamma with respect to the parameter
* gamma_sample_der_alpha(alpha, sample) -- reparameterization derivative of a Gamma(alpha, 1) random variable sample with respect to the alpha parameter

The derivatives are computed by forward mode differentiation of the igamma(a, x) code. Although gamma_sample_der_alpha can be implemented via igamma_der_a, a separate function is more accurate and efficient due to analytical cancellation of some terms. All three functions are implemented by a method parameterized with "mode" that always computes the derivatives, but does not return them unless required by the mode. The compiler is expected to (and, based on benchmarks, does) skip the unnecessary computations depending on the mode.
2018-06-06 18:49:26 +01:00
Deven Desai
8fbd47052b Adding support for using Eigen in HIP kernels.
This commit enables the use of Eigen in HIP kernels / on AMD GPUs. Support has been added along the same lines as what already exists for using Eigen in CUDA kernels / on NVidia GPUs.

Application code needs to explicitly define EIGEN_USE_HIP when using Eigen in HIP kernels. This is because some of the CUDA headers get picked up by default when compiling Eigen (irrespective of whether or not the underlying compiler is CUDACC/NVCC; e.g., Eigen/src/Core/arch/CUDA/Half.h). In order to maintain this behavior, the EIGEN_USE_HIP macro is used to switch to the HIP version of those header files (see Eigen/Core and unsupported/Eigen/CXX11/Tensor)


Use the "-DEIGEN_TEST_HIP" cmake option to enable the HIP specific unit tests.
2018-06-06 10:12:58 -04:00
Benoit Steiner
e206f8d4a4 Merged in mfigurnov/eigen (pull request PR-400)
Exponentially scaled modified Bessel functions of order zero and one.

Approved-by: Benoit Steiner <benoit.steiner.goog@gmail.com>
2018-06-05 17:05:21 +00:00
Penporn Koanantakool
e2ed0cf8ab Add a ThreadPoolInterface* getter for ThreadPoolDevice. 2018-06-02 12:07:49 -07:00
Gael Guennebaud
84868da904 Don't run hg on non mercurial clone 2018-05-31 21:21:57 +02:00
Michael Figurnov
f216854453 Exponentially scaled modified Bessel functions of order zero and one.
The functions are conventionally called i0e and i1e. The exponentially scaled version is more numerically stable. The standard Bessel functions can be obtained as i0(x) = exp(|x|) * i0e(x).

The code is ported from Cephes and tested against SciPy.
2018-05-31 15:34:53 +01:00
Gael Guennebaud
6af1433cb5 Doc: add aliasing to common pitfalls. 2018-05-29 22:37:47 +02:00
Katrin Leinweber
ea94543190 Hyperlink DOIs against preferred resolver 2018-05-24 18:55:40 +02:00
Gael Guennebaud
999b552c16 Search for sequential Pastix. 2018-05-29 20:49:25 +02:00
Gael Guennebaud
eef4b7bd87 Fix handling of path names containing spaces and the likes. 2018-05-29 20:49:06 +02:00
Gael Guennebaud
647b724a36 Define pcast<> for SSE types even when AVX is enabled. (otherwise floats are silently reinterpreted as ints instead of being converted) 2018-05-29 20:46:46 +02:00
Gael Guennebaud
49262dfee6 Fix compilation and SSE support with PGI compiler 2018-05-29 15:09:31 +02:00
Christoph Hertzberg
750af06362 Add an option to test with external BLAS library 2018-05-22 21:04:32 +02:00
Christoph Hertzberg
d06a753d10 Make qr_fullpivoting unit test run for fixed-sized matrices 2018-05-22 20:29:17 +02:00
Gael Guennebaud
f0862b062f Fix internal::is_integral<size_t/ptrdiff_t> with MSVC 2013 and older. 2018-05-22 19:29:51 +02:00
Gael Guennebaud
36e413a534 Workaround a MSVC 2013 compilation issue with MatrixBase(Index,int) 2018-05-22 18:51:35 +02:00
Gael Guennebaud
725bd92903 fix stupid typo 2018-05-18 17:46:43 +02:00
Gael Guennebaud
a382bc9364 is_convertible<T,Index> does not seems to work well with MSVC 2013, so let's rather use __is_enum(T) for old MSVC versions 2018-05-18 17:02:27 +02:00
Gael Guennebaud
4dd767f455 add some internal checks 2018-05-18 13:59:55 +02:00
Gael Guennebaud
345c0ab450 check that all integer types are properly handled by mat(i,j) 2018-05-18 13:46:46 +02:00
Mark D Ryan
405859f18d Set EIGEN_IDEAL_MAX_ALIGN_BYTES correctly for AVX512 builds
bug #1548

The macro EIGEN_IDEAL_MAX_ALIGN_BYTES is being incorrectly set to 32
on AVX512 builds.  It should be set to 64.  In the current code it is
only set to 64 if the macro EIGEN_VECTORIZE_AVX512 is defined.  This
macro does get defined in AVX512 builds in Core, but only after Macros.h,
the file that defines EIGEN_IDEAL_MAX_ALIGN_BYTES, has been included.
This commit fixes the issue by setting EIGEN_IDEAL_MAX_ALIGN_BYTES to
64 if __AVX512F__ is defined.
2018-05-17 17:04:00 +01:00
Vamsi Sripathi
6293ad3f39 Performance improvements to tensor broadcast operation
1. Added new packet functions using SIMD for NByOne, OneByN cases
2. Modified existing packet functions to reduce index calculations when input stride is non-SIMD
3. Added 4 test cases to cover the new packet functions
2018-05-23 14:02:05 -07:00
Gael Guennebaud
7134fa7a2e Fix compilation with MSVC by reverting to char* for _mm_prefetch except for PGI (the later being the one that has the wrong prototype). 2018-06-07 09:33:10 +02:00
Jeff Trull
e7147f69ae Add tests for sparseQR results (value and size) covering bugs #1522 and #1544 2018-04-21 10:26:30 -07:00
Robert Lukierski
b2053990d0 Adding EIGEN_DEVICE_FUNC to Products, especially Dense2Dense Assignment
specializations. Otherwise causes problems with small fixed size matrix multiplication (call to
0x00 in call_assignment_no_alias in debug mode or trap in release with CUDA 9.1).
2018-03-14 16:19:43 +00:00
Jeff Trull
9f0c5c3669 Make sparse QR result sizes consistent with dense QR, with the following rules:
1) Q is always square
2) Q*R*P' is valid and recovers the original matrix

This implies that Q is square, with dimension equal to the number of rows of the original matrix, and that R has the same size as the original matrix.
2018-02-15 15:00:31 -08:00
Christoph Hertzberg
d655900953 bug #1544: Generate correct Q matrix in complex case. Original patch was by Jeff Trull in PR-386. 2018-05-17 19:17:01 +02:00
Benoit Steiner
0371380d5b Merged in rmlarsen/eigen2 (pull request PR-393)
Rename scalar_clip_op to scalar_clamp_op to prevent collision with existing functor in TensorFlow.
2018-05-16 21:45:42 +00:00
Rasmus Munk Larsen
b8d36774fa Rename clip2 to clamp. 2018-05-16 14:04:48 -07:00
Rasmus Munk Larsen
812480baa3 Rename scalar_clip_op to scalar_clip2_op to prevent collision with existing functor in TensorFlow. 2018-05-16 09:49:24 -07:00
Benoit Steiner
1403c2c15b Merged in didierjansen/eigen (pull request PR-360)
Fix bugs and typos in the contraction example of the tensor README
2018-05-16 01:16:36 +00:00
Benoit Steiner
ad355b3f05 Merged in rmlarsen/eigen2 (pull request PR-392)
Add vectorized clip functor for Eigen Tensors
2018-05-16 01:15:56 +00:00
Christoph Hertzberg
0272f2451a Fix "suggest parentheses around comparison" warning 2018-05-15 19:35:53 +02:00
Rasmus Munk Larsen
afec3021f7 Use numext::maxi & numext::mini. 2018-05-14 16:35:39 -07:00
Rasmus Munk Larsen
b8c8e5f436 Add vectorized clip functor for Eigen Tensors. 2018-05-14 16:07:13 -07:00
Benoit Steiner
6118c6ff4f Enable RawAccess to tensor slices whenever possible.
Avoid 32-bit integer overflow in TensorSlicingOp
2018-04-30 11:28:12 -07:00
Gael Guennebaud
6e7118265d Fix compilation with NEON+MSVC 2018-04-26 10:50:41 +02:00
Gael Guennebaud
097dd4616d Fix unit test for SIMD engine not supporting sqrt 2018-04-26 10:47:39 +02:00
Gael Guennebaud
8810baaed4 Add multi-threading for sparse-row-major * dense-row-major 2018-04-25 10:14:48 +02:00
Gael Guennebaud
2f3287da7d Fix "used uninitialized" warnings 2018-04-24 17:17:25 +02:00
Gael Guennebaud
3ffd449ef5 Workaround warning 2018-04-24 17:11:51 +02:00
Gael Guennebaud
e8ca5166a9 bug #1428: attempt to make NEON vectorization compilable by MSVC.
The workaround is to wrap NEON packet types to make them distinct C++ types.
2018-04-24 11:19:49 +02:00
Benoit Steiner
6f5935421a fix AVX512 plog 2018-04-23 15:49:26 +00:00
Gael Guennebaud
e9da464e20 Add specializations of is_arithmetic for long long in c++11 2018-04-23 16:26:29 +02:00
Gael Guennebaud
a57e6e5f0f workaround MSVC 2013 compilation issue (ambiguous call) 2018-04-23 15:31:51 +02:00
Gael Guennebaud
11123175db typo in doc 2018-04-23 15:30:35 +02:00
Gael Guennebaud
5679e439e0 bug #1543: fix linear indexing in generic block evaluation (this completes the fix in commit 12efc7d41b)
2018-04-23 14:40:16 +02:00
Gael Guennebaud
35b31353ab Fix unit test 2018-04-22 22:49:08 +02:00
Christoph Hertzberg
34e499ad36 Disable -Wshadow when compiling with g++ 2018-04-21 22:08:26 +02:00
Jayaram Bobba
b7b868d1c4 fix AVX512 plog 2018-04-20 13:39:18 -07:00
Gael Guennebaud
686fb57233 fix const cast in NEON 2018-04-18 18:46:34 +02:00
Dmitriy Korchemkin
02d2f1cb4a Cast zeros to Scalar in RealSchur 2018-04-18 13:52:46 +03:00
Christoph Hertzberg
50633d1a83 Renamed .trans() et al. to .reverseFlag() et al. Adapted documentation of .setReverseFlag() 2018-04-17 11:30:27 +02:00
nicolov
39c2cba810 Add a specialization of Eigen::numext::conj for std::complex<T> to be used when compiling a cuda kernel. This fixes the compilation of TensorFlow 1.4 with clang 6.0 used as CUDA compiler with libc++.
This follows the previous change in 2a69290ddb, which mentions OSX (I guess because it uses libc++ too).
2018-04-13 22:29:10 +00:00
Christoph Hertzberg
775766d175 Add parenthesis to fix compiler warnings 2018-04-15 18:43:56 +02:00
Christoph Hertzberg
42715533f1 bug #1493: Make representation of HouseholderSequence consistent and working for complex numbers. Made corresponding unit test actually test that. Also simplify implementation of QR decompositions 2018-04-15 10:15:28 +02:00
Christoph Hertzberg
c9ecfff2e6 Add links where to make PRs and report bugs into README.md 2018-04-13 21:05:28 +00:00
Christoph Hertzberg
c8b19702bc Limit test size for sparse Cholesky solvers to EIGEN_TEST_MAX_SIZE 2018-04-13 20:36:58 +02:00
Christoph Hertzberg
2cbb00b18e No need to make noise, if KLU is found 2018-04-13 19:14:25 +02:00
Christoph Hertzberg
84dcd998a9 Recent Adolc versions require C++11 2018-04-13 19:10:23 +02:00
Christoph Hertzberg
4d392d93aa Make hypot_impl compile again for types with expression-templates (e.g., boost::multiprecision) 2018-04-13 19:01:37 +02:00
Christoph Hertzberg
072e111ec0 SelfAdjointView<...,Mode> causes a static assert since commit d820ab9edc 2018-04-13 19:00:34 +02:00
Gael Guennebaud
7a9089c33c fix linking issue 2018-04-13 08:51:47 +02:00
Gael Guennebaud
e43ca0320d bug #1520: workaround some -Wfloat-equal warnings by calling std::equal_to 2018-04-11 15:24:13 +02:00
Weiming Zhao
b0eda3cb9f Avoid using memcpy for non-POD elements 2018-04-11 11:37:06 +02:00
Gael Guennebaud
79266fec75 extend doxygen splitter for huge screens 2018-04-11 11:31:17 +02:00
Gael Guennebaud
426052ef6e Update header/footer for doxygen 1.8.13 2018-04-11 11:30:34 +02:00
Gael Guennebaud
9c8decffbf Fix javascript hacks for doxygen 1.8.13 2018-04-11 11:30:14 +02:00
Gael Guennebaud
e798466871 bug #1538: update manual pages regarding BDCSVD. 2018-04-11 10:46:11 +02:00
Gael Guennebaud
c91906b065 Umfpack: UF_long has been removed in recent versions of suitesparse, and fix a few long-to-int conversions issues. 2018-04-11 09:59:59 +02:00
Gael Guennebaud
0050709ea7 Merged in v_huber/eigen (pull request PR-378)
Add interface to umfpack_*l_* functions
2018-04-11 07:43:04 +00:00
Guillaume Jacob
8c1652055a Fix code sample output in block(int, int, int, int) doxygen 2018-04-09 17:23:59 +02:00
vhuber
08008f67e1 Add unitTest 2018-04-09 17:07:46 +02:00
Gael Guennebaud
add15924ac Fix MKL backend for symmetric eigenvalues on row-major matrices. 2018-04-09 13:29:26 +02:00
Gael Guennebaud
04b1628e55 Add missing empty line. 2018-04-09 13:28:31 +02:00
Gael Guennebaud
c2624c0318 Fix cmake scripts with no fortran compiler 2018-04-07 08:45:19 +02:00
Gael Guennebaud
2f833b1c64 bug #1509: fix computeInverseWithCheck for complexes 2018-04-04 15:47:46 +02:00
Gael Guennebaud
b903fa74fd Extend list of MSVC versions 2018-04-04 15:14:09 +02:00
Gael Guennebaud
403f09ccef Make stableNorm and blueNorm compatible with 2D matrices. 2018-04-04 15:13:31 +02:00
Gael Guennebaud
4213b63f5c Factorize code between numext::hypot and the scalar_hypot_op functor. 2018-04-04 15:12:43 +02:00
Gael Guennebaud
368dd4cd9d Make innerVector() and innerVectors() methods available to all expressions supported by Block.
Before, only SparseBase exposed such methods.
2018-04-04 15:09:21 +02:00
Gael Guennebaud
e116f6847e bug #1521: avoid signalling NaN in hypot and make it std::complex<> friendly. 2018-04-04 13:47:23 +02:00
Gael Guennebaud
73729025a4 bug #1521: add unit test dedicated to numext::hypot 2018-04-04 13:45:34 +02:00
Gael Guennebaud
13f5df9f67 Add a note on vec_min vs asm 2018-04-04 13:10:38 +02:00
Gael Guennebaud
e91e314347 bug #1494: makes pmin/pmax behave on Altivec/VSX as on x86 regarding NaNs 2018-04-04 11:39:19 +02:00
Gael Guennebaud
112c899304 comment unreachable code 2018-04-03 23:16:43 +02:00
Gael Guennebaud
a1292395d6 Fix compilation of product with inverse transpositions (e.g., mat * Transpositions().inverse()) 2018-04-03 23:06:44 +02:00
Gael Guennebaud
8c7b5158a1 Prefer `::operator new` to `new`
The C++ standard allows compilers much flexibility with `new` expressions, including eliding them entirely (https://godbolt.org/g/yS6i91). However, calls to `operator new` are required to be treated like opaque function calls.

Since we're calling `new` for side effects other than allocating heap memory, we should prefer the less flexible version.

Signed-off-by: George Burgess IV <gbiv@google.com>
2018-04-03 17:15:38 +02:00
Gael Guennebaud
dd4cc6bd9e bug #1527: fix support for MKL's VML (destination was not properly resized) 2018-04-03 17:11:15 +02:00
Gael Guennebaud
c5b56f1fb2 bug #1528: better use numeric_limits::min() instead of 1/highest(), which may underflow. 2018-04-03 16:49:35 +02:00
Gael Guennebaud
8d0ffe3655 bug #1516: add assertion for out-of-range diagonal index in MatrixBase::diagonal(i) 2018-04-03 16:15:43 +02:00
Gael Guennebaud
407e3e2621 bug #1532: disable stl::*_negate in C++17 (they are deprecated) 2018-04-03 15:59:30 +02:00
Gael Guennebaud
40b4bf3d32 AVX512: _mm512_rsqrt28_ps is available for AVX512ER only 2018-04-03 14:36:27 +02:00
Gael Guennebaud
584951ca4d Rename predux_downto4 to better reflect its semantics. 2018-04-03 14:28:38 +02:00
Gael Guennebaud
67bac6368c protect calls to isnan 2018-04-03 14:19:04 +02:00
Gael Guennebaud
d43b2f01f4 Fix unit testing of predux_downto4 (bad name), and add unit testing of prsqrt 2018-04-03 14:14:00 +02:00
Gael Guennebaud
7b0630315f AVX512: fix psqrt and prsqrt 2018-04-03 14:12:50 +02:00
Gael Guennebaud
6719409cd9 AVX512: add missing pinsertfirst and pinsertlast, implement pblend for Packet8d, fix compilation without AVX512DQ 2018-04-03 14:11:56 +02:00
Gael Guennebaud
524119d32a Fix uninitialized output argument. 2018-04-03 10:56:10 +02:00
vhuber
267a144da5 Remove unnecessary define 2018-03-30 23:04:53 +02:00
vhuber
baf9a5a776 Add interface to umfpack_*l_* functions 2018-03-30 18:53:34 +02:00
luz.paz
e3912f5e63 Misc. source and comment typos
Found using `codespell` and `grep` from downstream FreeCAD
2018-03-11 10:01:44 -04:00
Gael Guennebaud
5deeb19e7b bug #1517: fix triangular product with unit diagonal and nested scaling factor: (s*A).triangularView<UpperUnit>()*B 2018-02-09 16:52:35 +01:00
Gael Guennebaud
12efc7d41b Fix linear indexing in generic block evaluation. 2018-02-09 16:45:49 +01:00
Gael Guennebaud
f4a6863c75 Fix typo 2018-02-09 16:43:49 +01:00
Viktor Csomor
000840cae0 Added a move constructor and move assignment operator to Tensor and wrote some tests. 2018-02-07 19:10:54 +01:00
Gael Guennebaud
3a2dc3869e Fix weird issue with MSVC 2013 2018-07-18 02:26:43 -07:00
Eugene Zhulenev
c95aacab90 Fix TensorContractionOp evaluators for GPU and SYCL 2018-07-17 14:09:37 -07:00
Gael Guennebaud
038b55464b Merged in deven-amd/eigen (pull request PR-425)
applying EIGEN_DECLARE_TEST to *gpu* unit tests
2018-07-17 21:14:40 +00:00
Deven Desai
f124f07965 applying EIGEN_DECLARE_TEST to *gpu* tests
Also, a few minor fixes for GPU tests running in HIP mode.

1. Adding an include for hip/hip_runtime.h in the Macros.h file
   For HIP __host__ and __device__ are macros which are defined in hip headers.
   Their definitions need to be included before their use in the file.

2. Fixing the compile failure in TensorContractionGpu introduced by the commit to
   "Fuse computations into the Tensor contractions using output kernel"

3. Fixing a HIP/clang specific compile error by making the struct-member assignment explicit
2018-07-17 14:16:48 -04:00
Gael Guennebaud
dff3a92d52 Remove usage of #if EIGEN_TEST_PART_XX in unit tests that do not require them (splitting can thus be avoided for them) 2018-07-17 15:52:58 +02:00
Gael Guennebaud
82f0ce2726 Get rid of EIGEN_TEST_FUNC, unit tests must now be declared with EIGEN_DECLARE_TEST(mytest) { /* code */ }.
This provides several advantages:
- more flexibility in designing unit tests
- unit tests can be glued together to speed up compilation
- unit tests are compiled with the same predefined macros, which is a requirement for zapcc
2018-07-17 14:46:15 +02:00
Gael Guennebaud
37f4bdd97d Fix VERIFY_EVALUATION_COUNT(EXPR,N) with a complex expression as N 2018-07-17 13:20:49 +02:00
Gael Guennebaud
2b2cd85694 bug #1573: add noexcept move constructor and move assignment operator to Quaternion 2018-07-17 11:11:33 +02:00
Eugene Zhulenev
43206ac4de Call OutputKernel in evalGemv 2018-07-12 14:52:23 -07:00
Eugene Zhulenev
e204ecdaaf Remove SimpleThreadPool and always use {NonBlocking}ThreadPool 2018-07-16 15:06:57 -07:00
Eugene Zhulenev
b324ed55d9 Call OutputKernel in evalGemv 2018-07-12 14:52:23 -07:00
Eugene Zhulenev
01fd4096d3 Fuse computations into the Tensor contractions using output kernel 2018-07-10 13:16:38 -07:00
Gael Guennebaud
5539587b1f Some warning fixes 2018-07-17 10:29:12 +02:00
Benoit Steiner
8f55956a57 Update the padding computation for PADDING_SAME to be consistent with TensorFlow. 2018-01-30 20:22:12 +00:00
Gael Guennebaud
09a16ba42f bug #1412: fix compilation with nvcc+MSVC 2018-01-17 23:13:16 +01:00
Lee.Deokjae
5b3c367926 Fix typos in the contraction example of tensor README 2018-01-06 14:36:19 +09:00
Eugene Chereshnev
f558ad2955 Fix incorrect ldvt in LAPACKE call from JacobiSVD 2018-01-03 12:55:52 -08:00
Benoit Steiner
22de74aa76 Disable use of recurrence for computing twiddle factors. 2018-01-09 18:32:52 +00:00
Gael Guennebaud
73629f8b68 Fix gcc7 warning 2018-01-09 08:59:27 +01:00
RJ Ryan
59985cfd26 Disable use of recurrence for computing twiddle factors. Fixes FFT precision issues for large FFTs. https://github.com/tensorflow/tensorflow/issues/10749#issuecomment-354557689 2017-12-31 10:44:56 -05:00
nluehr
f9bdcea022 For cuda 9.1 replace math_functions.hpp with cuda_runtime.h 2017-12-18 16:51:15 -08:00
Gael Guennebaud
06bf1047f9 Fix compilation of stableNorm with some expressions as input 2017-12-15 15:15:37 +01:00
Gael Guennebaud
73214c4bd0 Workaround nvcc 9.0 issue. See PR 351.
https://bitbucket.org/eigen/eigen/pull-requests/351
2017-12-15 14:10:59 +01:00
Gael Guennebaud
31e0bda2e3 Fix cmake warning 2017-12-14 15:48:27 +01:00
Gael Guennebaud
26a2c6fc16 fix unit test 2017-12-14 15:11:04 +01:00
Gael Guennebaud
546ab97d76 Add possibility to overwrite EIGEN_STRONG_INLINE. 2017-12-14 14:47:38 +01:00
Gael Guennebaud
9c3aed9d48 Fix packet and alignment propagation logic of Block<Xpr> expressions. In particular, (A+B).col(j) lost vectorisation. 2017-12-14 14:24:33 +01:00
Gael Guennebaud
76c7dae600 ignore all *build* sub directories 2017-12-14 14:22:14 +01:00
Gael Guennebaud
b2cacd189e fix header inclusion 2017-12-14 10:01:02 +01:00
Yangzihao Wang
3122477c86 Update the padding computation for PADDING_SAME to be consistent with TensorFlow. 2017-12-12 11:15:24 -08:00
Benoit Steiner
393b7c4959 Merged in ncluehr/eigen/float2half-fix (pull request PR-349)
Replace __float2half_rn with __float2half
2017-12-01 00:29:51 +00:00
nluehr
aefd5fd5c4 Replace __float2half_rn with __float2half
The latter provides a consistent definition for CUDA 8.0 and 9.0.
2017-11-28 10:15:46 -08:00
Gael Guennebaud
d0b028e173 clarify Pastix requirements 2017-11-27 22:11:57 +01:00
Gael Guennebaud
3587e481fb silent MSVC warning 2017-11-27 21:53:02 +01:00
Benoit Steiner
3a327cd3c7 Merged in ncluehr/eigen/predux_fp16_fix (pull request PR-348)
Fix incorrect integer cast in half2 predux.
2017-11-21 21:11:45 +00:00
nluehr
dd6de618c3 Fix incorrect integer cast in predux<half2>().
Bug corrupts results on Maxwell and earlier GPU architectures.
2017-11-21 10:47:00 -08:00
Gael Guennebaud
3dc6ff73ca Handle PGI compiler 2017-11-17 22:54:39 +01:00
Zvi Rackover
599a88da27 Disable gcc-specific workaround for Clang to allow build with AVX512
There is currently a workaround for an issue in gcc that requires invoking gcc with the -fabi-version flag. This workaround is not needed for Clang and moreover is not supported.
2017-11-16 19:53:38 +00:00
Gael Guennebaud
672bdc126b bug #1479: fix failure detection in LDLT 2017-11-16 17:55:24 +01:00
Basil Fierz
624df50945 Adds missing EIGEN_STRONG_INLINE to support MSVC properly inlining small vector calculations
When working with MSVC, small vector operations are often not properly inlined. This behaviour is observed even on the most recent compiler versions. 2017-10-26 22:44:28 +02:00
2017-10-26 22:44:28 +02:00
Benoit Steiner
746a6b7b81 Merged in zzp11/eigen/zzp11/a-small-mistake-quickreferencedox-edited-1510217281963 (pull request PR-346)
a small mistake QuickReference.dox edited online with Bitbucket
2018-03-23 01:02:34 +00:00
Benoit Steiner
d2631ef61d Merged in facaiy/eigen/ENH/exp_support_complex_for_gpu (pull request PR-359)
ENH: exp supports complex type for cuda
2018-03-23 00:59:15 +00:00
Benoit Steiner
8fcbd6d4c9 Merged in dtrebbien/eigen (pull request PR-369)
Move up the specialization of std::numeric_limits
2018-03-23 00:54:58 +00:00
Rasmus Munk Larsen
e900b010c8 Improve robustness of igamma and igammac to bad inputs.
Check for nan inputs and propagate them immediately. Limit the number of internal iterations to 2000 (same number as used by scipy.special.gammainc). This prevents an infinite loop when the function is called with nan or very large arguments.

Original change by mfirgunov@google.com
2018-03-19 09:04:54 -07:00
Gael Guennebaud
f7d17689a5 Add static assertion for fixed sizes Ref<> 2018-03-09 10:11:13 +01:00
Gael Guennebaud
f6be7289d7 Implement better static assertion checking to make sure that the first assertion is a static one and not a runtime one. 2018-03-09 10:00:51 +01:00
Gael Guennebaud
d820ab9edc Add static assertion on selfadjoint-view's UpLo parameter. 2018-03-09 09:33:43 +01:00
Daniel Trebbien
0c57be407d Move up the specialization of std::numeric_limits
This fixes a compilation error seen when building TensorFlow on macOS:
https://github.com/tensorflow/tensorflow/issues/17067
2018-02-18 15:35:45 -08:00
Yan Facai (颜发才)
42a8334668 ENH: exp supports complex type for cuda 2018-01-04 16:01:01 +08:00
zhouzhaoping
912e9965ef a small mistake QuickReference.dox edited online with Bitbucket 2017-11-09 08:49:01 +00:00
Gael Guennebaud
4c03b3511e Fix issue with boost::multiprec in previous commit 2017-11-08 23:28:01 +01:00
Gael Guennebaud
e9d2888e74 Improve debugging tests and output in BDCSVD 2017-11-08 10:26:03 +01:00
Gael Guennebaud
e8468ea91b Fix overflow issues in BDCSVD 2017-11-08 10:24:28 +01:00
Benoit Steiner
3949615176 Merged in JonasMu/eigen (pull request PR-329)
Added an example for a contraction to a scalar value to README.md

Approved-by: Jonas Harsch <jonas.harsch@gmail.com>
2017-10-27 07:27:46 +00:00
Christoph Hertzberg
11ddac57e5 Merged in guillaume_michel/eigen (pull request PR-334)
- Add support for NEON plog PacketMath function
2017-10-23 13:22:22 +00:00
Benoit Steiner
a6d875bac8 Removed unnecessary #include 2017-10-22 08:12:45 -07:00
Benoit Steiner
f16ba2a630 Merged in LaFeuille/eigen-1/LaFeuille/typo-fix-alignmeent-alignment-1505889397887 (pull request PR-335)
Typo fix alignmeent ->alignment
2017-10-21 01:59:55 +00:00
Benoit Steiner
ee6ad21b25 Merged in henryiii/eigen/henryiii/device (pull request PR-343)
Fixing missing inlines on device functions for newer CUDA cards
2017-10-21 01:58:22 +00:00
Henry Schreiner
9bb26eb8f1 Restore __device__ 2017-10-21 00:50:38 +00:00
Henry Schreiner
4245475d22 Fixing missing inlines on device functions for newer CUDA cards 2017-10-20 03:20:13 +00:00
Benoit Steiner
8eb4b9d254 Merged in benoitsteiner/opencl (pull request PR-341) 2017-10-17 16:39:28 +00:00
Rasmus Munk Larsen
2dd63ed395 Merge 2017-10-13 15:58:52 -07:00
Rasmus Munk Larsen
f349507e02 Specialize ThreadPoolDevice::enqueueNotification for the case with no args. As an example this reduces binary size of an TensorFlow demo app for Android by about 2.5%. 2017-10-13 15:58:12 -07:00
Benoit Steiner
688451409d Merged in mehdi_goli/upstr_benoit/ComputeCppNewReleaseFix (pull request PR-16)
Changes required for new ComputeCpp CE version.
2017-10-13 20:56:01 +00:00
Konstantinos Margaritis
0e6e027e91 check both z13 and z14 arches 2017-10-12 15:38:34 -04:00
Konstantinos Margaritis
6c3475f110 remove debugging 2017-10-12 15:34:55 -04:00
Konstantinos Margaritis
df7644aec3 Merged eigen/eigen into default 2017-10-12 22:23:13 +03:00
Konstantinos Margaritis
98e52cc770 rollback 374f750ad4 2017-10-12 15:22:10 -04:00
Konstantinos Margaritis
c4ad358565 explicitly set conjugate mask 2017-10-11 11:05:29 -04:00
Konstantinos Margaritis
380d41fd76 added some extra debugging 2017-10-11 10:40:12 -04:00
Konstantinos Margaritis
d0b7b9d0d3 some Packet2cf pmul fixes 2017-10-11 10:17:22 -04:00
Konstantinos Margaritis
df173f5620 initial pexp() for 32-bit floats, commented out due to vec_cts() 2017-10-11 09:40:49 -04:00
Konstantinos Margaritis
3dcae2a27f initial pexp() for 32-bit floats, commented out due to vec_cts() 2017-10-11 09:40:45 -04:00
Konstantinos Margaritis
c2a2246489 fix predux_mul for z14/float 2017-10-10 13:38:32 -04:00
Konstantinos Margaritis
374f750ad4 eliminate 'enumeral and non-enumeral type in conditional expression' warning 2017-10-09 16:56:30 -04:00
Konstantinos Margaritis
bc30305d29 complete z14 port 2017-10-09 16:55:10 -04:00
Gael Guennebaud
0e85a677e3 bug #1472: fix warning 2017-09-26 10:53:33 +02:00
Gael Guennebaud
8579195169 bug #1468 (1/2) : add missing std:: to memcpy 2017-09-22 09:23:24 +02:00
Gael Guennebaud
f92567fecc Add link to a useful example. 2017-09-20 10:22:23 +02:00
Gael Guennebaud
7ad07fc6f2 Update documentation for aligned_allocator 2017-09-20 10:22:00 +02:00
LaFeuille
7c9b07dc5c Typo fix alignmeent ->alignment 2017-09-20 06:38:39 +00:00
Mehdi Goli
2062ac9958 Changes required for new ComputeCpp CE version. 2017-09-18 18:17:39 +01:00
Christoph Hertzberg
23f8b00bc8 clang provides __has_feature(is_enum) (but not <type_traits>) in C++03 mode 2017-09-14 19:26:03 +02:00
Christoph Hertzberg
0c9ad2f525 std::integral_constant is not C++03 compatible 2017-09-14 19:23:38 +02:00
Rasmus Munk Larsen
1b7294f6fc Fix cut-and-paste error. 2017-09-08 16:35:58 -07:00
Rasmus Munk Larsen
94e2213b38 Avoid undefined behavior in Eigen::TensorCostModel::numThreads.
If the cost is large enough then the thread count can be larger than the maximum
representable int, so just casting it to an int is undefined behavior.

Contributed by phurst@google.com.
2017-09-08 15:49:55 -07:00
Gael Guennebaud
6d42309f13 Fix compilation of Vector::operator()(enum) by treating enums as Index 2017-09-07 14:34:30 +02:00
Benoit Steiner
ea4e65bf41 Fixed compilation with cuda_clang. 2017-09-07 09:13:52 +00:00
Gael Guennebaud
a91918a105 Merged in infinitei/eigen (pull request PR-328)
bug #1464 : Fixes construction of EulerAngles from 3D vector expression.

Approved-by: Tal Hadad <tal_hd@hotmail.com>
Approved-by: Abhijit Kundu <abhijit.kundu@gatech.edu>
2017-09-06 08:42:14 +00:00
Gael Guennebaud
9c353dd145 Add C++11 max_digits10 for half. 2017-09-06 10:22:47 +02:00
Gael Guennebaud
b35d1ce4a5 Implement true compile-time "if" for apply_rotation_in_the_plane. This fixes a compilation issue for vectorized real type with missing vectorization for complexes, e.g. AVX512. 2017-09-06 10:02:49 +02:00
Gael Guennebaud
80142362ac Fix mixing types in sparse matrix products. 2017-09-02 22:50:20 +02:00
Jonas Harsch
810b70ad09 Merged in JonasMu/added-an-example-for-a-contraction-to-a--1504265366851 (pull request PR-1)
Added an example for a contraction to a scalar value
2017-09-01 12:01:39 +00:00
Jonas Harsch
a34fb212cd Close branch JonasMu/added-an-example-for-a-contraction-to-a--1504265366851 2017-09-01 12:01:39 +00:00
Jonas Harsch
a991c80365 Added an example for a contraction to a scalar value, e.g. a double contraction of two second order tensors, and how you can get the value of the result. It took me a day to get this working, so I think it will help others. I also added Eigen:: to the IndexPair and array in the same example. 2017-09-01 11:30:26 +00:00
Benoit Steiner
a4089991eb Added support for CUDA 9.0. 2017-08-31 02:49:39 +00:00
Abhijit Kundu
6d991a9595 bug #1464 : Fixes construction of EulerAngles from 3D vector expression. 2017-08-30 13:26:30 -04:00
Gael Guennebaud
304ef29571 Handle min/max/inf/etc issue in cuda_fp16.h directly in test/main.h 2017-08-24 11:26:41 +02:00
Konstantinos Margaritis
1affe3d8df Merged eigen/eigen into default 2017-08-24 12:24:01 +03:00
Gael Guennebaud
21633e585b bug #1462: remove all occurences of the deprecated __CUDACC_VER__ macro by introducing EIGEN_CUDACC_VER 2017-08-24 11:06:47 +02:00
Gael Guennebaud
12249849b5 Make the threshold from gemm to coeff-based-product configurable, and add some explanations. 2017-08-24 10:43:21 +02:00
Gael Guennebaud
39864ebe1e bug #336: improve doc for PlainObjectBase::Map 2017-08-22 17:18:43 +02:00
Gael Guennebaud
600e52fc7f Add missing scalar conversion 2017-08-22 17:06:57 +02:00
Gael Guennebaud
9deee79922 bug #1457: add setUnit() methods for consistency. 2017-08-22 16:48:07 +02:00
Gael Guennebaud
bc4dae9aeb bug #1449: fix redux_3 unit test 2017-08-22 15:59:08 +02:00
Gael Guennebaud
bc91a2df8b bug #1461: fix compilation of Map<const Quaternion>::x() 2017-08-22 15:10:42 +02:00
Gael Guennebaud
fc39d5954b Merged in dtrebbien/eigen/patch-1 (pull request PR-312)
Work around a compilation error seen with nvcc V8.0.61
2017-08-22 12:17:37 +00:00
Gael Guennebaud
b223918ea9 Doc: warn about constness in LLT::solveInPlace 2017-08-22 14:12:47 +02:00
Konstantinos Margaritis
4ce5ec5197 initial support for z14 2017-08-07 05:54:29 -04:00
Konstantinos Margaritis
e1e71ca4e4 initial support for z14 2017-08-06 19:53:18 -04:00
Benoit Steiner
84d7be103a Fixing Argmax that was breaking upstream TensorFlow. 2017-07-22 03:19:34 +00:00
Benoit Steiner
f0b154a4b0 Code cleanup 2017-07-10 09:54:09 -07:00
Benoit Steiner
575cda76b3 Fixed syntax errors generated by xcode 2017-07-09 11:39:01 -07:00
Benoit Steiner
5ac27d5b51 Avoid relying on cxx11 features when possible. 2017-07-08 21:58:44 -07:00
Benoit Steiner
c5a241ab9b Merged in benoitsteiner/opencl (pull request PR-323)
Improved support for OpenCL
2017-07-07 16:27:33 +00:00
Benoit Steiner
b7ae4dd9ef Merged in hughperkins/eigen/add-endif-labels-TensorReductionCuda.h (pull request PR-315)
Add labels to #ifdef, in TensorReductionCuda.h
2017-07-07 04:23:52 +00:00
Benoit Steiner
9daed67952 Merged in tntnatbry/eigen (pull request PR-319)
Tensor Trace op
2017-07-07 04:18:03 +00:00
Benoit Steiner
6795512e59 Improved the randomness of the tensor random generator 2017-07-06 21:12:45 -07:00
Benoit Steiner
dc524ac716 Fixed compilation warning 2017-07-06 21:11:15 -07:00
Benoit Steiner
62b4634ebe Merged in mehdi_goli/upstr_benoit/TensorSYCLImageVolumePatchFixed (pull request PR-14)
Applying Benoit's comment for Fixing ImageVolumePatch.

* Applying Benoit's comment for Fixing ImageVolumePatch. Fixing conflict on cmake file.

* Fixing dealocation of the memory in ImagePatch test for SYCL.

* Fixing the automerge issue.
2017-07-06 05:08:13 +00:00
Benoit Steiner
c92faf9d84 Merged in mehdi_goli/upstr_benoit/HiperbolicOP (pull request PR-13)
Adding hyperbolic operations for sycl.

* Adding hyperbolic operations.

* Adding the hyperbolic operations for CPU as well.
2017-07-06 05:05:57 +00:00
Benoit Steiner
53725c10b8 Merged in mehdi_goli/opencl/DataDependancy (pull request PR-10)
DataDependancy

* Wrapping the data type in the pointer class for SYCL in non-terminal nodes; not having that breaks TensorFlow Conv2d code.

* Applying Ronnan's Comments.

* Applying benoit's comments
2017-06-28 17:55:23 +00:00
Gael Guennebaud
c010b17360 Fix warning 2017-06-27 14:29:57 +02:00
Gael Guennebaud
561f777075 Fix a gcc7 warning about bool * bool in abs2 default implementation. 2017-06-27 12:05:17 +02:00
Gael Guennebaud
b651ce0ffa Fix a gcc7 warning: Wint-in-bool-context 2017-06-26 09:58:28 +02:00
Christoph Hertzberg
157040d44f Make sure CMAKE_Fortran_COMPILER is set before checking for Fortran functions 2017-06-20 16:58:03 +02:00
Gael Guennebaud
24fe1de9b4 merge 2017-06-15 10:17:39 +02:00
Gael Guennebaud
b240080e64 bug #1436: fix compilation of Jacobi rotations with ARM NEON, some specializations of internal::conj_helper were missing. 2017-06-15 10:16:30 +02:00
Benoit Steiner
3baef62b9a Added missing __device__ qualifier 2017-06-13 12:56:55 -07:00
Benoit Steiner
449936828c Added missing __device__ qualifier 2017-06-13 12:54:57 -07:00
Benoit Steiner
b8e805497e Merged in benoitsteiner/opencl (pull request PR-318)
Improved support for OpenCL
2017-06-13 05:01:10 +00:00
Gael Guennebaud
9fbdf02059 Enable Array(EigenBase<>) ctor for compatible scalar types only. This prevents nested arrays from appearing convertible from/to simple arrays. 2017-06-12 22:30:32 +02:00
Gael Guennebaud
e43d8fe9d7 Fix compilation of streaming nested Array, i.e., cout << Array<Array<>> 2017-06-12 22:26:26 +02:00
Gael Guennebaud
d9d7bd6d62 Fix 1x1 case in Solve expression with EIGEN_DEFAULT_MATRIX_STORAGE_ORDER_OPTION==RowMajor 2017-06-12 22:25:02 +02:00
Androbin42
95ecb2b5d6 Make buildtests.in more robust 2017-06-12 17:11:06 +00:00
Androbin42
3f7fb5a6d6 Make eigen_monitor_perf.sh more robust 2017-06-12 17:07:56 +00:00
Gael Guennebaud
7f42a93349 Merged in alainvaucher/eigen/find-module-imported-target (pull request PR-324)
In the CMake find module, define the Eigen imported target as when installing with CMake

* In the CMake find module, define the Eigen imported target

* Add quotes to the imported location, in case there are spaces in the path.

Approved-by: Alain Vaucher <acvaucher@gmail.com>
2017-11-15 20:45:09 +00:00
Gael Guennebaud
7cc503f9f5 bug #1485: fix linking issue of non template functions 2017-11-15 21:33:37 +01:00
Gael Guennebaud
103c0aa6ad Add KLU in the list of third-party sparse solvers 2017-11-10 14:13:29 +01:00
Gael Guennebaud
00bc67c374 Move KLU support to official 2017-11-10 14:11:22 +01:00
Gael Guennebaud
b82cd93c01 KLU: truly disable unimplemented code, add proper static assertions in solve 2017-11-10 14:09:01 +01:00
Gael Guennebaud
6365f937d6 KLU depends on BTF but not on libSuiteSparse nor Cholmod 2017-11-10 13:58:52 +01:00
Gael Guennebaud
8cf63ccb99 Merged in kylemacfarlan/eigen (pull request PR-337)
Add support for SuiteSparse's KLU routines
2017-11-10 10:43:17 +00:00
Gael Guennebaud
1495b98a8e Merged in spraetor/eigen (pull request PR-305)
Issue with mpreal and std::numeric_limits::digits
2017-11-10 10:28:54 +00:00
Gael Guennebaud
fc45324380 Merged in jkflying/eigen-fix-scaling (pull request PR-302)
Make scaling work with non-square matrices
2017-11-10 10:11:36 +00:00
Gael Guennebaud
d306b96fb7 Merged in carpent/eigen (pull request PR-342)
Use col method for column-major matrix
2017-11-10 10:09:53 +00:00
Gael Guennebaud
1b2dcf9a47 Check that the Schur decomposition succeeds. 2017-11-10 10:26:09 +01:00
Gael Guennebaud
0a1cc73942 bug #1484: restore deleted line for 128 bits long doubles, and improve dispatching logic. 2017-11-10 10:25:41 +01:00
Gael Guennebaud
f86bb89d39 Add EIGEN_MKL_NO_DIRECT_CALL option 2017-11-09 11:07:45 +01:00
Gael Guennebaud
5fa79f96b8 Patch from Konstantin Arturov to enable MKL's direct call by default 2017-11-09 10:58:38 +01:00
Justin Carpentier
a020d9b134 Use col method for column-major matrix 2017-10-17 21:51:27 +02:00
Kyle Vedder
c0e1d510fd Add support for SuiteSparse's KLU routines 2017-10-04 21:01:23 -05:00
Gael Guennebaud
6dcf966558 Avoid implicit scalar conversion with accuracy loss in pow(scalar,array) 2017-06-12 16:47:22 +02:00
Gael Guennebaud
50e09cca0f fix typo 2017-06-11 15:30:36 +02:00
Gael Guennebaud
a4fd4233ad Fix compilation with some compilers 2017-06-09 23:02:02 +02:00
Gael Guennebaud
c3e2afce0d Enable MSVC 2010 workaround from MSVC only 2017-06-09 16:25:18 +02:00
Gael Guennebaud
731c8c704d bug #1403: more scalar conversions fixes in BDCSVD 2017-06-09 15:45:49 +02:00
Gael Guennebaud
1bbcf19029 bug #1403: fix implicit scalar type conversion. 2017-06-09 14:44:02 +02:00
Gael Guennebaud
ba5cab576a bug #1405: enable StrictlyLower/StrictlyUpper triangularView as the destination of matrix*matrix products. 2017-06-09 14:38:04 +02:00
Gael Guennebaud
90168c003d bug #1414: doxygen, add EigenBase to CoreModule 2017-06-09 14:01:44 +02:00
Gael Guennebaud
26f552c18d fix compilation of Half in C++98 (issue introduced in previous commit) 2017-06-09 13:36:58 +02:00
Gael Guennebaud
1d59ca2458 Fix compilation with gcc 4.3 and ARM NEON 2017-06-09 13:20:52 +02:00
Gael Guennebaud
fb1ee04087 bug #1410: fix lvalue propagation of Array/Matrix-Wrapper with a const nested expression. 2017-06-09 13:13:03 +02:00
Gael Guennebaud
723a59ac26 add regression test for aliasing in product rewritting 2017-06-09 12:54:40 +02:00
Gael Guennebaud
8640093af1 fix compilation in C++98 2017-06-09 12:45:01 +02:00
Gael Guennebaud
a7be4cd1b1 Fix LeastSquareDiagonalPreconditioner for complexes (issue introduced in previous commit) 2017-06-09 11:57:53 +02:00
Gael Guennebaud
498aa95a8b bug #1424: add numext::abs specialization for unsigned integer types. 2017-06-09 11:53:49 +02:00
Gael Guennebaud
d588822779 Add missing std::numeric_limits specialization for half, and complete NumTraits<half> 2017-06-09 11:51:53 +02:00
Gael Guennebaud
682b2ef17e bug #1423: fix LSCG's Jacobi preconditioner for row-major matrices. 2017-06-08 15:06:27 +02:00
Gael Guennebaud
4bbc320468 bug #1435: fix aliasing issue in expressions like: A = C - B*A; 2017-06-08 12:55:25 +02:00
Hugh Perkins
9341f258d4 Add labels to #ifdef, in TensorReductionCuda.h 2017-06-06 15:51:06 +01:00
Benoit Steiner
1e736b9ead Merged in mehdi_goli/opencl/SYCLAlignAllocator (pull request PR-7)
Fixing SYCL alignment issue required by TensorFlow.
2017-05-26 17:23:00 +00:00
Benoit Steiner
9dee55ec33 Merged eigen/eigen into default 2017-05-26 09:01:04 -07:00
Mehdi Goli
0370d3576e Applying Ronnan's comments. 2017-05-26 16:01:48 +01:00
Benoit Steiner
615aff4d6e Merged in a-doumoulakis/opencl (pull request PR-12)
Enable triSYCL with Eigen
2017-05-25 18:18:23 +00:00
a-doumoulakis
c3bd860de8 Modification upon request
- Remove warning suppression
2017-05-25 18:46:18 +01:00
Mehdi Goli
e3f964ed55 Applying Benoit's comment;removing dead code. 2017-05-25 11:17:26 +01:00
Benoit Steiner
df90010cdd Merged in mehdi_goli/opencl/CmakeFixForUbuntu16.04 (pull request PR-11)
CmakeFixForUbuntu16.04
2017-05-24 19:03:57 +00:00
a-doumoulakis
fb853a857a Restore misplaced comment 2017-05-24 17:50:15 +01:00
a-doumoulakis
7a8ba565f8 Merge changed from upstream 2017-05-24 17:45:29 +01:00
Mehdi Goli
daf99daadd Merged in DuncanMcBain/opencl/default (pull request PR-2)
Update FindComputeCpp.cmake with new changes from SDK
2017-05-24 13:59:53 +00:00
Mehdi Goli
9ef5c948ba Fixing Cmake for gcc>=5. 2017-05-24 13:11:16 +01:00
Duncan McBain
0cb3c7c7dd Update FindComputeCpp.cmake with new changes from SDK 2017-05-24 12:24:21 +01:00
Mmanu Chaturvedi
2971503fed Specializing numeric_limits For AutoDiffScalar 2017-05-23 17:12:36 -04:00
Gael Guennebaud
26e8f9171e Fix compilation of matrix log with Map as input 2017-06-07 10:51:23 +02:00
Gael Guennebaud
f2a553fb7b bug #1411: fix usage of alignment information in vectorization of quaternion product and conjugate. 2017-06-07 10:10:30 +02:00
Christoph Hertzberg
e018142604 Make sure CholmodSupport works when included in multiple compilation units (issue was reported on stackoverflow.com) 2017-06-06 19:23:14 +02:00
Gael Guennebaud
8508db52ab bug #1417: make LinSpace compatible with std::complex 2017-06-06 17:25:56 +02:00
Mehdi Goli
9aa7c30163 Merge with Benoit. 2017-05-23 10:51:34 +01:00
Mehdi Goli
b42d775f13 Temporary branch for sync with upstream 2017-05-23 10:51:14 +01:00
Benoit Steiner
615733381e Merged in mehdi_goli/opencl/FixingCmakeDependency (pull request PR-2)
Fixing Cmake Dependency for SYCL
2017-05-22 17:43:06 +00:00
Benoit Steiner
1500a67c41 Merged in mehdi_goli/opencl/TensorSupportedDevice (pull request PR-6)
Fixing supported device list.
2017-05-22 16:22:21 +00:00
Mehdi Goli
76c0fc1f95 Fixing SYCL alignment issue required by TensorFlow. 2017-05-22 16:49:32 +01:00
Mehdi Goli
2d17128d6f Fixing supported device list. 2017-05-22 16:40:33 +01:00
Mehdi Goli
61d7f3664a Fixing Cmake Dependency for SYCL 2017-05-22 14:58:28 +01:00
a-doumoulakis
a5226ce4f7 Add cmake file FindTriSYCL.cmake 2017-05-17 17:59:30 +01:00
a-doumoulakis
052426b824 Add support for triSYCL
Eigen is now able to use triSYCL with EIGEN_SYCL_TRISYCL and TRISYCL_INCLUDE_DIR options

Fix contraction kernel with correct nd_item dimension
2017-05-05 19:26:27 +01:00
Abhijit Kundu
4343db84d8 Updated warning number for nvcc release 8 (V8.0.61) for the stupid warning message 'calling a __host__ function from a __host__ __device__ function is not allowed'. 2017-05-01 10:36:27 -04:00
Abhijit Kundu
9bc0a35731 Fixed nested angle brackets >> issue when compiling with cuda 8 2017-04-27 03:09:03 -04:00
Gael Guennebaud
891ac03483 Fix dense * sparse-selfadjoint-view product. 2017-04-25 13:58:10 +02:00
RJ Ryan
949a2da38c Use scalar_sum_op and scalar_quotient_op instead of operator+ and operator/ in MeanReducer.
Improves support for std::complex types when compiling for CUDA.

Expands on e2e9cdd169 and 2bda1b0d93.
2017-04-14 13:23:35 -07:00
Gael Guennebaud
d9084ac8e1 Improve mixing of complex and real in the vectorized path of apply_rotation_in_the_plane 2017-04-14 11:05:13 +02:00
Gael Guennebaud
f75dfdda7e Fix unwanted Real to Scalar to Real conversions in column-pivoting QR. 2017-04-14 10:34:30 +02:00
Gael Guennebaud
0f83aeb6b2 Improve cmake scripts for Pastix and BLAS detection. 2017-04-14 10:22:12 +02:00
Benoit Steiner
0d08165a7f Merged in benoitsteiner/opencl (pull request PR-309)
OpenCL improvements
2017-04-05 14:28:08 +00:00
Benoit Steiner
068cc09708 Preserve file naming conventions 2017-04-04 10:09:10 -07:00
Benoit Steiner
c302ea7bc4 Deleted empty line of code 2017-04-04 10:05:16 -07:00
Benoit Steiner
a5a0c8fac1 Guard sycl specific code under a EIGEN_USE_SYCL ifdef 2017-04-04 10:03:21 -07:00
Benoit Steiner
a1304b95b7 Code cleanup 2017-04-04 10:00:46 -07:00
Benoit Steiner
66c63826bd Guard the sycl specific code with EIGEN_USE_SYCL 2017-04-04 09:59:09 -07:00
Benoit Steiner
e3e343390a Guard the sycl specific code with a #ifdef EIGEN_USE_SYCL 2017-04-04 09:56:33 -07:00
Benoit Steiner
63840d4666 Gate the sycl specific code under an EIGEN_USE_SYCL define 2017-04-04 09:54:31 -07:00
Benoit Steiner
bc050ea9f0 Fixed compilation error when sycl is enabled. 2017-04-04 09:47:04 -07:00
Gagan Goel
4910630c96 fix typos in the Tensor readme 2017-03-31 20:32:16 -04:00
Benoit Steiner
c1b3d5ecb6 Restored code compatibility with compilers that don't support c++11
Gated more sycl code under #ifdef sycl
2017-03-31 08:31:28 -07:00
Benoit Steiner
e2d5d4e7b3 Restore the old constructors to retain compatibility with non c++11 compilers. 2017-03-31 08:26:13 -07:00
Benoit Steiner
73fcaa319f Gate the sycl specific code under #ifdef sycl 2017-03-31 08:22:25 -07:00
Mehdi Goli
bd64ee8555 Fixing TensorArgMaxSycl.h; removing the warning related to the dims type being hardcoded to int in Argmax. 2017-03-28 16:50:34 +01:00
Simon Praetorius
511810797e Issue with mpreal and std::numeric_limits, i.e. digits is not a constant. Added a digits() trait in NumTraits with fallback to static constant. Specialization for mpreal added in MPRealSupport. 2017-03-24 17:45:56 +01:00
Luke Iwanski
a91417a7a5 Introduces aligned allocator for SYCL buffer 2017-03-20 14:48:54 +00:00
Gael Guennebaud
aae19c70ac update has_ReturnType to be more consistent with other has_ helpers 2017-03-17 17:33:15 +01:00
Benoit Steiner
f8a622ef3c Merged eigen/eigen into default 2017-03-15 20:06:19 -07:00
Benoit Steiner
fd7db52f9b Silenced compilation warning 2017-03-15 20:02:39 -07:00
Luke Iwanski
9597d6f6ab Temporary: Disables cxx11_tensor_argmax_sycl test since it is causing zombie threads 2017-03-15 19:28:09 +00:00
Luke Iwanski
c06861d15e Fixes bug in get_sycl_supported_devices() that was reporting an unsupported Intel CPU on AMD platforms - causing timeouts in that configuration 2017-03-15 19:26:08 +00:00
Benoit Steiner
7f31bb6822 Merged in ilya-biryukov/eigen/fix_clang_cuda_compilation (pull request PR-304)
Fixed compilation with cuda-clang
2017-03-15 16:48:52 +00:00
Gael Guennebaud
89fd0c3881 better check array index before using it 2017-03-15 15:18:03 +01:00
Benoit Jacob
61160a21d2 ARM prefetch fixes: Implement prefetch on ARM64. Do not clobber cc on ARM32. 2017-03-15 06:57:25 -04:00
Benoit Steiner
f0f3591118 Made the reduction code compile with cuda-clang 2017-03-14 14:16:53 -07:00
Mehdi Goli
f499fe9496 Adding synchronisation to convolution kernel for sycl backend. 2017-03-13 09:18:37 +00:00
Rasmus Munk Larsen
bfd7bf9c5b Get rid of Init(). 2017-03-10 08:48:20 -08:00
Rasmus Munk Larsen
d56ab01094 Use C++11 ctor forwarding to simplify code a bit. 2017-03-10 08:30:22 -08:00
Rasmus Munk Larsen
344c2694a6 Make the non-blocking threadpool more flexible and less wasteful of CPU cycles for high-latency use-cases.
* Adds a hint to ThreadPool allowing us to turn off spin waiting. Currently each reader and record yielder op in a graph creates a threadpool with a thread that spins for 1000 iterations through the work stealing loop before yielding. This is wasteful for such ops that process I/O.

* This also changes the number of iterations through the steal loop to be inversely proportional to the number of threads. Since the time of each iteration is proportional to the number of threads, this yields roughly a constant spin time.

* Implement a separate worker loop for the num_threads == 1 case since there is no point in going through the expensive steal loop. Moreover, since Steal() calls PopBack() on the victim queues it might reverse the order in which ops are executed, compared to the order in which they are scheduled, which is usually counter-productive for the types of I/O workloads the single thread pools tend to be used for.

* Store num_threads in a member variable for simplicity and to avoid a data race between the thread creation loop and worker threads calling threads_.size().
2017-03-09 15:41:03 -08:00
Luke Iwanski
1b32a10053 Use the device name instead of the vendor to distinguish devices 2017-03-08 18:26:34 +00:00
Mehdi Goli
aadb7405a7 Fixing typo in sycl Benchmark. 2017-03-08 18:20:06 +00:00
Gael Guennebaud
970ff78294 bug #1401: fix compilation of "cond ? x : -x" with x an AutoDiffScalar 2017-03-08 16:16:53 +01:00
Mehdi Goli
5e9a1e7a7a Adding sycl Benchmarks. 2017-03-08 14:17:48 +00:00
Mehdi Goli
e2e3f78533 Fixing potential race condition on sycl device. 2017-03-07 17:48:15 +00:00
Mehdi Goli
f84963ed95 Adding TensorIndexTuple and TensorTupleReduceOP backend (ArgMax/Min) for sycl; fixing the address space issue for const TensorMap; converting all discard_write to write due to data mismatch. 2017-03-07 14:27:10 +00:00
Gael Guennebaud
e5156e4d25 fix typo 2017-03-07 11:25:58 +01:00
Gael Guennebaud
5694315fbb remove UTF8 symbol 2017-03-07 10:53:47 +01:00
Gael Guennebaud
e958c2baac remove UTF8 symbols 2017-03-07 10:47:40 +01:00
Gael Guennebaud
d967718525 do not include std header within extern C 2017-03-07 10:16:39 +01:00
Gael Guennebaud
659087b622 bug #1400: fix stableNorm with EIGEN_DONT_ALIGN_STATICALLY 2017-03-07 10:02:34 +01:00
Ilya Biryukov
1c03d43a5c Fixed compilation with cuda-clang 2017-03-06 12:01:12 +01:00
Julian Kent
bbe717fa2f Make scaling work with non-square matrices 2017-03-03 12:58:51 +01:00
Benoit Steiner
a71943b9a4 Made the Tensor code compile with clang 3.9 2017-03-02 10:47:29 -08:00
Benoit Steiner
09ae0e6586 Adjusted the EIGEN_DEVICE_FUNC qualifiers to make sure that:
* they're used consistently between the declaration and the definition of a function
* we avoid calling host-only methods from host device methods.
2017-03-01 11:47:47 -08:00
Benoit Steiner
1e2d046651 Silenced a couple of compilation warnings 2017-03-01 10:13:42 -08:00
Benoit Steiner
c1d87ec110 Added missing EIGEN_DEVICE_FUNC qualifiers 2017-03-01 10:08:50 -08:00
Benoit Steiner
3a3f040baa Added missing EIGEN_DEVICE_FUNC qualifiers 2017-02-28 17:06:15 -08:00
Benoit Steiner
7b61944669 Made most of the packet math primitives usable within CUDA kernel when compiling with clang 2017-02-28 17:05:28 -08:00
Benoit Steiner
c92406d613 Silenced clang compilation warning. 2017-02-28 17:03:11 -08:00
Benoit Steiner
857adbbd52 Added missing EIGEN_DEVICE_FUNC qualifiers 2017-02-28 16:42:00 -08:00
Benoit Steiner
c36bc2d445 Added missing EIGEN_DEVICE_FUNC qualifiers 2017-02-28 14:58:45 -08:00
Benoit Steiner
4a7df114c8 Added missing EIGEN_DEVICE_FUNC 2017-02-28 14:00:15 -08:00
Benoit Steiner
de7b0fdea9 Made the TensorStorage class compile with clang 3.9 2017-02-28 13:52:22 -08:00
Benoit Steiner
765f4cc4b4 Deleted extra EIGEN_DEVICE_FUNC: the QR and Cholesky code isn't ready to run on GPU yet. 2017-02-28 11:57:00 -08:00
Benoit Steiner
e993c94f07 Added missing EIGEN_DEVICE_FUNC qualifiers 2017-02-28 09:56:45 -08:00
Benoit Steiner
33443ec2b0 Added missing EIGEN_DEVICE_FUNC qualifiers 2017-02-28 09:50:10 -08:00
Benoit Steiner
f3e9c42876 Added missing EIGEN_DEVICE_FUNC qualifiers 2017-02-28 09:46:30 -08:00
Mehdi Goli
8296b87d7b Adding sycl backend for TensorCustomOp; fixing the partial lhs modification issue on sycl when the rhs is TensorContraction, reduction or convolution; Fixing the partial modification for memset when sycl backend is used. 2017-02-28 17:16:14 +00:00
Gael Guennebaud
4e98a7b2f0 bug #1396: add some missing EIGEN_DEVICE_FUNC 2017-02-28 09:47:38 +01:00
Gael Guennebaud
478a9f53be Fix typo. 2017-02-28 09:32:45 +01:00
Benoit Steiner
889c606f8f Added missing EIGEN_DEVICE_FUNC to the SelfCwise binary ops 2017-02-27 17:17:47 -08:00
Benoit Steiner
193939d6aa Added missing EIGEN_DEVICE_FUNC qualifiers to several nullary op methods. 2017-02-27 17:11:47 -08:00
Benoit Steiner
ed4dc9d01a Declared the plset, ploadt_ro, and ploaddup packet primitives as usable within a gpu kernel 2017-02-27 16:57:01 -08:00
Benoit Steiner
b1fc7c9a09 Added missing EIGEN_DEVICE_FUNC qualifiers. 2017-02-27 16:48:30 -08:00
Benoit Steiner
554116bec1 Added EIGEN_DEVICE_FUNC to make the prototype of the EigenBase override match that of DenseBase 2017-02-27 16:45:31 -08:00
Benoit Steiner
34d9fce93b Avoid unnecessary float to double conversions. 2017-02-27 16:33:33 -08:00
Benoit Steiner
e0bd6f5738 Merged eigen/eigen into default 2017-02-26 10:02:14 -08:00
Mehdi Goli
2fa2b617a9 Adding TensorVolumePatchOP.h for sycl 2017-02-24 19:16:24 +00:00
Mehdi Goli
0b7875f137 Converting fixed float type into template type for TensorContraction. 2017-02-24 18:13:30 +00:00
Mehdi Goli
89dfd51fae Adding Sycl Backend for TensorGenerator.h. 2017-02-22 16:36:24 +00:00
Gael Guennebaud
5c68ba41a8 typos 2017-02-21 17:10:55 +01:00
Gael Guennebaud
b0f55ef85a merge 2017-02-21 17:04:10 +01:00
Gael Guennebaud
d29e9d7119 Improve documentation of reshaped 2017-02-21 17:03:10 +01:00
Gael Guennebaud
9b6e365018 Fix linking issue. 2017-02-21 16:52:22 +01:00
Gael Guennebaud
3d200257d7 Add support for automatic-size deduction in reshaped, e.g.:
mat.reshaped(4,AutoSize); <-> mat.reshaped(4,mat.size()/4);
2017-02-21 15:57:25 +01:00
Gael Guennebaud
f8179385bd Add missing const version of mat(all). 2017-02-21 13:56:26 +01:00
Gael Guennebaud
1e3aa470fa Fix long to int conversion 2017-02-21 13:56:01 +01:00
Gael Guennebaud
b3fc0007ae Add support for mat(all) as an alias to mat.reshaped(mat.size(),fix<1>); 2017-02-21 13:49:09 +01:00
Mehdi Goli
4f07ac16b0 Reducing the number of warnings. 2017-02-21 10:09:47 +00:00
Gael Guennebaud
76687f385c bug #1394: fix compilation of SelfAdjointEigenSolver<Matrix>(sparse*sparse); 2017-02-20 14:27:26 +01:00
Gael Guennebaud
d8b1f6cebd bug #1380: for Map<> as input of matrix exponential 2017-02-20 14:06:06 +01:00
Gael Guennebaud
6572825703 bug #1395: fix the use of compile-time vectors as inputs of JacobiSVD. 2017-02-20 13:44:37 +01:00
Mehdi Goli
79ebc8f761 Adding Sycl backend for TensorImagePatchOP.h; adding Sycl backend for TensorInflation.h. 2017-02-20 12:11:05 +00:00
Gael Guennebaud
9081c8f6ea Add support for RowOrder reshaped 2017-02-20 11:46:21 +01:00
Gael Guennebaud
a811a04696 Silence warning. 2017-02-20 10:14:21 +01:00
Gael Guennebaud
63798df038 Fix usage of CUDACC_VER 2017-02-20 08:16:36 +01:00
Gael Guennebaud
deefa54a54 Fix tracking of temporaries in unit tests 2017-02-19 10:32:54 +01:00
Gael Guennebaud
f8a55cc062 Fix compilation. 2017-02-18 10:08:13 +01:00
Gael Guennebaud
cbbf88c4d7 Use int32_t instead of int in NEON code. Some platforms with 16-bit int support ARM NEON. 2017-02-17 14:39:02 +01:00
Gael Guennebaud
582b5e39bf bug #1393: enable Matrix/Array explicit ctor from types with conversion operators (was ok with 3.2) 2017-02-17 14:10:57 +01:00
Benoit Steiner
cfa0568ef7 Size indices are signed. 2017-02-16 10:13:34 -08:00
Mehdi Goli
91982b91c0 Adding TensorLayoutSwapOp for sycl. 2017-02-15 16:28:12 +00:00
Mehdi Goli
b1e312edd6 Adding TensorPatch.h for sycl backend. 2017-02-15 10:13:01 +00:00
Benoit Steiner
31a25ab226 Merged eigen/eigen into default 2017-02-14 15:36:21 -08:00
Mehdi Goli
0d153ded29 Adding TensorChippingOP for sycl backend; fixing the index value in the verification operation for cxx11_tensorChipping.cpp test 2017-02-13 17:25:12 +00:00
Gael Guennebaud
5937c4ae32 Fall back is_integral to std::is_integral in c++11 2017-02-13 17:14:26 +01:00
Gael Guennebaud
7073430946 Fix overflow and make use of long long in c++11 only. 2017-02-13 17:14:04 +01:00
Jonathan Hseu
3453b00a1e Fix vector indexing with uint64_t 2017-02-11 21:45:32 -08:00
Gael Guennebaud
e7ebe52bfb bug #1391: include IO.h before DenseBase to enable its usage in DenseBase plugins. 2017-02-13 09:46:20 +01:00
Gael Guennebaud
b3750990d5 Workaround some gcc 4.7 warnings 2017-02-11 23:24:06 +01:00
Gael Guennebaud
4b22048cea Fallback Reshaped to MapBase when possible (same storage order and linear access to the nested expression) 2017-02-11 15:32:53 +01:00
Gael Guennebaud
83d6a529c3 Use Eigen::fix<N> to pass compile-time sizes. 2017-02-11 15:31:28 +01:00
Gael Guennebaud
c16ee72b20 bug #1392: fix #include <Eigen/Sparse> with mpl2-only 2017-02-11 10:35:01 +01:00
Gael Guennebaud
e43016367a Forgot to include a file in previous commit 2017-02-11 10:34:18 +01:00
Gael Guennebaud
6486d4fc95 Workaround gcc 4.7 issue in c++11. 2017-02-11 10:29:10 +01:00
Gael Guennebaud
4a4a72951f Fix previous commits: disable only the problematic indexed view methods for old compilers instead of disabling everything.
Tested with gcc 4.7 (c++03) and gcc 4.8 (c++03 & c++11)
2017-02-11 10:28:44 +01:00
Benoit Steiner
fad776492f Merged eigen/eigen into default 2017-02-10 14:27:43 -08:00
Benoit Steiner
1ef30b8090 Fixed bug introduced in previous commit 2017-02-10 13:35:10 -08:00
Benoit Steiner
769208a17f Pulled latest updates from upstream 2017-02-10 13:11:40 -08:00
Benoit Steiner
8b3cc54c42 Added a new EIGEN_HAS_INDEXED_VIEW define that set to 0 for older compilers that are known to fail to compile the indexed views (I used the define from the indexed_views.cpp test).
Only include the indexed view methods when the compiler supports the code.
This makes it possible to use Eigen again in complex code bases such as TensorFlow and older compilers such as gcc 4.8
2017-02-10 13:08:49 -08:00
Gael Guennebaud
a1ff24f96a Fix pruning in (sparse*sparse).pruned() when the result is nearly dense. 2017-02-10 13:59:32 +01:00
Gael Guennebaud
0256c52359 Include clang in the list of non strict MSVC (just to be sure) 2017-02-10 13:41:52 +01:00
Alexander Neumann
dd58462e63 fixed inlining issue with clang-cl on visual studio
(grafted from 7962ac1a58)
2017-02-08 23:50:38 +01:00
Gael Guennebaud
fc8fd5fd24 Improve multi-threading heuristic for matrix products with a small number of columns. 2017-02-07 17:19:59 +01:00
Mehdi Goli
0ee97b60c2 Adding mean to TensorReductionSycl.h 2017-02-07 15:43:17 +00:00
Mehdi Goli
42bd5c4e7b Fixing TensorReductionSycl for min and max. 2017-02-06 18:05:23 +00:00
Gael Guennebaud
4254b3eda3 bug #1389: MSVC's std containers do not properly align in 64-bit mode if the requested alignment is larger than 16 bytes (e.g., with AVX) 2017-02-03 15:22:35 +01:00
Mehdi Goli
bc128f9f3b Reducing the warnings in Sycl backend. 2017-02-02 10:43:47 +00:00
Benoit Steiner
442e9cbb30 Silenced several compilation warnings 2017-02-01 15:50:58 -08:00
Benoit Steiner
2db75c07a6 fixed the ordering of the template and EIGEN_DEVICE_FUNC keywords in a few more places to get more of the Eigen codebase to compile with nvcc again. 2017-02-01 15:41:29 -08:00
Benoit Steiner
fcd257039b Replaced EIGEN_DEVICE_FUNC template<foo> with template<foo> EIGEN_DEVICE_FUNC to make the code compile with nvcc8. 2017-02-01 15:30:49 -08:00
Gael Guennebaud
84090027c4 Disable a part of the unit test for gcc 4.8 2017-02-01 23:37:44 +01:00
Gael Guennebaud
0eceea4efd Define EIGEN_COMP_GNUC to reflect version number: 47, 48, 49, 50, 60, ... 2017-02-01 23:36:40 +01:00
Mehdi Goli
ff53050034 Converting ptrdiff_t type to int64_t type in cxx11_tensor_contract_sycl.cpp in order to be the same as other tests. 2017-02-01 15:36:03 +00:00
Mehdi Goli
bab29936a1 Reducing warnings in Sycl backend. 2017-02-01 15:29:53 +00:00
Gael Guennebaud
645a8e32a5 Fix compilation of JacobiSVD for vectors type 2017-01-31 16:22:54 +01:00
Mehdi Goli
48a20b7d95 Fixing compiler error on TensorContractionSycl.h; Silencing the compiler unused parameter warning for eval_op_indices in TensorContraction.h 2017-01-31 14:06:36 +00:00
Gael Guennebaud
53026d29d4 bug #478: fix regression in the eigen decomposition of zero matrices. 2017-01-31 14:22:42 +01:00
Benoit Steiner
fbc39fd02c Merge latest changes from upstream 2017-01-30 15:25:57 -08:00
Gael Guennebaud
63de19c000 bug #1380: fix matrix exponential with Map<> 2017-01-30 13:55:27 +01:00
Gael Guennebaud
c86911ac73 bug #1384: fix evaluation of "sparse/scalar" that used the wrong evaluation path. 2017-01-30 13:38:24 +01:00
Mehdi Goli
82ce92419e Fixing the buffer type in memcpy. 2017-01-30 11:38:20 +00:00
Gael Guennebaud
24409f3acd Use fix<> API to specify compile-time reshaped sizes. 2017-01-29 15:20:35 +01:00
Gael Guennebaud
9036cda364 Cleanup initial reshape implementation:
- reshape -> reshaped
- make it compatible with evaluators.
2017-01-29 14:57:45 +01:00
Gael Guennebaud
0e89baa5d8 import yoco xiao's work on reshape 2017-01-29 14:29:31 +01:00
Gael Guennebaud
d024e9942d MSVC 1900 release is not c++14 compatible enough for us. The 1910 update seems to be fine though. 2017-01-27 22:17:59 +01:00
Gael Guennebaud
83592659ba merge 2017-01-27 21:59:59 +01:00
Gael Guennebaud
4a351be163 Fix warning 2017-01-27 11:59:35 +01:00
Gael Guennebaud
251ad3e04f Fix unnamed type as template parameter issue. 2017-01-27 11:57:52 +01:00
Rasmus Munk Larsen
edaa0fc5d1 Revert PR-292. After further investigation, the memcpy->memmove change was only good for Haswell on older versions of glibc. Adding a switch for small sizes is perhaps useful for string copies, but also has an overhead for larger sizes, making it a poor trade-off for general memcpy.
This PR also removes a couple of unnecessary semicolons in Eigen/src/Core/AssignEvaluator.h that caused compiler warnings everywhere.
2017-01-26 12:46:06 -08:00
Gael Guennebaud
25a1703579 Merged in ggael/eigen-flexidexing (pull request PR-294)
generalized operator() for indexed access and slicing
2017-01-26 08:04:23 +00:00
Gael Guennebaud
98dfe0c13f Fix useless ';' warning 2017-01-25 22:55:04 +01:00
Gael Guennebaud
28351073d8 Fix unnamed type as template argument (ok in c++11 only) 2017-01-25 22:54:51 +01:00
Gael Guennebaud
607be65a03 Fix duplicates of array_size between unsupported and Core 2017-01-25 22:53:58 +01:00
Rasmus Munk Larsen
7d39c6d50a Merged eigen/eigen into default 2017-01-25 09:22:26 -08:00
Rasmus Munk Larsen
5c9ed4ba0d Reverse arguments for pmin in AVX. 2017-01-25 09:21:57 -08:00
Gael Guennebaud
850ca961d2 bug #1383: fix regression in LinSpaced for integers and high<low 2017-01-25 18:13:53 +01:00
Gael Guennebaud
296d24be4d bug #1381: fix sparse.diagonal() used as an rvalue.
The problem was that if "sparse" is not const, then sparse.diagonal() must have the
LValueBit flag, meaning that sparse.diagonal().coeff(i) must return a const reference,
const Scalar&. However, sparse::coeff() cannot return a reference for a non-existing
zero coefficient. The trick is to return a reference to a local member of
evaluator<SparseMatrix>.
2017-01-25 17:39:01 +01:00
Gael Guennebaud
d06a48959a bug #1383: Fix regression from 3.2 with LinSpaced(n,0,n-1) with n==0. 2017-01-25 15:27:13 +01:00
Rasmus Munk Larsen
ae3e43a125 Remove extra space. 2017-01-24 16:16:39 -08:00
Benoit Steiner
e96c77668d Merged in rmlarsen/eigen2 (pull request PR-292)
Adds a fast memcpy function to Eigen.
2017-01-25 00:14:04 +00:00
Rasmus Munk Larsen
3be5ee2352 Update copy helper to use fast_memcpy. 2017-01-24 14:22:49 -08:00
Rasmus Munk Larsen
e6b1020221 Adds a fast memcpy function to Eigen. This takes advantage of the following:
1. For small fixed sizes, the compiler generates inline code for memcpy, which is much faster.

2. My colleague eriche at googl dot com discovered that for large sizes, memmove is significantly faster than memcpy (at least on Linux with GCC or Clang). See benchmark numbers measured on a Haswell (HP Z440) workstation here: https://docs.google.com/a/google.com/spreadsheets/d/1jLs5bKzXwhpTySw65MhG1pZpsIwkszZqQTjwrd_n0ic/pubhtml This is of course surprising since memcpy is a less constrained version of memmove. This stackoverflow thread contains some speculation as to the causes: http://stackoverflow.com/questions/22793669/poor-memcpy-performance-on-linux

Below are numbers for copying and slicing tensors using the multithreaded TensorDevice. The numbers show significant improvements for memcpy of very small blocks and for memcpy of large blocks single threaded (we were already able to saturate memory bandwidth for >1 threads before on large blocks). The "slicingSmallPieces" benchmark also shows small consistent improvements, since memcpy cost is a fair portion of that particular computation.

The benchmarks operate on NxN matrices, and the names are of the form BM_$OP_${NUMTHREADS}T/${N}.

Measured improvements in wall clock time:

Run on rmlarsen3.mtv (12 X 3501 MHz CPUs); 2017-01-20T11:26:31.493023454-08:00
CPU: Intel Haswell with HyperThreading (6 cores) dL1:32KB dL2:256KB dL3:15MB
Benchmark                          Base (ns)  New (ns) Improvement
------------------------------------------------------------------
BM_memcpy_1T/2                          3.48      2.39    +31.3%
BM_memcpy_1T/8                          12.3      6.51    +47.0%
BM_memcpy_1T/64                          371       383     -3.2%
BM_memcpy_1T/512                       66922     66720     +0.3%
BM_memcpy_1T/4k                      9892867   6849682    +30.8%
BM_memcpy_1T/5k                     14951099  10332856    +30.9%
BM_memcpy_2T/2                          3.50      2.46    +29.7%
BM_memcpy_2T/8                          12.3      7.66    +37.7%
BM_memcpy_2T/64                          371       376     -1.3%
BM_memcpy_2T/512                       66652     66788     -0.2%
BM_memcpy_2T/4k                      6145012   6117776     +0.4%
BM_memcpy_2T/5k                      9181478   9010942     +1.9%
BM_memcpy_4T/2                          3.47      2.47    +31.0%
BM_memcpy_4T/8                          12.3      6.67    +45.8%
BM_memcpy_4T/64                          374       376     -0.5%
BM_memcpy_4T/512                       67833     68019     -0.3%
BM_memcpy_4T/4k                      5057425   5188253     -2.6%
BM_memcpy_4T/5k                      7555638   7779468     -3.0%
BM_memcpy_6T/2                          3.51      2.50    +28.8%
BM_memcpy_6T/8                          12.3      7.61    +38.1%
BM_memcpy_6T/64                          373       378     -1.3%
BM_memcpy_6T/512                       66871     66774     +0.1%
BM_memcpy_6T/4k                      5112975   5233502     -2.4%
BM_memcpy_6T/5k                      7614180   7772246     -2.1%
BM_memcpy_8T/2                          3.47      2.41    +30.5%
BM_memcpy_8T/8                          12.4      10.5    +15.3%
BM_memcpy_8T/64                          372       388     -4.3%
BM_memcpy_8T/512                       67373     66588     +1.2%
BM_memcpy_8T/4k                      5148462   5254897     -2.1%
BM_memcpy_8T/5k                      7660989   7799058     -1.8%
BM_memcpy_12T/2                         3.50      2.40    +31.4%
BM_memcpy_12T/8                         12.4      7.55    +39.1%
BM_memcpy_12T/64                         374       378     -1.1%
BM_memcpy_12T/512                      67132     66683     +0.7%
BM_memcpy_12T/4k                     5185125   5292920     -2.1%
BM_memcpy_12T/5k                     7717284   7942684     -2.9%
BM_slicingSmallPieces_1T/2              47.3      47.5     +0.4%
BM_slicingSmallPieces_1T/8              53.6      52.3     +2.4%
BM_slicingSmallPieces_1T/64              491       476     +3.1%
BM_slicingSmallPieces_1T/512           21734     18814    +13.4%
BM_slicingSmallPieces_1T/4k           394660    396760     -0.5%
BM_slicingSmallPieces_1T/5k           218722    209244     +4.3%
BM_slicingSmallPieces_2T/2              80.7      79.9     +1.0%
BM_slicingSmallPieces_2T/8              54.2      53.1     +2.0%
BM_slicingSmallPieces_2T/64              497       477     +4.0%
BM_slicingSmallPieces_2T/512           21732     18822    +13.4%
BM_slicingSmallPieces_2T/4k           392885    390490     +0.6%
BM_slicingSmallPieces_2T/5k           221988    208678     +6.0%
BM_slicingSmallPieces_4T/2              80.8      80.1     +0.9%
BM_slicingSmallPieces_4T/8              54.1      53.2     +1.7%
BM_slicingSmallPieces_4T/64              493       476     +3.4%
BM_slicingSmallPieces_4T/512           21702     18758    +13.6%
BM_slicingSmallPieces_4T/4k           393962    404023     -2.6%
BM_slicingSmallPieces_4T/5k           249667    211732    +15.2%
BM_slicingSmallPieces_6T/2              80.5      80.1     +0.5%
BM_slicingSmallPieces_6T/8              54.4      53.4     +1.8%
BM_slicingSmallPieces_6T/64              488       478     +2.0%
BM_slicingSmallPieces_6T/512           21719     18841    +13.3%
BM_slicingSmallPieces_6T/4k           394950    397583     -0.7%
BM_slicingSmallPieces_6T/5k           223080    210148     +5.8%
BM_slicingSmallPieces_8T/2              81.2      80.4     +1.0%
BM_slicingSmallPieces_8T/8              58.1      53.5     +7.9%
BM_slicingSmallPieces_8T/64              489       480     +1.8%
BM_slicingSmallPieces_8T/512           21586     18798    +12.9%
BM_slicingSmallPieces_8T/4k           394592    400165     -1.4%
BM_slicingSmallPieces_8T/5k           219688    208301     +5.2%
BM_slicingSmallPieces_12T/2             80.2      79.8     +0.7%
BM_slicingSmallPieces_12T/8             54.4      53.4     +1.8%
BM_slicingSmallPieces_12T/64             488       476     +2.5%
BM_slicingSmallPieces_12T/512          21931     18831    +14.1%
BM_slicingSmallPieces_12T/4k          393962    396541     -0.7%
BM_slicingSmallPieces_12T/5k          218803    207965     +5.0%
2017-01-24 13:55:18 -08:00
Rasmus Munk Larsen
7b6aaa3440 Fix NaN propagation for AVX512. 2017-01-24 13:37:08 -08:00
Rasmus Munk Larsen
5e144bbaa4 Make NaN propagation consistent between pmax/pmin and std::max/std::min. This makes the NaN propagation consistent between the scalar and vectorized code paths of Eigen's scalar_max_op and scalar_min_op.
See #1373 for details.
2017-01-24 13:32:50 -08:00
Gael Guennebaud
d83db761a2 Add support for std::integral_constant 2017-01-24 16:28:12 +01:00
Gael Guennebaud
bc10201854 Add test for multiple symbols 2017-01-24 16:27:51 +01:00
Gael Guennebaud
c43d254d13 Fix seq().reverse() in c++98 2017-01-24 11:36:43 +01:00
Gael Guennebaud
5783158e8f Add unit test for FixedInt and Symbolic 2017-01-24 10:55:12 +01:00
Gael Guennebaud
ddd83f82d8 Add support for "SymbolicExpr op fix<N>" in C++98/11 mode. 2017-01-24 10:54:42 +01:00
Gael Guennebaud
228fef1b3a Extended the set of arithmetic operators supported by FixedInt (-,+,*,/,%,&,|) 2017-01-24 10:53:51 +01:00
Gael Guennebaud
bb52f74e62 Add internal doc 2017-01-24 10:13:35 +01:00
Gael Guennebaud
41c523a0ab Rename fix_t to FixedInt 2017-01-24 09:39:49 +01:00
Gael Guennebaud
156e6234f1 bug #1375: fix cmake installation with cmake 2.8 2017-01-24 09:16:40 +01:00
Gael Guennebaud
ba3f977946 bug #1376: add missing assertion on size mismatch with compound assignment operators (e.g., mat += mat.col(j)) 2017-01-23 22:06:08 +01:00
Gael Guennebaud
b0db4eff36 bug #1382: move using std::size_t/ptrdiff_t to Eigen's namespace (still better than the global namespace!) 2017-01-23 22:03:57 +01:00
Gael Guennebaud
ca79c1545a Add std:: namespace prefix to all (hopefully) instances of size_t/ptrdiff_t 2017-01-23 22:02:53 +01:00
Gael Guennebaud
4b607b5692 Use Index instead of size_t 2017-01-23 22:00:33 +01:00
Luke Iwanski
bf44fed9b7 Allows AMD APU 2017-01-23 15:56:45 +00:00
Gael Guennebaud
0fe278f7be bug #1379: fix compilation in sparse*diagonal*dense with openmp 2017-01-21 23:27:01 +01:00
Gael Guennebaud
22a172751e bug #1378: fix doc (DiagonalIndex vs Diagonal) 2017-01-21 22:09:59 +01:00
Mehdi Goli
602f8c27f5 Reverting back to the previous TensorDeviceSycl.h as the total number of buffer is not enough for tensorflow. 2017-01-20 18:23:20 +00:00
Gael Guennebaud
4d302a080c Recover compile-time size from seq(A,B) when A and B are fixed values. (c++11 only) 2017-01-19 20:34:18 +01:00
Gael Guennebaud
54f3fbee24 Exploit fixed values in seq and reverse with C++98 compatibility 2017-01-19 19:57:32 +01:00
Gael Guennebaud
7691723e34 Add support for fixed-value in symbolic expression, c++11 only for now. 2017-01-19 19:25:29 +01:00
Benoit Steiner
924600a0e8 Made sure that enabling avx2 instructions enables avx and sse instructions as well. 2017-01-19 09:54:48 -08:00
Mehdi Goli
77cc4d06c7 Removing unused variables 2017-01-19 17:06:21 +00:00
Mehdi Goli
837fdbdcb2 Merging with Benoit's upstream. 2017-01-19 11:34:34 +00:00
Mehdi Goli
6bdd15f572 Adding non-dereferenceable pointer tracking for ComputeCpp backend; adding TensorConvolutionOp for ComputeCpp; fixing typos; modifying TensorDeviceSycl to use the LegacyPointer class. 2017-01-19 11:30:59 +00:00
Benoit Steiner
aa7fb88dfa Merged in LaFeuille/eigen (pull request PR-289)
Fix a typo
2017-01-18 16:44:39 -08:00
Gael Guennebaud
e84ed7b6ef Remove dead code 2017-01-18 23:18:28 +01:00
Gael Guennebaud
f3ccbe0419 Add a Symbolic::FixedExpr helper expression to make sure the compiler fully optimize the usage of last and end. 2017-01-18 23:16:32 +01:00
Mehdi Goli
c6f7b33834 Applying Benoit's comment. Embedding synchronisation inside device memcpy so there is no need to externally call synchronise() for device memcpy. 2017-01-18 10:45:28 +00:00
Gael Guennebaud
15471432fe Add a .reverse() member to ArithmeticSequence. 2017-01-18 11:35:27 +01:00
Gael Guennebaud
e4f8dd860a Add missing operator* 2017-01-18 10:49:01 +01:00
Gael Guennebaud
198507141b Update all block expressions to accept compile-time sizes passed by fix<N> or fix<N>(n) 2017-01-18 09:43:58 +01:00
Gael Guennebaud
5484ddd353 Merge the generic and dynamic overloads of block() 2017-01-17 22:11:46 +01:00
Gael Guennebaud
655ba783f8 Defer set-to-zero in triangular = product so that no aliasing issue occurs in the common
A.triangularView() = B*A.selfadjointView()*B.adjoint()
case that used to work in 3.2.
2017-01-17 18:03:35 +01:00
Gael Guennebaud
5e36ec3b6f Fix regression when passing enums to operator() 2017-01-17 17:10:16 +01:00
Gael Guennebaud
f7852c3d16 Fix -Wunnamed-type-template-args 2017-01-17 16:05:58 +01:00
Gael Guennebaud
4f36dcfda8 Add a generic block() method compatible with Eigen::fix 2017-01-17 11:34:28 +01:00
Gael Guennebaud
71e5b71356 Add a get_runtime_value helper to deal with pointer-to-function hack,
plus some refactoring to make the internals more consistent.
2017-01-17 11:33:57 +01:00
Gael Guennebaud
59801a3250 Add \newin{3.x} doxygen command 2017-01-17 10:31:28 +01:00
Gael Guennebaud
23bfcfc15f Add missing overload of get_compile_time for c++98/11 2017-01-17 10:30:21 +01:00
Gael Guennebaud
edff32c2c2 Disambiguate the two versions of fix for doxygen 2017-01-17 10:29:33 +01:00
Gael Guennebaud
4989922be2 Add support for symbolic expressions as arguments of operator() 2017-01-16 22:21:23 +01:00
Gael Guennebaud
12e22a2844 typos in doc 2017-01-16 16:31:19 +01:00
Gael Guennebaud
e70c4c97fa Typo 2017-01-16 16:20:16 +01:00
Gael Guennebaud
a9232af845 Introduce a variable_or_fixed<N> proxy returned by fix<N>(val) to pass both a compile-time and runtime fallback value in case N means "runtime".
This mechanism is used by the seq/seqN functions. The proxy object is immediately converted to pure compile-time (as fix<N>) or pure runtime (i.e., an Index) to avoid redundant template instantiations.
2017-01-16 16:17:01 +01:00
Gael Guennebaud
6e97698161 Introduce a EIGEN_HAS_CXX14 macro 2017-01-16 16:13:37 +01:00
Mehdi Goli
e46e722381 Adding Tensor ReverseOp; TensorStriding; TensorConversionOp; Modifying Tensor Contractsycl to be located in any place in the expression tree. 2017-01-16 13:58:49 +00:00
Luke Iwanski
23778a15d8 Reverting unintentional change to Eigen/Geometry 2017-01-16 11:05:56 +00:00
LaFeuille
1b19b80c06 Fix a typo 2017-01-13 07:24:55 +00:00
Fraser Cormack
8245d3c7ad Fix case-sensitivity of file include 2017-01-12 12:13:18 +00:00
Gael Guennebaud
752bd92ba5 Large code refactoring:
- generalize some utilities and move them to Meta (size(), array_size())
 - move handling of all and single indices to IndexedViewHelper.h
 - several cleanup changes
2017-01-11 17:24:02 +01:00
Gael Guennebaud
f93d1c58e0 Make get_compile_time compatible with variable_if_dynamic 2017-01-11 17:08:59 +01:00
Gael Guennebaud
c020d307a6 Make variable_if_dynamic<T> implicitly convertible to T 2017-01-11 17:08:05 +01:00
Gael Guennebaud
43c617e2ee merge 2017-01-11 14:33:37 +01:00
Gael Guennebaud
152cd57bb7 Enable generation of doc for static variables in Eigen's namespace. 2017-01-11 14:29:20 +01:00
Gael Guennebaud
b1dc0fa813 Move fix and symbolic to their own file, and improve doxygen compatibility 2017-01-11 14:28:28 +01:00
Gael Guennebaud
04397f17e2 Add 1D overloads of operator() 2017-01-11 13:17:09 +01:00
Gael Guennebaud
45199b9773 Fix typo 2017-01-11 09:34:08 +01:00
Gael Guennebaud
1b5570988b Add doc to seq, seqN, ArithmeticSequence, operator(), etc. 2017-01-10 22:58:58 +01:00
Gael Guennebaud
17eac60446 Factorize const and non-const version of the generic operator() method. 2017-01-10 21:45:55 +01:00
Gael Guennebaud
d072fc4b14 add writeable IndexedView 2017-01-10 17:10:35 +01:00
Gael Guennebaud
c9d5e5c6da Simplify Symbolic API: std::tuple is now used internally and automatically built. 2017-01-10 16:55:07 +01:00
Gael Guennebaud
407e7b7a93 Simplify symbolic API by using "symbol=value" to associate a runtime value to a symbol. 2017-01-10 16:45:32 +01:00
Gael Guennebaud
96e6cf9aa2 Fix linking issue. 2017-01-10 16:35:46 +01:00
Gael Guennebaud
e63678bc89 Fix ambiguous call 2017-01-10 16:33:40 +01:00
Gael Guennebaud
8e247744a4 Fix linking issue 2017-01-10 16:32:06 +01:00
Gael Guennebaud
b47a7e5c3a Add doc for IndexedView 2017-01-10 16:28:57 +01:00
Gael Guennebaud
87963f441c Fallback to Block<> when possible (Index, all, seq with unit increment).
This is important to take advantage of the optimized implementations (evaluator, products, etc.),
and to support sparse matrices.
2017-01-10 14:25:30 +01:00
Gael Guennebaud
a98c7efb16 Add a more generic evaluation mechanism and minimalistic doc. 2017-01-10 11:46:29 +01:00
Gael Guennebaud
13d954f270 Cleanup Eigen's namespace 2017-01-10 11:06:02 +01:00
Gael Guennebaud
9eaab4f9e0 Refactoring: move all symbolic stuff into its own namespace 2017-01-10 10:57:08 +01:00
Gael Guennebaud
acd08900c9 Move 'last' and 'end' to their own namespace 2017-01-10 10:31:07 +01:00
Gael Guennebaud
1df2377d78 Implement c++98 version of seq() 2017-01-10 10:28:45 +01:00
Gael Guennebaud
ecd9cc5412 Isolate legacy code (we keep it for performance comparison purpose) 2017-01-10 09:34:25 +01:00
Gael Guennebaud
b50c3e967e Add a minimalistic symbolic scalar type with expression template and make use of it to define the last placeholder and to unify the return type of seq and seqN. 2017-01-09 23:42:16 +01:00
Gael Guennebaud
68064e14fa Rename span/range to seqN/seq 2017-01-09 17:35:21 +01:00
Gael Guennebaud
ad3eef7608 Add link to SO 2017-01-09 13:01:39 +01:00
Gael Guennebaud
75aef5b37f Fix extraction of compile-time size of std::array with gcc 2017-01-06 22:04:49 +01:00
Gael Guennebaud
233dff1b35 Add support for plain arrays for columns and both rows/columns 2017-01-06 22:01:53 +01:00
Gael Guennebaud
76e183bd52 Propagate compile-time size for plain arrays 2017-01-06 22:01:23 +01:00
Gael Guennebaud
3264d3c761 Add support for plain-array as indices, e.g., mat({1,2,3,4}) 2017-01-06 21:53:32 +01:00
Gael Guennebaud
831fffe874 Add missing doc of SparseView 2017-01-06 18:01:29 +01:00
Gael Guennebaud
a875167d99 Propagate compile-time increment and strides.
Had to introduce a UndefinedIncr constant for non structured list of indices.
2017-01-06 15:54:55 +01:00
Gael Guennebaud
e383d6159a MSVC 2015 has all the C++11 support we need, and MSVC 2017 fails on binder1st/binder2nd 2017-01-06 15:44:13 +01:00
Gael Guennebaud
fad1fa75b3 Propagate compile-time size with "all" and add c++11 array unit test 2017-01-06 13:29:33 +01:00
Gael Guennebaud
3730e3ca9e Use "fix" for compile-time values, propagate compile-time sizes for span, do some cleanup. 2017-01-06 13:10:10 +01:00
Gael Guennebaud
60e99ad8d7 Add unit test for indexed views 2017-01-06 11:59:08 +01:00
Gael Guennebaud
ac7e4ac9c0 Initial commit to add a generic indexed-based view of matrices.
This version already works as a read-only expression.
Numerous refactoring, renaming, extension, tuning passes are expected...
2017-01-06 00:01:44 +01:00
Gael Guennebaud
f3f026c9aa Convert integers to real numbers when computing relative L2 error 2017-01-05 13:36:08 +01:00
Jim Radford
0c226644d8 LLT: const the arg to solveInPlace() to allow passing .transpose(), .block(), etc. 2017-01-04 14:42:57 -08:00
Jim Radford
be281e5289 LLT: avoid making a copy when decomposing in place 2017-01-04 14:43:56 -08:00
Gael Guennebaud
e27f17bf5c bug #1453: fix Map with non-default inner-stride but no outer-stride. 2017-08-22 13:27:37 +02:00
Gael Guennebaud
21d0a0bcf5 bug #1456: add perf recommendation for LLT and storage format 2017-08-22 12:46:35 +02:00
Gael Guennebaud
2c3d70d915 Re-enable hidden doc in LLT 2017-08-22 12:04:09 +02:00
Gael Guennebaud
a6e7a41a55 bug #1455: Cholesky module depends on Jacobi for rank-updates. 2017-08-22 11:37:32 +02:00
Gael Guennebaud
e6021cc8cc bug #1458: fix documentation of LLT and LDLT info() method. 2017-08-22 11:32:55 +02:00
Gael Guennebaud
2810ba194b Clarify MKL_DIRECT_CALL doc. 2017-08-17 22:12:26 +02:00
Gael Guennebaud
f727844658 use MKL's lapacke.h header when using MKL 2017-08-17 21:58:39 +02:00
Gael Guennebaud
8c858bd891 Clarify doc regarding the usage of MKL_DIRECT_CALL 2017-08-17 12:17:45 +02:00
Gael Guennebaud
b95f92843c Fix support for MKL's BLAS when using MKL_DIRECT_CALL. 2017-08-17 12:07:10 +02:00
Gael Guennebaud
89c01a494a Add unit test for has_ReturnType 2017-08-17 11:55:00 +02:00
Gael Guennebaud
687bedfcad Make NoAlias and JacobiRotation compatible with CUDA. 2017-08-17 11:51:22 +02:00
Gael Guennebaud
1f4b24d2df Do not preallocate more space than the matrix size (when the sparse matrix boils down to a vector) 2017-07-20 10:13:48 +02:00
Gael Guennebaud
d580a90c9a Disable BDCSVD preallocation check. 2017-07-20 10:03:54 +02:00
Gael Guennebaud
55d7181557 Fix lazyness of operator* with CUDA 2017-07-20 09:47:28 +02:00
Gael Guennebaud
cda47c42c2 Fix compilation in c++98 mode. 2017-07-17 21:08:20 +02:00
Gael Guennebaud
a74b9ba7cd Update documentation for CUDA 2017-07-17 11:05:26 +02:00
Gael Guennebaud
3182bdbae6 Disable vectorization when compiled by nvcc, even if EIGEN_NO_CUDA is defined 2017-07-17 11:01:28 +02:00
Gael Guennebaud
9f8136ff74 disable nvcc boolean-expr-is-constant warning 2017-07-17 10:43:18 +02:00
Gael Guennebaud
bbd97b4095 Add a EIGEN_NO_CUDA option, and introduce EIGEN_CUDACC and EIGEN_CUDA_ARCH aliases 2017-07-17 01:02:51 +02:00
Gael Guennebaud
2299717fd5 Fix and workaround several doxygen issues/warnings 2017-01-04 23:27:33 +01:00
Luke Iwanski
90c5bc8d64 Fixes auto appearance in functor template argument for reduction. 2017-01-04 22:18:44 +00:00
Gael Guennebaud
ee6f7f6c0c Add doc for sparse triangular solve functions 2017-01-04 23:10:36 +01:00
Gael Guennebaud
5165de97a4 Add missing snippet files. 2017-01-04 23:08:27 +01:00
Gael Guennebaud
a0a36ad0ef bug #1336: workaround doxygen failing to include numerous members of MatrixBase in Matrix 2017-01-04 22:02:39 +01:00
Gael Guennebaud
29a1a58113 Document selfadjointView 2017-01-04 22:01:50 +01:00
Gael Guennebaud
a5ebc92f8d bug #1336: fix doxygen issue regarding EIGEN_CWISE_BINARY_RETURN_TYPE 2017-01-04 18:21:44 +01:00
Gael Guennebaud
45b289505c Add debug output 2017-01-03 11:31:02 +01:00
Gael Guennebaud
5838f078a7 Fix inclusion 2017-01-03 11:30:27 +01:00
Gael Guennebaud
8702562177 bug #1370: add doc for StorageIndex 2017-01-03 11:25:41 +01:00
Gael Guennebaud
575c078759 bug #1370: rename _Index to _StorageIndex in SparseMatrix, and add a warning in the doc regarding the 3.2 to 3.3 change of SparseMatrix::Index 2017-01-03 11:19:14 +01:00
NeroBurner
c4fc2611ba add cmake-option to enable/disable creation of tests
* * *
disable unsupported/test when tests are disabled
* * *
rename EIGEN_ENABLE_TESTS to BUILD_TESTING
* * *
consider BUILD_TESTING in blas
2017-01-02 09:09:21 +01:00
Valentin Roussellet
d3c5525c23 Added += and + operators to inner iterators
Fix #1340
2016-12-28 18:29:30 +01:00
Gael Guennebaud
5c27962453 Move common cwise-unary method from MatrixBase/ArrayBase to the common DenseBase class. 2017-01-02 22:27:07 +01:00
Marco Falke
4ebf69394d doc: Fix trivial typo in AsciiQuickReference.txt
* * *
fixup!
2017-01-01 13:25:48 +00:00
Gael Guennebaud
8d7810a476 bug #1365: fix another type mismatch warning
(sync is set from and compared to an Index)
2016-12-28 23:35:43 +01:00
Gael Guennebaud
97812ff0d3 bug #1369: fix type mismatch warning.
Returned values of omp thread id and numbers are int,
so let's use int instead of Index here.
2016-12-28 23:29:35 +01:00
Gael Guennebaud
7713e20fd2 Fix compilation 2016-12-27 22:04:58 +01:00
Gael Guennebaud
ab69a7f6d1 Cleanup because trait<CwiseBinaryOp>::Flags now expose the correct storage order 2016-12-27 16:55:47 +01:00
Gael Guennebaud
d32a43e33a Make sure that traits<CwiseBinaryOp>::Flags reports the correct storage order so that methods like .outerSize()/.innerSize() work properly. 2016-12-27 16:35:45 +01:00
Gael Guennebaud
7136267461 Add missing .outer() member to iterators of evaluators of cwise sparse binary expression 2016-12-27 16:34:30 +01:00
Gael Guennebaud
fe0ee72390 Fix check of storage order mismatch for "sparse cwiseop sparse". 2016-12-27 16:33:19 +01:00
Gael Guennebaud
6b8f637ab1 Harmless typo 2016-12-27 16:31:17 +01:00
Benoit Steiner
3eda02d78d Fixed the sycl benchmarking code 2016-12-22 10:37:05 -08:00
Mehdi Goli
8b1c2108ba Reverting asynchronous exec to synchronous exec due to a random race condition. 2016-12-22 16:45:38 +00:00
Benoit Steiner
354baa0fb1 Avoid using horizontal adds since they're not very efficient. 2016-12-21 20:55:07 -08:00
Benoit Steiner
d7825b6707 Use native AVX512 types instead of Eigen Packets whenever possible. 2016-12-21 20:06:18 -08:00
Benoit Steiner
660da83e18 Pulled latest update from trunk 2016-12-21 16:43:27 -08:00
Benoit Steiner
4236aebe10 Simplified the contraction code 2016-12-21 16:42:56 -08:00
Benoit Steiner
3cfa16f41d Merged in benoitsteiner/opencl (pull request PR-279)
Fix for auto appearing in functor template argument.
2016-12-21 15:08:54 -08:00
Benoit Steiner
519d63d350 Added support for libxsmm kernel in multithreaded contractions 2016-12-21 15:06:06 -08:00
Benoit Steiner
0657228569 Simplified the way we link libxsmm 2016-12-21 14:40:08 -08:00
Benoit Steiner
bbca405f04 Pulled latest updates from trunk 2016-12-21 13:45:28 -08:00
Benoit Steiner
b91be60220 Automatically include and link libxsmm when present. 2016-12-21 13:44:59 -08:00
Gael Guennebaud
c6882a72ed Merged in joaoruileal/eigen (pull request PR-276)
Minor improvements to Umfpack support
2016-12-21 21:39:48 +01:00
Benoit Steiner
f9eff17e91 Leverage libxsmm kernels within single-threaded contractions 2016-12-21 12:32:06 -08:00
Benoit Steiner
c19fe5e9ed Added support for libxsmm in the eigen makefiles 2016-12-21 10:43:40 -08:00
Benoit Steiner
a34d4ebd74 Merged in benoitsteiner/opencl (pull request PR-278) 2016-12-21 08:24:17 -08:00
Luke Iwanski
c55ecfd820 Fix for auto appearing in functor template argument. 2016-12-21 15:42:51 +00:00
Joao Rui Leal
c8c89b5e19 renamed methods umfpackReportControl(), umfpackReportInfo(), and umfpackReportStatus() from UmfPackLU to printUmfpackControl(), printUmfpackInfo(), and printUmfpackStatus() 2016-12-21 09:16:28 +00:00
Benoit Steiner
0f577d4744 Merged eigen/eigen into default 2016-12-20 17:02:06 -08:00
Gael Guennebaud
f2f9df8aa5 Remove MSVC warning 4127 - conditional expression is constant from the disabled list as we now have a local workaround. 2016-12-20 22:53:19 +01:00
Gael Guennebaud
2b3fc981b8 bug #1362: workaround constant conditional warning produced by MSVC 2016-12-20 22:52:27 +01:00
Luke Iwanski
29186f766f Fixed order of initialisation in ExecExprFunctorKernel functor. 2016-12-20 21:32:42 +00:00
Gael Guennebaud
94e8d8902f Fix bug #1367: compilation fix for gcc 4.1! 2016-12-20 22:17:01 +01:00
Gael Guennebaud
e8d6862f14 Properly adjust precision when saving to Market format. 2016-12-20 22:10:33 +01:00
Gael Guennebaud
e2f4ee1c2b Speed up parsing of sparse Market file. 2016-12-20 21:56:21 +01:00
Luke Iwanski
8245851d1b Matching parameters order between lambda and the functor. 2016-12-20 16:18:15 +00:00
Gael Guennebaud
684cfc762d Add transpose, adjoint, conjugate methods to SelfAdjointView (useful to write generic code) 2016-12-20 16:33:53 +01:00
Gael Guennebaud
8bd0d3aa34 merge 2016-12-20 15:56:00 +01:00
Gael Guennebaud
11f55b2979 Optimize storage layout of Cwise* and PlainObjectBase evaluator to remove the functor or outer-stride if they are empty.
For instance, sizeof("(A-B).cwiseAbs2()") with A,B Vector4f is now 16 bytes, instead of 48 before this optimization.
In theory, evaluators should be completely optimized away by the compiler, but this might help in some cases.
2016-12-20 15:55:40 +01:00
Gael Guennebaud
5271474b15 Remove common "noncopyable" base class from evaluator_base to get a chance to get EBO (Empty Base Optimization)
Note: we should probably get rid of this class and define a macro instead.
2016-12-20 15:51:30 +01:00
Christoph Hertzberg
1c024e5585 Added some possible temporaries to .hgignore 2016-12-20 14:45:44 +01:00
Gael Guennebaud
316673bbde Clean-up usage of ExpressionTraits in all/any implementation. 2016-12-20 14:38:05 +01:00
Benoit Steiner
548ed30a1c Added an OpenCL regression test 2016-12-19 18:56:26 -08:00
Christoph Hertzberg
10c6bcdc2e Add support for long indexes and for (real-valued) row-major matrices to CholmodSupport module 2016-12-19 14:07:42 +01:00
Gael Guennebaud
f5d644b415 Make sure that HyperPlane::transform maintains a unit normal vector in the Affine case. 2016-12-20 09:35:00 +01:00
Benoit Steiner
27ceb43bf6 Fixed race condition in the tensor_shuffling_sycl test 2016-12-19 15:34:42 -08:00
Benoit Steiner
923acadfac Fixed compilation errors with gcc6 when compiling the AVX512 intrinsics 2016-12-19 13:02:27 -08:00
Benoit Jacob
751e097c57 Use 32 registers on ARM64 2016-12-19 13:44:46 -05:00
Benoit Steiner
fb1d0138ec Include SSE packet instructions when compiling with avx512 enabled. 2016-12-19 07:32:48 -08:00
Joao Rui Leal
95b804c0fe it is now possible to change Umfpack control settings before factorizations; added access to the report functions of Umfpack 2016-12-19 10:45:59 +00:00
Gael Guennebaud
8c0e701504 bug #1360: fix sign issue with pmull on altivec 2016-12-18 22:13:19 +00:00
Gael Guennebaud
fc94258e77 Fix unused warning 2016-12-18 22:11:48 +00:00
Benoit Steiner
0e0d92d34b Merged in benoitsteiner/opencl (pull request PR-275)
Improved support for OpenCL
2016-12-17 10:14:17 -08:00
Benoit Steiner
9e03dfb452 Made sure EIGEN_HAS_C99_MATH is defined when compiling OpenCL code 2016-12-17 09:23:37 -08:00
Benoit Steiner
70d0172f0c Merged eigen/eigen into default 2016-12-16 17:37:04 -08:00
Benoit Steiner
8910442e19 Fixed memcpy, memcpyHostToDevice and memcpyDeviceToHost for Sycl. 2016-12-16 15:45:04 -08:00
Luke Iwanski
54db66c5df struct -> class in order to silence compilation warning. 2016-12-16 20:25:20 +00:00
Mehdi Goli
35bae513a0 Converting all parallel-for lambdas to functors in order to prevent kernel name duplication errors; adding TensorConcatenationOp backend for sycl. 2016-12-16 19:46:45 +00:00
ermak
d60cca32e5 Transformation methods added to ParametrizedLine class. 2016-12-17 00:45:13 +07:00
Jeff Trull
7949849ebc refactor common row/column iteration code into its own class 2016-12-08 19:40:15 -08:00
Jeff Trull
d7bc64328b add display of entries to gdb sparse matrix prettyprinter 2016-12-08 18:50:17 -08:00
Jeff Trull
ff424927bc Introduce a simple pretty printer for sparse matrices (no contents) 2016-12-08 09:45:27 -08:00
Jeff Trull
5ce5418631 Correct prettyprinter comment - Quaternions are in fact supported 2016-12-08 07:31:16 -08:00
Rafael Guglielmetti
8f11df2667 NumTraits.h:
For the values ReadCost, AddCost and MulCost, add information about the value Eigen::HugeCost
2016-12-16 09:07:12 +00:00
Gael Guennebaud
7d5303a083 Partly revert changeset 642dddcce2, just in case the x87 issue pops up again
2016-12-16 09:25:14 +01:00
Benoit Steiner
2f7c2459b7 Merged in benoitsteiner/opencl (pull request PR-272)
Adding asynchandler to sycl queue as lack of it can cause undefined behaviour.
2016-12-15 17:46:40 -08:00
Mehdi Goli
c5e8546306 Adding asynchandler to sycl queue as lack of it can cause undefined behaviour. 2016-12-15 16:59:57 +00:00
Christoph Hertzberg
4247d35d4b Fixed bug which (extremely rarely) could end in an infinite loop 2016-12-15 17:22:12 +01:00
Christoph Hertzberg
642dddcce2 Fix nonnull-compare warning 2016-12-15 17:16:56 +01:00
Benoit Steiner
1324ffef2f Reenabled the use of constexpr on OpenCL devices 2016-12-15 06:49:38 -08:00
Gael Guennebaud
5d00fdf0e8 bug #1363: fix mingw's ABI issue 2016-12-15 11:58:31 +01:00
Benoit Steiner
2c2e218471 Avoid using #define since they can conflict with user code 2016-12-14 19:49:15 -08:00
Benoit Steiner
3beb180ee5 Don't call EnvThread::OnCancel by default since it doesn't do anything. 2016-12-14 18:33:39 -08:00
Benoit Steiner
9ff5d0f821 Merged eigen/eigen into default 2016-12-14 17:32:16 -08:00
Mehdi Goli
730eb9fe1c Adding asynchronous execution as it improves the performance. 2016-12-14 17:38:53 +00:00
Gael Guennebaud
11b492e993 bug #1358: fix compilation for sparse += sparse.selfadjointView(); 2016-12-14 17:53:47 +01:00
Gael Guennebaud
e67397bfa7 bug #1359: fix compilation of col_major_sparse.row() *= scalar
(used to work in 3.2.9 though the expression is not really writable)
2016-12-14 17:05:26 +01:00
Gael Guennebaud
98d7458275 bug #1359: fix sparse /= scalar and *= scalar implementation.
InnerIterators must be obtained from an evaluator.
2016-12-14 17:03:13 +01:00
Mehdi Goli
2d4a091beb Adding tensor contraction operation backend for Sycl; adding test for contractionOp sycl backend; adding temporary solution to prevent memory leak in buffer; cleaning up cxx11_tensor_buildins_sycl.h 2016-12-14 15:30:37 +00:00
Gael Guennebaud
c817ce3ba3 bug #1361: fix compilation issue in mat=perm.inverse() 2016-12-13 23:10:27 +01:00
Benoit Steiner
a432fc102d Moved the choice of ThreadPool to unsupported/Eigen/CXX11/ThreadPool 2016-12-12 15:24:16 -08:00
Benoit Steiner
8ae68924ed Made ThreadPoolInterface::Cancel() an optional functionality 2016-12-12 11:58:38 -08:00
Gael Guennebaud
57acb05eef Update and extend doc on alignment issues. 2016-12-11 22:45:32 +01:00
Benoit Steiner
76fca22134 Use a more accurate timer to sleep on Linux systems. 2016-12-09 15:12:24 -08:00
Benoit Steiner
4deafd35b7 Introduce a portable EIGEN_SLEEP macro. 2016-12-09 14:52:15 -08:00
Benoit Steiner
aafa97f4d2 Fixed build error with MSVC 2016-12-09 14:42:32 -08:00
Benoit Steiner
2f5b7a199b Reworked the threadpool cancellation mechanism to not depend on pthread_cancel since it turns out that pthread_cancel doesn't work properly on numerous platforms. 2016-12-09 13:05:14 -08:00
Benoit Steiner
3d59a47720 Added a message to ease the detection of platforms on which thread cancellation isn't supported. 2016-12-08 14:51:46 -08:00
Benoit Steiner
28ee8f42b2 Added a Flush method to the RunQueue 2016-12-08 14:07:56 -08:00
Benoit Steiner
69ef267a77 Added the new threadpool cancel method to the threadpool interface based class. 2016-12-08 14:03:25 -08:00
Benoit Steiner
7bfff85355 Added support for thread cancellation on Linux 2016-12-08 08:12:49 -08:00
Benoit Steiner
6811e6cf49 Merged in srvasude/eigen/fix_cuda_exp (pull request PR-268)
Fix expm1 CUDA implementation (do not shadow exp CUDA implementation).
2016-12-08 05:14:11 -08:00
Gael Guennebaud
747202d338 typo 2016-12-08 12:48:15 +01:00
Gael Guennebaud
bb297abb9e make sure we use the right eigen version 2016-12-08 12:00:11 +01:00
Gael Guennebaud
8b4b00d277 fix usage of custom compiler 2016-12-08 11:59:39 +01:00
Gael Guennebaud
7105596899 Add missing include and use -O3 2016-12-07 16:56:08 +01:00
Gael Guennebaud
780f3c1adf Fix call to convert on linux 2016-12-07 16:30:11 +01:00
Gael Guennebaud
3855ab472f Cleanup file structure 2016-12-07 14:23:49 +01:00
Gael Guennebaud
59a59fa8e7 Update perf monitoring scripts to generate html/svg outputs 2016-12-07 13:36:56 +01:00
Angelos Mantzaflaris
7694684992 Remove superfluous const's (can cause warnings on some Intel compilers)
(grafted from e236d3443c
)
2016-12-07 00:37:48 +01:00
Gael Guennebaud
f2c506b03d Add a script example to run and upload performance tests 2016-12-06 16:46:52 +01:00
Gael Guennebaud
1b4e085a7f generate png file for web upload 2016-12-06 16:46:22 +01:00
Gael Guennebaud
f725f1cebc Mention the CMAKE_PREFIX_PATH variable. 2016-12-06 15:23:45 +01:00
Gael Guennebaud
f90c4aebc5 Update monitored changeset lists 2016-12-06 15:07:46 +01:00
Gael Guennebaud
eb621413c1 Revert vec/y to vec*(1/y) in row-major TRSM:
- div is extremely costly
- this is consistent with the column-major case
- this is consistent with all other BLAS implementations
2016-12-06 15:04:50 +01:00
Gael Guennebaud
8365c2c941 Fix BLAS backend for symmetric rank K updates. 2016-12-06 14:47:09 +01:00
Gael Guennebaud
0c4d05b009 Explain how to choose your favorite Eigen version 2016-12-06 11:34:06 +01:00
Silvio Traversaro
e049a2a72a Added relocatable cmake support also for CMake before 3.0 and after 2.8.8 2016-12-06 10:37:34 +01:00
Srinivas Vasudevan
e6c8b5500c Change comparisons to use Scalar instead of RealScalar. 2016-12-05 14:01:45 -08:00
Srinivas Vasudevan
f7d7c33a28 Fix expm1 CUDA implementation (do not shadow exp CUDA implementation). 2016-12-05 12:19:01 -08:00
Silvio Traversaro
18481b518f Make CMake config file relocatable 2016-12-05 10:39:52 +01:00
Gael Guennebaud
c68c8631e7 fix compilation of BTL's blaze interface 2016-12-05 23:02:16 +01:00
Gael Guennebaud
1ff1d4a124 Add performance monitoring for LLT 2016-12-05 23:01:52 +01:00
Srinivas Vasudevan
09ee7f0c80 Fix small nit where I changed name of plog1p to pexpm1. 2016-12-02 15:30:12 -08:00
Srinivas Vasudevan
a0d3ac760f Sync from Head. 2016-12-02 14:14:45 -08:00
Srinivas Vasudevan
218764ee1f Added support for expm1 in Eigen. 2016-12-02 14:13:01 -08:00
Gael Guennebaud
66f65ccc36 Ease the compiler's job to generate clean and efficient code in mat*vec. 2016-12-02 22:41:26 +01:00
Gael Guennebaud
fe696022ec Operators += and -= do not resize! 2016-12-02 22:40:25 +01:00
Angelos Mantzaflaris
18de92329e use numext::abs
(grafted from 0a08d4c60b)
2016-12-02 11:48:06 +01:00
Angelos Mantzaflaris
e8a6aa518e 1. Add explicit template to abs2 (resolves deduction for some arithmetic types)
2. Avoid signed-unsigned conversion in comparison (warning in case Scalar is unsigned)
(grafted from 4086187e49)
2016-12-02 11:39:18 +01:00
Gael Guennebaud
a6b971e291 Fix memory leak in Ref<Sparse> 2016-12-05 16:59:30 +01:00
Gael Guennebaud
8640ffac65 Optimize SparseLU::solve for rhs vectors 2016-12-05 15:41:14 +01:00
Gael Guennebaud
62acd67903 remove temporary in SparseLU::solve 2016-12-05 15:11:57 +01:00
Gael Guennebaud
0db6d5b3f4 bug #1356: fix calls to evaluator::coeffRef(0,0) to get the address of the destination
by adding a dstDataPtr() member to the kernel. This fixes undefined behavior if dst is empty (nullptr).
2016-12-05 15:08:09 +01:00
Gael Guennebaud
91003f3b86 typo 2016-12-05 13:51:07 +01:00
Gael Guennebaud
445c015751 extend monitoring benchmarks with transpose matrix-vector and triangular matrix-vectors. 2016-12-05 13:36:26 +01:00
Gael Guennebaud
e3f613cbd4 Improve performance of row-major-dense-matrix * vector products for recent CPUs.
This revised version does not bother with aligned loads/stores,
and instead processes 8 rows at once for better instruction pipelining.
2016-12-05 13:02:01 +01:00
Gael Guennebaud
3abc827354 Clean debugging code 2016-12-05 12:59:32 +01:00
Benoit Steiner
462c28e77a Merged in srvasude/eigen (pull request PR-265)
Add Expm1 support to Eigen.
2016-12-05 02:31:11 +00:00
Gael Guennebaud
4465d20403 Add missing generic load methods. 2016-12-03 21:25:04 +01:00
Gael Guennebaud
6a5fe86098 Complete rewrite of column-major-matrix * vector product to deliver higher performance on modern CPUs.
The previous code had been optimized for Intel Core2, for which unaligned loads/stores were prohibitively expensive.
This new version exhibits much higher instruction independence (better pipelining) and explicitly leverages FMA.
According to my benchmark, on Haswell this new kernel is always faster than the previous one, and sometimes even twice as fast.
Even higher performance could be achieved with a better blocking-size heuristic and, perhaps, with explicit prefetching.
We should also check triangular product/solve to optimally exploit this new kernel (working on a vertical panel of 4 columns is probably not optimal anymore).
2016-12-03 21:14:14 +01:00
Benoit Steiner
2bfece5cd1 Merged eigen/eigen into default 2016-12-02 16:30:14 -08:00
Mehdi Goli
592acc5bfa Making the default numeric_list work with sycl. 2016-12-02 17:58:30 +00:00
Gael Guennebaud
8dfb3e00b8 merge 2016-12-02 11:34:21 +01:00
Gael Guennebaud
4c0d5f3c01 Add perf monitoring for gemv 2016-12-02 11:34:12 +01:00
Gael Guennebaud
d2718d662c Re-enable A^T*A action in BTL 2016-12-02 11:32:03 +01:00
Christoph Hertzberg
22f7d398e2 bug #1355: Fixed wrong line-endings on two files 2016-12-02 11:22:05 +01:00
Gael Guennebaud
27873008d4 Clean up SparseCore module regarding ReverseInnerIterator 2016-12-01 21:55:10 +01:00
Angelos Mantzaflaris
8c24723a09 typo UIntPtr
(grafted from b6f04a2dd4)
2016-12-01 21:25:58 +01:00
Angelos Mantzaflaris
aeba0d8655 fix two warnings (unused typedef, unused variable) and a typo
(grafted from a9aa3bcf50)
2016-12-01 21:23:43 +01:00
Gael Guennebaud
181138a1cb fix member order 2016-12-01 17:06:20 +01:00
Gael Guennebaud
9f297d57ae Merged in rmlarsen/eigen (pull request PR-256)
Add a default constructor for the "fake" __half class when not using the __half class provided by CUDA.
2016-12-01 15:27:33 +00:00
Gael Guennebaud
f95e3b84a5 merge 2016-12-01 16:18:57 +01:00
Benoit Steiner
7ff26ddcbb Merged eigen/eigen into default 2016-12-01 07:13:17 -08:00
Gael Guennebaud
037b46762d Fix misleading-indentation warnings. 2016-12-01 16:05:42 +01:00
Mehdi Goli
79aa2b784e Adding sycl backend for TensorPadding.h; disabling __uint128 for sycl in TensorIntDiv.h; disabling cachesize for sycl in TensorDeviceDefault.h; adding sycl backend for StrideSliceOP; removing sycl compiler warning for creating an array of size 0 in CXX11Meta.h; cleaning up the sycl backend code. 2016-12-01 13:02:27 +00:00
Benoit Steiner
a70393fd02 Cleaned up forward declarations 2016-11-30 21:59:07 -08:00
Benoit Steiner
e073de96dc Moved the MemCopyFunctor back to TensorSyclDevice since it's the only caller and it makes TensorFlow compile again 2016-11-30 21:36:52 -08:00
Benoit Steiner
fca27350eb Added the deallocate_all() method back 2016-11-30 20:45:20 -08:00
Benoit Steiner
e633a8371f Simplified includes 2016-11-30 20:21:18 -08:00
Benoit Steiner
7cd33df4ce Improved formatting 2016-11-30 20:20:44 -08:00
Benoit Steiner
fd1dc3363e Merged eigen/eigen into default 2016-11-30 20:16:17 -08:00
Benoit Steiner
f5107010ee Updated the Sizes class to work on AMD gpus without requiring a separate implementation 2016-11-30 19:57:28 -08:00
Benoit Steiner
e37c2c52d3 Added an implementation of numeric_list that works with sycl 2016-11-30 19:55:15 -08:00
Gael Guennebaud
8df272af88 Fix selection of product implementation for dynamic size matrices with fixed max size. 2016-11-30 22:21:33 +01:00
Benoit Steiner
faa2ff99c6 Pulled latest update from trunk 2016-11-30 09:31:24 -08:00
Benoit Steiner
df3da0780d Updated customIndices2Array to handle various index sizes. 2016-11-30 09:30:12 -08:00
Gael Guennebaud
c927af60ed Fix a performance regression in (mat*mat)*vec for which mat*mat was evaluated multiple times. 2016-11-30 17:59:13 +01:00
Luke Iwanski
26fff1c5b1 Added EIGEN_STRONG_INLINE to get_sycl_supported_device(). 2016-11-30 16:55:22 +00:00
Gael Guennebaud
ab4ef5e66e bug #1351: fix compilation of random with old compilers 2016-11-30 17:37:53 +01:00
Sergiu Deitsch
5e3c5c42f6 cmake: remove architecture dependency from Eigen3ConfigVersion.cmake
Also, install Eigen3*.cmake under $prefix/share/eigen3/cmake by default.
(grafted from 86ab00cdcf)
2016-11-30 15:46:46 +01:00
Sergiu Deitsch
3440b46e2f doc: mention the NO_MODULE option and target availability
(grafted from 65f09be8d2)
2016-11-30 15:41:38 +01:00
Rasmus Munk Larsen
a0329f64fb Add a default constructor for the "fake" __half class when not using the
__half class provided by CUDA.
2016-11-29 13:18:09 -08:00
Mehdi Goli
577ce78085 Adding TensorShuffling backend for sycl; adding TensorReshaping backend for sycl; cleaning up the sycl backend. 2016-11-29 15:30:42 +00:00
Benoit Steiner
3011dc94ef Call internal::array_prod to compute the total size of the tensor. 2016-11-28 09:00:31 -08:00
Benoit Steiner
02080e2b67 Merged eigen/eigen into default 2016-11-27 07:27:30 -08:00
Benoit Steiner
9fd081cddc Fixed compilation warnings 2016-11-26 20:22:25 -08:00
Benoit Steiner
9f8fbd9434 Merged eigen/eigen into default 2016-11-26 11:28:25 -08:00
Benoit Steiner
67b2c41f30 Avoided unnecessary type conversion 2016-11-26 11:27:29 -08:00
Benoit Steiner
7fe704596a Added missing array_get method for numeric_list 2016-11-26 11:26:07 -08:00
Mehdi Goli
7318daf887 Fixing LLVM error on TensorMorphingSycl.h on GPU; fixing int64_t crash for tensor_broadcast_sycl on GPU; adding get_sycl_supported_devices() on syclDevice.h. 2016-11-25 16:19:07 +00:00
Benoit Steiner
7ad37606dd Fixed the documentation of Scalar Tensors 2016-11-24 12:31:43 -08:00
Benoit Steiner
3be1afca11 Disabled the "remove the call to 'std::abs' since unsigned values cannot be negative" warning introduced in clang 3.5 2016-11-23 18:49:51 -08:00
Gael Guennebaud
308961c05e Fix compilation. 2016-11-23 22:17:52 +01:00
Gael Guennebaud
21d0286d81 bug #1348: Document EIGEN_MAX_ALIGN_BYTES and EIGEN_MAX_STATIC_ALIGN_BYTES,
and reflect in the doc that EIGEN_DONT_ALIGN* are deprecated.
2016-11-23 22:15:03 +01:00
Mehdi Goli
b8cc5635d5 Removing unsupported device from test case; cleaning the tensor device sycl. 2016-11-23 16:30:41 +00:00
Gael Guennebaud
7f6333c32b Merged in tal500/eigen-eulerangles (pull request PR-237)
Euler angles
2016-11-23 15:17:38 +00:00
Gael Guennebaud
f12b368417 Extend polynomial solver unit tests to complexes 2016-11-23 16:05:45 +01:00
Gael Guennebaud
56e5ec07c6 Automatically switch between EigenSolver and ComplexEigenSolver, and fix a few Real versus Scalar issues. 2016-11-23 16:05:10 +01:00
Gael Guennebaud
9246587122 Patch from Oleg Shirokobrod to extend polynomial solver to complexes 2016-11-23 15:42:26 +01:00
Gael Guennebaud
e340866c81 Fix compilation with gcc and old ABI version 2016-11-23 14:04:57 +01:00
Gael Guennebaud
a91de27e98 Fix compilation issue with MSVC:
MSVC always messes up with shadowed template arguments, for instance in:
  struct B { typedef float T; }
  template<typename T> struct A : B {
    T g;
  };
The type of A<double>::g will be float and not double.
2016-11-23 12:24:48 +01:00
Gael Guennebaud
74637fa4e3 Optimize predux<Packet8f> (AVX) 2016-11-22 21:57:52 +01:00
Gael Guennebaud
178c084856 Disable usage of SSE3 _mm_hadd_ps that is extremely slow. 2016-11-22 21:53:14 +01:00
Gael Guennebaud
7dd894e40e Optimize predux<Packet4d> (AVX) 2016-11-22 21:41:30 +01:00
Gael Guennebaud
f3fb0a1940 Disable usage of SSE3 haddpd that is extremely slow. 2016-11-22 16:58:31 +01:00
Sergiu Deitsch
5c516e4e0a cmake: added Eigen3::Eigen imported target
(grafted from a287140f72)
2016-11-22 12:25:06 +01:00
Gael Guennebaud
6a84246a6a Fix regression in assignment of sparse block to sparse block. 2016-11-21 21:46:42 +01:00
Benoit Steiner
f11da1d83b Made the QueueInterface thread safe 2016-11-20 13:17:08 -08:00
Benoit Steiner
ed839c5851 Enable the use of constant expressions with clang >= 3.6 2016-11-20 10:34:49 -08:00
Benoit Steiner
6d781e3e52 Merged eigen/eigen into default 2016-11-20 10:12:54 -08:00
Benoit Steiner
79a07b891b Fixed a typo 2016-11-20 07:07:41 -08:00
Gael Guennebaud
465ede0f20 Fix compilation issue in mat = permutation (regression introduced in 8193ffb3d3)
2016-11-20 09:41:37 +01:00
Benoit Steiner
81151bd474 Fixed merge conflicts 2016-11-19 19:12:59 -08:00
Benoit Steiner
9265ca707e Made it possible to check the state of a sycl device without synchronization 2016-11-19 10:56:24 -08:00
Benoit Steiner
2d1aec15a7 Added missing include 2016-11-19 08:09:54 -08:00
Luke Iwanski
af67335e0e Added test for cwiseMin, cwiseMax and operator%. 2016-11-19 13:37:27 +00:00
Benoit Steiner
1bdf1b9ce0 Merged in benoitsteiner/opencl (pull request PR-253)
OpenCL improvements
2016-11-19 04:44:43 +00:00
Benoit Steiner
a357fe1fb9 Code cleanup 2016-11-18 16:58:09 -08:00
Benoit Steiner
1c6eafb46b Updated cxx11_tensor_device_sycl to run only on the OpenCL devices available on the host 2016-11-18 16:43:27 -08:00
Benoit Steiner
ca754caa23 Only runs the cxx11_tensor_reduction_sycl on devices that are available. 2016-11-18 16:31:14 -08:00
Benoit Steiner
dc601d79d1 Added the ability to run tests exclusively on OpenCL devices that are listed by sycl::device::get_devices(). 2016-11-18 16:26:50 -08:00
Benoit Steiner
8649e16c2a Enable EIGEN_HAS_C99_MATH when building with the latest version of Visual Studio 2016-11-18 14:18:34 -08:00
Benoit Steiner
110b7f8d9f Deleted unnecessary semicolons 2016-11-18 14:06:17 -08:00
Benoit Steiner
b5e3285e16 Test broadcasting on OpenCL devices with 64 bit indexing 2016-11-18 13:44:20 -08:00
Gael Guennebaud
164414c563 Merged in ChunW/eigen (pull request PR-252)
Workaround for error in VS2012 with /clr
2016-11-18 21:07:29 +00:00
Benoit Steiner
37c2c516a6 Cleaned up the sycl device code 2016-11-18 12:38:06 -08:00
Benoit Steiner
7335c49204 Fixed the cxx11_tensor_device_sycl test 2016-11-18 12:37:13 -08:00
Mehdi Goli
15e226d7d3 adding Benoit changes on the TensorDeviceSycl.h 2016-11-18 16:34:54 +00:00
Mehdi Goli
622805a0c5 Modifying TensorDeviceSycl.h to always create buffers of type uint8_t and convert them to the actual type at execution on the device; adding the queue interface class to separate the lifespan of the sycl queue and buffers, created for that queue, from Eigen::SyclDevice; modifying sycl tests to support the evaluation of the results for both row-major and column-major data layout on all different devices that are supported by Sycl {CPU, GPU, and Host}. 2016-11-18 16:20:42 +00:00
Luke Iwanski
5159675c33 Added isnan, isfinite and isinf for SYCL device. Plus test for that. 2016-11-18 16:01:48 +00:00
Tal Hadad
76b2a3e6e7 Allow constructing EulerAngles directly from a 3D vector.
Using an assignment template struct to distinguish between a 3D vector and a 3D rotation matrix.
2016-11-18 15:01:06 +02:00
Luke Iwanski
927bd62d2a Now testing out (+=, =) in.FUNC() and out (+=, =) out.FUNC() 2016-11-18 11:16:42 +00:00
Gael Guennebaud
8193ffb3d3 bug #1343: fix compilation regression in mat+=selfadjoint_view.
Generic EigenBase2EigenBase assignment was incomplete.
2016-11-18 10:17:34 +01:00
Gael Guennebaud
cebff7e3a2 bug #1343: fix compilation regression in array = matrix_product 2016-11-18 10:09:33 +01:00
Benoit Steiner
7c30078b9f Merged eigen/eigen into default 2016-11-17 22:53:37 -08:00
Benoit Steiner
553f50b246 Added a way to detect errors generated by the opencl device from the host 2016-11-17 21:51:48 -08:00
Benoit Steiner
72a45d32e9 Cleanup 2016-11-17 21:29:15 -08:00
Benoit Steiner
4349fc640e Created a test to check that the sycl runtime can successfully report errors (like division by 0).
Small cleanup
2016-11-17 20:27:54 -08:00
Benoit Steiner
a6a3fd0703 Made TensorDeviceCuda.h compile on windows 2016-11-17 16:15:27 -08:00
Chun Wang
0d0948c3b9 Workaround for error in VS2012 with /clr 2016-11-17 17:54:27 -05:00
Benoit Steiner
004344cf54 Avoid calling log(0) or 1/0 2016-11-17 11:56:44 -08:00
Konstantinos Margaritis
a1d5c503fa replace sizeof(Packet) with PacketSize else it breaks for ZVector.Packet4f 2016-11-17 13:27:45 -05:00
Konstantinos Margaritis
672aa97d4d implement float/std::complex<float> for ZVector as well, minor fixes to ZVector 2016-11-17 13:27:33 -05:00
Konstantinos Margaritis
8290e21fb5 replace sizeof(Packet) with PacketSize else it breaks for ZVector.Packet4f 2016-11-17 13:21:15 -05:00
Luke Iwanski
7878756dea Fixed existing test. 2016-11-17 17:46:55 +00:00
Luke Iwanski
c5130dedbe Specialised basic math functions for SYCL device. 2016-11-17 11:47:13 +00:00
Benoit Steiner
f2e8b73256 Enable the use of AVX512 instruction by default 2016-11-16 21:28:04 -08:00
Gael Guennebaud
7b09e4dd8c bump default branch to 3.3.90 2016-11-16 22:20:58 +01:00
Benoit Steiner
dff9a049c4 Optimized the computation of exp, sqrt, ceil and floor for fp16 on Pascal GPUs 2016-11-16 09:01:51 -08:00
Benoit Steiner
b5c75351e3 Merged eigen/eigen into default 2016-11-14 15:54:44 -08:00
Rasmus Munk Larsen
32df1b1046 Reduce dispatch overhead in parallelFor by only calling thread_pool.Schedule() for one of the two recursive calls in handleRange. This avoids going through the Schedule path to push both recursive calls onto another thread-queue in the binary tree, and instead executes one of them on the main thread. At the leaf level this will still activate a full complement of threads, but will save up to 50% of the overhead in Schedule (random number generation, insertion in queue which includes signaling via atomics). 2016-11-14 14:18:16 -08:00
Mehdi Goli
05e8c2a1d9 Adding extra test for non-fixed size to broadcast; Replacing stcl with sycl. 2016-11-14 18:13:53 +00:00
Mehdi Goli
f8ca893976 Adding TensorFixsize; adding sycl device memcpy; adding initial stage of slicing. 2016-11-14 17:51:57 +00:00
Gael Guennebaud
0ee92aa38e Optimize sparse<bool> && sparse<bool> to use the same path as for coeff-wise products. 2016-11-14 18:47:41 +01:00
Gael Guennebaud
2e334f5da0 bug #426: move operator && and || to MatrixBase and SparseMatrixBase. 2016-11-14 18:47:02 +01:00
Gael Guennebaud
a048aba14c Merged in olesalscheider/eigen (pull request PR-248)
Make sure not to call numext::maxi on expression templates
2016-11-14 13:25:53 +00:00
Gael Guennebaud
eedb87f4ba Fix regression in SparseMatrix::ReverseInnerIterator 2016-11-14 14:05:53 +01:00
Niels Ole Salscheider
51fef87408 Make sure not to call numext::maxi on expression templates 2016-11-12 12:20:57 +01:00
Mehdi Goli
a5c3f15682 Adding comment to TensorDeviceSycl.h and cleaning the code. 2016-11-11 19:06:34 +00:00
Benoit Steiner
f4722aa479 Merged in benoitsteiner/opencl (pull request PR-247) 2016-11-11 00:01:28 +00:00
Mehdi Goli
3be3963021 Adding EIGEN_STRONG_INLINE back; using size() instead of dimensions.TotalSize() on Tensor. 2016-11-10 19:16:31 +00:00
Mehdi Goli
12387abad5 adding the missing in eigen_assert! 2016-11-10 18:58:08 +00:00
Mehdi Goli
2e704d4257 Adding Memset; optimising MemcpyDeviceToHost by removing double copying; 2016-11-10 18:45:12 +00:00
Benoit Steiner
75c080b176 Added a test to validate memory transfers between host and sycl device 2016-11-09 06:23:42 -08:00
Tal Hadad
15eca2432a Euler tests: Tighter precision when no roll exists and clean code. 2016-10-18 23:24:57 +03:00
Tal Hadad
6f4f12d1ed Add isApprox() and cast() functions.
test cases included
2016-10-17 22:23:47 +03:00
Tal Hadad
7402cfd4cc Add safety for near-pole cases and test them better. 2016-10-17 20:42:08 +03:00
Tal Hadad
58f5d7d058 Fix calc bug, docs and better testing.
Test code changes:
* better coded
* rand and manual numbers
* singularity checking
2016-10-16 14:39:26 +03:00
Tal Hadad
078a202621 Merge Hongkai Dai correct range calculation, and remove ranges from API.
Docs updated.
2016-10-14 16:03:28 +03:00
Hongkai Dai
014d9f1d9b implement euler angles with the right ranges 2016-10-13 14:45:51 -07:00
Heiko Bauke
e19b58e672 alias template for matrix and array classes 2016-04-23 00:08:51 +02:00
yoco
15f273b63c fix reshape flag and test case 2014-02-10 22:49:13 +08:00
yoco
b64a09acc1 fix reshape's Max[Row/Col]AtCompileTime 2014-02-04 05:54:50 +08:00
yoco
f8ad87f226 Reshape always non-directly-access 2014-02-04 05:19:56 +08:00
yoco
515bbf8bb2 Improve reshape test case
- simplify test code
- add reshape chain
2014-02-04 02:50:23 +08:00
yoco
009047db27 Fix Reshape traits flag calculate bug 2014-02-04 02:21:41 +08:00
yoco
2b89080903 Remove reshape InnerPanel, add test, fix bug 2014-01-20 01:43:28 +08:00
yoco
03723abda0 Remove useless reshape row/col ctor 2014-01-20 00:22:16 +08:00
yoco
342c8e5321 Fix Reshape DirectAccessBit bug 2014-01-20 00:15:19 +08:00
yoco
24e1c0f2a1 add reshape test for const and static-size matrix 2014-01-18 23:27:53 +08:00
yoco
150796337a Add unit-test for reshape
- add unittest for dynamic & fixed-size reshape
- fix fixed-size reshape bugs found while testing
2014-01-18 16:10:46 +08:00
yoco
497a7b0ce1 remove c++11, make c++03 compatible 2014-01-18 13:20:14 +08:00
yoco
9c832fad60 add reshape() snippets 2014-01-18 04:53:46 +08:00
yoco
1e1d0c15b5 add example code for Reshape class 2014-01-18 04:43:29 +08:00
yoco
fe2ad0647a reshape now supported
- add member function to plugin
- add forward declaration
- add documentation
- add include
2014-01-18 04:21:20 +08:00
yoco
7bd58ad0b6 add Eigen/src/Core/Reshape.h 2014-01-18 04:16:44 +08:00
2015 changed files with 292231 additions and 195972 deletions

19
.clang-format Normal file
View File

@@ -0,0 +1,19 @@
---
BasedOnStyle: Google
ColumnLimit: 120
---
Language: Cpp
BasedOnStyle: Google
ColumnLimit: 120
StatementMacros:
- EIGEN_STATIC_ASSERT
- EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED
- EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN
SortIncludes: false
AttributeMacros:
- EIGEN_STRONG_INLINE
- EIGEN_ALWAYS_INLINE
- EIGEN_DEVICE_FUNC
- EIGEN_DONT_INLINE
- EIGEN_DEPRECATED
- EIGEN_UNUSED

37
.clang-tidy Normal file
View File

@@ -0,0 +1,37 @@
---
# Conservative clang-tidy configuration for Eigen.
#
# Focuses on bug-finding checks with low false-positive rates.
# Intentionally omits style-enforcement checks (modernize-*, google-*,
# cppcoreguidelines-*) since Eigen has its own conventions and is a
# heavily-templated math library where many "modern C++" idioms don't apply.
Checks: >
-*,
bugprone-*,
-bugprone-narrowing-conversions,
-bugprone-easily-swappable-parameters,
-bugprone-implicit-widening-of-multiplication-result,
-bugprone-exception-escape,
misc-redundant-expression,
misc-unused-using-decls,
misc-misleading-identifier,
performance-for-range-copy,
performance-implicit-conversion-in-loop,
performance-unnecessary-copy-initialization,
performance-unnecessary-value-param,
readability-container-size-empty,
readability-duplicate-include,
readability-misleading-indentation,
readability-redundant-control-flow,
readability-redundant-smartptr-get,
WarningsAsErrors: ''
HeaderFilterRegex: 'Eigen/.*|test/.*|blas/.*|lapack/.*|unsupported/Eigen/.*'
# Eigen uses its own assert macros.
CheckOptions:
- key: bugprone-assert-side-effect.AssertMacros
value: 'eigen_assert,eigen_internal_assert,EIGEN_STATIC_ASSERT,VERIFY,VERIFY_IS_APPROX,VERIFY_IS_EQUAL,VERIFY_IS_MUCH_SMALLER_THAN,VERIFY_IS_NOT_APPROX,VERIFY_IS_NOT_EQUAL,VERIFY_IS_UNITARY,VERIFY_RAISES_ASSERT'
...

4
.git-blame-ignore-revs Normal file
View File

@@ -0,0 +1,4 @@
# First major clang-format MR (https://gitlab.com/libeigen/eigen/-/merge_requests/1429).
f38e16c193d489c278c189bc06b448a94adb45fb
# Formatting of tests, examples, benchmarks, et cetera (https://gitlab.com/libeigen/eigen/-/merge_requests/1432).
46e9cdb7fea25d7f7aef4332b9c3ead3857e213d

3
.gitattributes vendored Normal file
View File

@@ -0,0 +1,3 @@
*.sh eol=lf
debug/msvc/*.dat eol=crlf
debug/msvc/*.natvis eol=crlf

View File

@@ -1,4 +1,3 @@
syntax: glob
qrc_*cxx
*.orig
*.pyc
@@ -13,7 +12,7 @@ core
core.*
*.bak
*~
build*
*.build*
*.moc.*
*.moc
ui_*
@@ -28,7 +27,16 @@ activity.png
*.rej
log
patch
*.patch
a
a.*
lapack/testing
lapack/reference
.*project
.settings
Makefile
!ci/build.gitlab-ci.yml
!scripts/buildtests.in
!Eigen/Core
!Eigen/src/Core
CLAUDE.md

52
.gitlab-ci.yml Normal file
View File

@@ -0,0 +1,52 @@
# This file is part of Eigen, a lightweight C++ template library
# for linear algebra.
#
# Copyright (C) 2023, The Eigen Authors
#
# This Source Code Form is subject to the terms of the Mozilla
# Public License v. 2.0. If a copy of the MPL was not distributed
# with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
default:
interruptible: true
# For MR pipelines, auto-cancel running jobs when new commits are pushed.
# For scheduled (nightly) pipelines, never auto-cancel so all jobs run to
# completion and all failures are visible for debugging.
workflow:
auto_cancel:
on_new_commit: interruptible
on_job_failure: none
rules:
- if: $CI_PIPELINE_SOURCE == "schedule"
auto_cancel:
on_new_commit: none
- when: always
stages:
- checkformat
- build
- test
- benchmark
- deploy
variables:
# CMake build directory.
EIGEN_CI_BUILDDIR: .build
# Specify the CMake build target.
EIGEN_CI_BUILD_TARGET: ""
# If a test regex is specified, that will be selected.
# Otherwise, we will try a label if specified.
EIGEN_CI_CTEST_REGEX: ""
EIGEN_CI_CTEST_LABEL: ""
EIGEN_CI_CTEST_ARGS: ""
include:
- "/ci/checkformat.gitlab-ci.yml"
- "/ci/common.gitlab-ci.yml"
- "/ci/build.linux.gitlab-ci.yml"
- "/ci/build.windows.gitlab-ci.yml"
- "/ci/test.linux.gitlab-ci.yml"
- "/ci/test.windows.gitlab-ci.yml"
- "/ci/benchmark.gitlab-ci.yml"
- "/ci/deploy.gitlab-ci.yml"

View File

@@ -0,0 +1,59 @@
<!--
Thank you for submitting an issue!
Before opening a new issue, please search for keywords in the existing [list of issues](https://gitlab.com/libeigen/eigen/-/issues?state=opened) to verify it isn't a duplicate.
-->
### Summary
<!-- Summarize the bug encountered concisely. -->
### Environment
<!-- Please provide your development environment. -->
- **Operating System** : Windows/Linux
- **Architecture** : x64/Arm64/PowerPC ...
- **Eigen Version** : 5.0.0
- **Compiler Version** : gcc-12.0
- **Compile Flags** : -O3 -march=native
- **Vector Extension** : SSE/AVX/NEON ...
### Minimal Example
<!--
Please create a minimal reproducing example here that exhibits the problematic behavior.
The example should be complete, in that it can fully build and run. See the [the guidelines on stackoverflow](https://stackoverflow.com/help/minimal-reproducible-example) for how to create a good minimal example.
You can also link to [godbolt](https://godbolt.org). Note that you need to click
the "Share" button in the top right-hand corner of the godbolt page to get the share link
instead of the URL in your browser address bar.
-->
```cpp
// Insert your code here.
```
### Steps to reproduce the issue
<!-- Describe the necessary steps to reproduce the issue. -->
1. first step
2. second step
3. ...
### What is the current *bug* behavior?
<!-- Describe what actually happens. -->
### What is the expected *correct* behavior?
<!-- Describe what you should see instead. -->
### Relevant logs
<!-- Add relevant build logs or program output within blocks marked by " ``` " -->
### [Optional] Benchmark scripts and results
<!-- Please share any benchmark scripts - either standalone, or using [Google Benchmark](https://github.com/google/benchmark). -->
### Anything else that might help
<!--
It will be better to provide us more information to help narrow down the cause.
Including but not limited to the following:
- lines of code that might help us diagnose the problem.
- potential ways to address the issue.
- last known working/first broken version (release number or commit hash).
-->

View File

@@ -0,0 +1,14 @@
<!--
Thank you for submitting a Feature Request!
If you want to run ideas by the maintainers and the Eigen community first,
you can chat about them on the [Eigen Discord server](https://discord.gg/2SkEJGqZjR).
-->
### Describe the feature you would like to be implemented.
### Why would such a feature be useful for other users?
### Any hints on how to implement the requested feature?
### Additional resources

View File

@@ -0,0 +1,30 @@
<!--
Thanks for contributing a merge request!
We recommend that first-time contributors read our [contribution guidelines](https://eigen.tuxfamily.org/index.php?title=Contributing_to_Eigen).
Before submitting the MR, please complete the following checks:
- Create one PR per feature or bugfix,
- Run the test suite to verify your changes.
See our [test guidelines](https://eigen.tuxfamily.org/index.php?title=Tests).
- Add tests to cover the bug addressed or any new feature.
- Document new features. If it is a substantial change, add it to the [Changelog](https://gitlab.com/libeigen/eigen/-/blob/master/CHANGELOG.md).
- Leave the following box checked when submitting: `Allow commits from members who can merge to the target branch`.
This allows us to rebase and merge your change.
Note that we are a team of volunteers; we appreciate your patience during the review process.
-->
### Description
<!--Please explain your changes.-->
%{first_multiline_commit}
### Reference issue
<!--
You can link to a specific issue using the gitlab syntax #<issue number>.
If the MR fixes an issue, write "Fixes #<issue number>" to have the issue automatically closed on merge.
-->
### Additional information
<!--Any additional information you think is important.-->

11
.hgeol
View File

@@ -1,11 +0,0 @@
[patterns]
*.sh = LF
*.MINPACK = CRLF
scripts/*.in = LF
debug/msvc/*.dat = CRLF
debug/msvc/*.natvis = CRLF
unsupported/test/mpreal/*.* = CRLF
** = native
[repository]
native = LF

1935
CHANGELOG.md Normal file

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

203
COPYING.APACHE Normal file
View File

@@ -0,0 +1,203 @@
/*
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

@@ -23,4 +23,4 @@
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

@@ -1,674 +0,0 @@
GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007
Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The GNU General Public License is a free, copyleft license for
software and other kinds of works.
The licenses for most software and other practical works are designed
to take away your freedom to share and change the works. By contrast,
the GNU General Public License is intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users. We, the Free Software Foundation, use the
GNU General Public License for most of our software; it applies also to
any other work released this way by its authors. You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.
To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights. Therefore, you have
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must pass on to the recipients the same
freedoms that you received. You must make sure that they, too, receive
or can get the source code. And you must show them these terms so they
know their rights.
Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
giving you legal permission to copy, distribute and/or modify it.
For the developers' and authors' protection, the GPL clearly explains
that there is no warranty for this free software. For both users' and
authors' sake, the GPL requires that modified versions be marked as
changed, so that their problems will not be attributed erroneously to
authors of previous versions.
Some devices are designed to deny users access to install or run
modified versions of the software inside them, although the manufacturer
can do so. This is fundamentally incompatible with the aim of
protecting users' freedom to change the software. The systematic
pattern of such abuse occurs in the area of products for individuals to
use, which is precisely where it is most unacceptable. Therefore, we
have designed this version of the GPL to prohibit the practice for those
products. If such problems arise substantially in other domains, we
stand ready to extend this provision to those domains in future versions
of the GPL, as needed to protect the freedom of users.
Finally, every program is threatened constantly by software patents.
States should not allow patents to restrict development and use of
software on general-purpose computers, but in those that do, we wish to
avoid the special danger that patents applied to a free program could
make it effectively proprietary. To prevent this, the GPL assures that
patents cannot be used to render the program non-free.
The precise terms and conditions for copying, distribution and
modification follow.
TERMS AND CONDITIONS
0. Definitions.
"This License" refers to version 3 of the GNU General Public License.
"Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.
"The Program" refers to any copyrightable work licensed under this
License. Each licensee is addressed as "you". "Licensees" and
"recipients" may be individuals or organizations.
To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy. The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.
A "covered work" means either the unmodified Program or a work based
on the Program.
To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy. Propagation includes copying,
distribution (with or without modification), making available to the
public, and in some countries other activities as well.
To "convey" a work means any kind of propagation that enables other
parties to make or receive copies. Mere interaction with a user through
a computer network, with no transfer of a copy, is not conveying.
An interactive user interface displays "Appropriate Legal Notices"
to the extent that it includes a convenient and prominently visible
feature that (1) displays an appropriate copyright notice, and (2)
tells the user that there is no warranty for the work (except to the
extent that warranties are provided), that licensees may convey the
work under this License, and how to view a copy of this License. If
the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.
1. Source Code.
The "source code" for a work means the preferred form of the work
for making modifications to it. "Object code" means any non-source
form of a work.
A "Standard Interface" means an interface that either is an official
standard defined by a recognized standards body, or, in the case of
interfaces specified for a particular programming language, one that
is widely used among developers working in that language.
The "System Libraries" of an executable work include anything, other
than the work as a whole, that (a) is included in the normal form of
packaging a Major Component, but which is not part of that Major
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
implementation is available to the public in source code form. A
"Major Component", in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system
(if any) on which the executable work runs, or a compiler used to
produce the work, or an object code interpreter used to run it.
The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including scripts to
control those activities. However, it does not include the work's
System Libraries, or general-purpose tools or generally available free
programs which are used unmodified in performing those activities but
which are not part of the work. For example, Corresponding Source
includes interface definition files associated with source files for
the work, and the source code for shared libraries and dynamically
linked subprograms that the work is specifically designed to require,
such as by intimate data communication or control flow between those
subprograms and other parts of the work.
The Corresponding Source need not include anything that users
can regenerate automatically from other parts of the Corresponding
Source.
The Corresponding Source for a work in source code form is that
same work.
2. Basic Permissions.
All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met. This License explicitly affirms your unlimited
permission to run the unmodified Program. The output from running a
covered work is covered by this License only if the output, given its
content, constitutes a covered work. This License acknowledges your
rights of fair use or other equivalent, as provided by copyright law.
You may make, run and propagate covered works that you do not
convey, without conditions so long as your license otherwise remains
in force. You may convey covered works to others for the sole purpose
of having them make modifications exclusively for you, or provide you
with facilities for running those works, provided that you comply with
the terms of this License in conveying all material for which you do
not control copyright. Those thus making or running the covered works
for you must do so exclusively on your behalf, under your direction
and control, on terms that prohibit them from making any copies of
your copyrighted material outside their relationship with you.
Conveying under any other circumstances is permitted solely under
the conditions stated below. Sublicensing is not allowed; section 10
makes it unnecessary.
3. Protecting Users' Legal Rights From Anti-Circumvention Law.
No covered work shall be deemed part of an effective technological
measure under any applicable law fulfilling obligations under article
11 of the WIPO copyright treaty adopted on 20 December 1996, or
similar laws prohibiting or restricting circumvention of such
measures.
When you convey a covered work, you waive any legal power to forbid
circumvention of technological measures to the extent such circumvention
is effected by exercising rights under this License with respect to
the covered work, and you disclaim any intention to limit operation or
modification of the work as a means of enforcing, against the work's
users, your or third parties' legal rights to forbid circumvention of
technological measures.
4. Conveying Verbatim Copies.
You may convey verbatim copies of the Program's source code as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice;
keep intact all notices stating that this License and any
non-permissive terms added in accord with section 7 apply to the code;
keep intact all notices of the absence of any warranty; and give all
recipients a copy of this License along with the Program.
You may charge any price or no price for each copy that you convey,
and you may offer support or warranty protection for a fee.
5. Conveying Modified Source Versions.
You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:
a) The work must carry prominent notices stating that you modified
it, and giving a relevant date.
b) The work must carry prominent notices stating that it is
released under this License and any conditions added under section
7. This requirement modifies the requirement in section 4 to
"keep intact all notices".
c) You must license the entire work, as a whole, under this
License to anyone who comes into possession of a copy. This
License will therefore apply, along with any applicable section 7
additional terms, to the whole of the work, and all its parts,
regardless of how they are packaged. This License gives no
permission to license the work in any other way, but it does not
invalidate such permission if you have separately received it.
d) If the work has interactive user interfaces, each must display
Appropriate Legal Notices; however, if the Program has interactive
interfaces that do not display Appropriate Legal Notices, your
work need not make them do so.
A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
and which are not combined with it such as to form a larger program,
in or on a volume of a storage or distribution medium, is called an
"aggregate" if the compilation and its resulting copyright are not
used to limit the access or legal rights of the compilation's users
beyond what the individual works permit. Inclusion of a covered work
in an aggregate does not cause this License to apply to the other
parts of the aggregate.
6. Conveying Non-Source Forms.
You may convey a covered work in object code form under the terms
of sections 4 and 5, provided that you also convey the
machine-readable Corresponding Source under the terms of this License,
in one of these ways:
a) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by the
Corresponding Source fixed on a durable physical medium
customarily used for software interchange.
b) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by a
written offer, valid for at least three years and valid for as
long as you offer spare parts or customer support for that product
model, to give anyone who possesses the object code either (1) a
copy of the Corresponding Source for all the software in the
product that is covered by this License, on a durable physical
medium customarily used for software interchange, for a price no
more than your reasonable cost of physically performing this
conveying of source, or (2) access to copy the
Corresponding Source from a network server at no charge.
c) Convey individual copies of the object code with a copy of the
written offer to provide the Corresponding Source. This
alternative is allowed only occasionally and noncommercially, and
only if you received the object code with such an offer, in accord
with subsection 6b.
d) Convey the object code by offering access from a designated
place (gratis or for a charge), and offer equivalent access to the
Corresponding Source in the same way through the same place at no
further charge. You need not require recipients to copy the
Corresponding Source along with the object code. If the place to
copy the object code is a network server, the Corresponding Source
may be on a different server (operated by you or a third party)
that supports equivalent copying facilities, provided you maintain
clear directions next to the object code saying where to find the
Corresponding Source. Regardless of what server hosts the
Corresponding Source, you remain obligated to ensure that it is
available for as long as needed to satisfy these requirements.
e) Convey the object code using peer-to-peer transmission, provided
you inform other peers where the object code and Corresponding
Source of the work are being offered to the general public at no
charge under subsection 6d.
A separable portion of the object code, whose source code is excluded
from the Corresponding Source as a System Library, need not be
included in conveying the object code work.
A "User Product" is either (1) a "consumer product", which means any
tangible personal property which is normally used for personal, family,
or household purposes, or (2) anything designed or sold for incorporation
into a dwelling. In determining whether a product is a consumer product,
doubtful cases shall be resolved in favor of coverage. For a particular
product received by a particular user, "normally used" refers to a
typical or common use of that class of product, regardless of the status
of the particular user or of the way in which the particular user
actually uses, or expects or is expected to use, the product. A product
is a consumer product regardless of whether the product has substantial
commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.
"Installation Information" for a User Product means any methods,
procedures, authorization keys, or other information required to install
and execute modified versions of a covered work in that User Product from
a modified version of its Corresponding Source. The information must
suffice to ensure that the continued functioning of the modified object
code is in no case prevented or interfered with solely because
modification has been made.
If you convey an object code work under this section in, or with, or
specifically for use in, a User Product, and the conveying occurs as
part of a transaction in which the right of possession and use of the
User Product is transferred to the recipient in perpetuity or for a
fixed term (regardless of how the transaction is characterized), the
Corresponding Source conveyed under this section must be accompanied
by the Installation Information. But this requirement does not apply
if neither you nor any third party retains the ability to install
modified object code on the User Product (for example, the work has
been installed in ROM).
The requirement to provide Installation Information does not include a
requirement to continue to provide support service, warranty, or updates
for a work that has been modified or installed by the recipient, or for
the User Product in which it has been modified or installed. Access to a
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
protocols for communication across the network.
Corresponding Source conveyed, and Installation Information provided,
in accord with this section must be in a format that is publicly
documented (and with an implementation available to the public in
source code form), and must require no special password or key for
unpacking, reading or copying.
7. Additional Terms.
"Additional permissions" are terms that supplement the terms of this
License by making exceptions from one or more of its conditions.
Additional permissions that are applicable to the entire Program shall
be treated as though they were included in this License, to the extent
that they are valid under applicable law. If additional permissions
apply only to part of the Program, that part may be used separately
under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.
When you convey a copy of a covered work, you may at your option
remove any additional permissions from that copy, or from any part of
it. (Additional permissions may be written to require their own
removal in certain cases when you modify the work.) You may place
additional permissions on material, added by you to a covered work,
for which you have or can give appropriate copyright permission.
Notwithstanding any other provision of this License, for material you
add to a covered work, you may (if authorized by the copyright holders of
that material) supplement the terms of this License with terms:
a) Disclaiming warranty or limiting liability differently from the
terms of sections 15 and 16 of this License; or
b) Requiring preservation of specified reasonable legal notices or
author attributions in that material or in the Appropriate Legal
Notices displayed by works containing it; or
c) Prohibiting misrepresentation of the origin of that material, or
requiring that modified versions of such material be marked in
reasonable ways as different from the original version; or
d) Limiting the use for publicity purposes of names of licensors or
authors of the material; or
e) Declining to grant rights under trademark law for use of some
trade names, trademarks, or service marks; or
f) Requiring indemnification of licensors and authors of that
material by anyone who conveys the material (or modified versions of
it) with contractual assumptions of liability to the recipient, for
any liability that these contractual assumptions directly impose on
those licensors and authors.
All other non-permissive additional terms are considered "further
restrictions" within the meaning of section 10. If the Program as you
received it, or any part of it, contains a notice stating that it is
governed by this License along with a term that is a further
restriction, you may remove that term. If a license document contains
a further restriction but permits relicensing or conveying under this
License, you may add to a covered work material governed by the terms
of that license document, provided that the further restriction does
not survive such relicensing or conveying.
If you add terms to a covered work in accord with this section, you
must place, in the relevant source files, a statement of the
additional terms that apply to those files, or a notice indicating
where to find the applicable terms.
Additional terms, permissive or non-permissive, may be stated in the
form of a separately written license, or stated as exceptions;
the above requirements apply either way.
8. Termination.
You may not propagate or modify a covered work except as expressly
provided under this License. Any attempt otherwise to propagate or
modify it is void, and will automatically terminate your rights under
this License (including any patent licenses granted under the third
paragraph of section 11).
However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly and
finally terminates your license, and (b) permanently, if the copyright
holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.
Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.
Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License. If your rights have been terminated and not permanently
reinstated, you do not qualify to receive new licenses for the same
material under section 10.
9. Acceptance Not Required for Having Copies.
You are not required to accept this License in order to receive or
run a copy of the Program. Ancillary propagation of a covered work
occurring solely as a consequence of using peer-to-peer transmission
to receive a copy likewise does not require acceptance. However,
nothing other than this License grants you permission to propagate or
modify any covered work. These actions infringe copyright if you do
not accept this License. Therefore, by modifying or propagating a
covered work, you indicate your acceptance of this License to do so.
10. Automatic Licensing of Downstream Recipients.
Each time you convey a covered work, the recipient automatically
receives a license from the original licensors, to run, modify and
propagate that work, subject to this License. You are not responsible
for enforcing compliance by third parties with this License.
An "entity transaction" is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an
organization, or merging organizations. If propagation of a covered
work results from an entity transaction, each party to that
transaction who receives a copy of the work also receives whatever
licenses to the work the party's predecessor in interest had or could
give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if
the predecessor has it or can get it with reasonable efforts.
You may not impose any further restrictions on the exercise of the
rights granted or affirmed under this License. For example, you may
not impose a license fee, royalty, or other charge for exercise of
rights granted under this License, and you may not initiate litigation
(including a cross-claim or counterclaim in a lawsuit) alleging that
any patent claim is infringed by making, using, selling, offering for
sale, or importing the Program or any portion of it.
11. Patents.
A "contributor" is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based. The
work thus licensed is called the contributor's "contributor version".
A contributor's "essential patent claims" are all patent claims
owned or controlled by the contributor, whether already acquired or
hereafter acquired, that would be infringed by some manner, permitted
by this License, of making, using, or selling its contributor version,
but do not include claims that would be infringed only as a
consequence of further modification of the contributor version. For
purposes of this definition, "control" includes the right to grant
patent sublicenses in a manner consistent with the requirements of
this License.
Each contributor grants you a non-exclusive, worldwide, royalty-free
patent license under the contributor's essential patent claims, to
make, use, sell, offer for sale, import and otherwise run, modify and
propagate the contents of its contributor version.
In the following three paragraphs, a "patent license" is any express
agreement or commitment, however denominated, not to enforce a patent
(such as an express permission to practice a patent or covenant not to
sue for patent infringement). To "grant" such a patent license to a
party means to make such an agreement or commitment not to enforce a
patent against the party.
If you convey a covered work, knowingly relying on a patent license,
and the Corresponding Source of the work is not available for anyone
to copy, free of charge and under the terms of this License, through a
publicly available network server or other readily accessible means,
then you must either (1) cause the Corresponding Source to be so
available, or (2) arrange to deprive yourself of the benefit of the
patent license for this particular work, or (3) arrange, in a manner
consistent with the requirements of this License, to extend the patent
license to downstream recipients. "Knowingly relying" means you have
actual knowledge that, but for the patent license, your conveying the
covered work in a country, or your recipient's use of the covered work
in a country, would infringe one or more identifiable patents in that
country that you have reason to believe are valid.
If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties
receiving the covered work authorizing them to use, propagate, modify
or convey a specific copy of the covered work, then the patent license
you grant is automatically extended to all recipients of the covered
work and works based on it.
A patent license is "discriminatory" if it does not include within
the scope of its coverage, prohibits the exercise of, or is
conditioned on the non-exercise of one or more of the rights that are
specifically granted under this License. You may not convey a covered
work if you are a party to an arrangement with a third party that is
in the business of distributing software, under which you make payment
to the third party based on the extent of your activity of conveying
the work, and under which the third party grants, to any of the
parties who would receive the covered work from you, a discriminatory
patent license (a) in connection with copies of the covered work
conveyed by you (or copies made from those copies), or (b) primarily
for and in connection with specific products or compilations that
contain the covered work, unless you entered into that arrangement,
or that patent license was granted, prior to 28 March 2007.
Nothing in this License shall be construed as excluding or limiting
any implied license or other defenses to infringement that may
otherwise be available to you under applicable patent law.
12. No Surrender of Others' Freedom.
If conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot convey a
covered work so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you may
not convey it at all. For example, if you agree to terms that obligate you
to collect a royalty for further conveying from those to whom you convey
the Program, the only way you could satisfy both those terms and this
License would be to refrain entirely from conveying the Program.
13. Use with the GNU Affero General Public License.
Notwithstanding any other provision of this License, you have
permission to link or combine any covered work with a work licensed
under version 3 of the GNU Affero General Public License into a single
combined work, and to convey the resulting work. The terms of this
License will continue to apply to the part which is the covered work,
but the special requirements of the GNU Affero General Public License,
section 13, concerning interaction through a network will apply to the
combination as such.
14. Revised Versions of this License.
The Free Software Foundation may publish revised and/or new versions of
the GNU General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the
Program specifies that a certain numbered version of the GNU General
Public License "or any later version" applies to it, you have the
option of following the terms and conditions either of that numbered
version or of any later version published by the Free Software
Foundation. If the Program does not specify a version number of the
GNU General Public License, you may choose any version ever published
by the Free Software Foundation.
If the Program specifies that a proxy can decide which future
versions of the GNU General Public License can be used, that proxy's
public statement of acceptance of a version permanently authorizes you
to choose that version for the Program.
Later license versions may give you additional or different
permissions. However, no additional obligations are imposed on any
author or copyright holder as a result of your choosing to follow a
later version.
15. Disclaimer of Warranty.
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
16. Limitation of Liability.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.
17. Interpretation of Sections 15 and 16.
If the disclaimer of warranty and limitation of liability provided
above cannot be given local legal effect according to their terms,
reviewing courts shall apply local law that most closely approximates
an absolute waiver of all civil liability in connection with the
Program, unless a warranty or assumption of liability accompanies a
copy of the Program in return for a fee.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
state the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Also add information on how to contact you by electronic and paper mail.
If the program does terminal interaction, make it output a short
notice like this when it starts in an interactive mode:
<program> Copyright (C) <year> <name of author>
This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, your program's commands
might be different; for a GUI interface, you would use an "about box".
You should also get your employer (if you work as a programmer) or school,
if any, to sign a "copyright disclaimer" for the program, if necessary.
For more information on this, and how to apply and follow the GNU GPL, see
<http://www.gnu.org/licenses/>.
The GNU General Public License does not permit incorporating your program
into proprietary programs. If your program is a subroutine library, you
may consider it more useful to permit linking proprietary applications with
the library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License. But first, please read
<http://www.gnu.org/philosophy/why-not-lgpl.html>.


@@ -1,502 +0,0 @@
GNU LESSER GENERAL PUBLIC LICENSE
Version 2.1, February 1999
Copyright (C) 1991, 1999 Free Software Foundation, Inc.
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
[This is the first released version of the Lesser GPL. It also counts
as the successor of the GNU Library Public License, version 2, hence
the version number 2.1.]
Preamble
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
Licenses are intended to guarantee your freedom to share and change
free software--to make sure the software is free for all its users.
This license, the Lesser General Public License, applies to some
specially designated software packages--typically libraries--of the
Free Software Foundation and other authors who decide to use it. You
can use it too, but we suggest you first think carefully about whether
this license or the ordinary General Public License is the better
strategy to use in any particular case, based on the explanations below.
When we speak of free software, we are referring to freedom of use,
not price. Our General Public Licenses are designed to make sure that
you have the freedom to distribute copies of free software (and charge
for this service if you wish); that you receive source code or can get
it if you want it; that you can change the software and use pieces of
it in new free programs; and that you are informed that you can do
these things.
To protect your rights, we need to make restrictions that forbid
distributors to deny you these rights or to ask you to surrender these
rights. These restrictions translate to certain responsibilities for
you if you distribute copies of the library or if you modify it.
For example, if you distribute copies of the library, whether gratis
or for a fee, you must give the recipients all the rights that we gave
you. You must make sure that they, too, receive or can get the source
code. If you link other code with the library, you must provide
complete object files to the recipients, so that they can relink them
with the library after making changes to the library and recompiling
it. And you must show them these terms so they know their rights.
We protect your rights with a two-step method: (1) we copyright the
library, and (2) we offer you this license, which gives you legal
permission to copy, distribute and/or modify the library.
To protect each distributor, we want to make it very clear that
there is no warranty for the free library. Also, if the library is
modified by someone else and passed on, the recipients should know
that what they have is not the original version, so that the original
author's reputation will not be affected by problems that might be
introduced by others.
Finally, software patents pose a constant threat to the existence of
any free program. We wish to make sure that a company cannot
effectively restrict the users of a free program by obtaining a
restrictive license from a patent holder. Therefore, we insist that
any patent license obtained for a version of the library must be
consistent with the full freedom of use specified in this license.
Most GNU software, including some libraries, is covered by the
ordinary GNU General Public License. This license, the GNU Lesser
General Public License, applies to certain designated libraries, and
is quite different from the ordinary General Public License. We use
this license for certain libraries in order to permit linking those
libraries into non-free programs.
When a program is linked with a library, whether statically or using
a shared library, the combination of the two is legally speaking a
combined work, a derivative of the original library. The ordinary
General Public License therefore permits such linking only if the
entire combination fits its criteria of freedom. The Lesser General
Public License permits more lax criteria for linking other code with
the library.
We call this license the "Lesser" General Public License because it
does Less to protect the user's freedom than the ordinary General
Public License. It also provides other free software developers Less
of an advantage over competing non-free programs. These disadvantages
are the reason we use the ordinary General Public License for many
libraries. However, the Lesser license provides advantages in certain
special circumstances.
For example, on rare occasions, there may be a special need to
encourage the widest possible use of a certain library, so that it becomes
a de-facto standard. To achieve this, non-free programs must be
allowed to use the library. A more frequent case is that a free
library does the same job as widely used non-free libraries. In this
case, there is little to gain by limiting the free library to free
software only, so we use the Lesser General Public License.
In other cases, permission to use a particular library in non-free
programs enables a greater number of people to use a large body of
free software. For example, permission to use the GNU C Library in
non-free programs enables many more people to use the whole GNU
operating system, as well as its variant, the GNU/Linux operating
system.
Although the Lesser General Public License is Less protective of the
users' freedom, it does ensure that the user of a program that is
linked with the Library has the freedom and the wherewithal to run
that program using a modified version of the Library.
The precise terms and conditions for copying, distribution and
modification follow. Pay close attention to the difference between a
"work based on the library" and a "work that uses the library". The
former contains code derived from the library, whereas the latter must
be combined with the library in order to run.
GNU LESSER GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License Agreement applies to any software library or other
program which contains a notice placed by the copyright holder or
other authorized party saying it may be distributed under the terms of
this Lesser General Public License (also called "this License").
Each licensee is addressed as "you".
A "library" means a collection of software functions and/or data
prepared so as to be conveniently linked with application programs
(which use some of those functions and data) to form executables.
The "Library", below, refers to any such software library or work
which has been distributed under these terms. A "work based on the
Library" means either the Library or any derivative work under
copyright law: that is to say, a work containing the Library or a
portion of it, either verbatim or with modifications and/or translated
straightforwardly into another language. (Hereinafter, translation is
included without limitation in the term "modification".)
"Source code" for a work means the preferred form of the work for
making modifications to it. For a library, complete source code means
all the source code for all modules it contains, plus any associated
interface definition files, plus the scripts used to control compilation
and installation of the library.
Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running a program using the Library is not restricted, and output from
such a program is covered only if its contents constitute a work based
on the Library (independent of the use of the Library in a tool for
writing it). Whether that is true depends on what the Library does
and what the program that uses the Library does.
1. You may copy and distribute verbatim copies of the Library's
complete source code as you receive it, in any medium, provided that
you conspicuously and appropriately publish on each copy an
appropriate copyright notice and disclaimer of warranty; keep intact
all the notices that refer to this License and to the absence of any
warranty; and distribute a copy of this License along with the
Library.
You may charge a fee for the physical act of transferring a copy,
and you may at your option offer warranty protection in exchange for a
fee.
2. You may modify your copy or copies of the Library or any portion
of it, thus forming a work based on the Library, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) The modified work must itself be a software library.
b) You must cause the files modified to carry prominent notices
stating that you changed the files and the date of any change.
c) You must cause the whole of the work to be licensed at no
charge to all third parties under the terms of this License.
d) If a facility in the modified Library refers to a function or a
table of data to be supplied by an application program that uses
the facility, other than as an argument passed when the facility
is invoked, then you must make a good faith effort to ensure that,
in the event an application does not supply such function or
table, the facility still operates, and performs whatever part of
its purpose remains meaningful.
(For example, a function in a library to compute square roots has
a purpose that is entirely well-defined independent of the
application. Therefore, Subsection 2d requires that any
application-supplied function or table used by this function must
be optional: if the application does not supply it, the square
root function must still compute square roots.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Library,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Library, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote
it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Library.
In addition, mere aggregation of another work not based on the Library
with the Library (or with a work based on the Library) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may opt to apply the terms of the ordinary GNU General Public
License instead of this License to a given copy of the Library. To do
this, you must alter all the notices that refer to this License, so
that they refer to the ordinary GNU General Public License, version 2,
instead of to this License. (If a newer version than version 2 of the
ordinary GNU General Public License has appeared, then you can specify
that version instead if you wish.) Do not make any other change in
these notices.
Once this change is made in a given copy, it is irreversible for
that copy, so the ordinary GNU General Public License applies to all
subsequent copies and derivative works made from that copy.
This option is useful when you wish to copy part of the code of
the Library into a program that is not a library.
4. You may copy and distribute the Library (or a portion or
derivative of it, under Section 2) in object code or executable form
under the terms of Sections 1 and 2 above provided that you accompany
it with the complete corresponding machine-readable source code, which
must be distributed under the terms of Sections 1 and 2 above on a
medium customarily used for software interchange.
If distribution of object code is made by offering access to copy
from a designated place, then offering equivalent access to copy the
source code from the same place satisfies the requirement to
distribute the source code, even though third parties are not
compelled to copy the source along with the object code.
5. A program that contains no derivative of any portion of the
Library, but is designed to work with the Library by being compiled or
linked with it, is called a "work that uses the Library". Such a
work, in isolation, is not a derivative work of the Library, and
therefore falls outside the scope of this License.
However, linking a "work that uses the Library" with the Library
creates an executable that is a derivative of the Library (because it
contains portions of the Library), rather than a "work that uses the
library". The executable is therefore covered by this License.
Section 6 states terms for distribution of such executables.
When a "work that uses the Library" uses material from a header file
that is part of the Library, the object code for the work may be a
derivative work of the Library even though the source code is not.
Whether this is true is especially significant if the work can be
linked without the Library, or if the work is itself a library. The
threshold for this to be true is not precisely defined by law.
If such an object file uses only numerical parameters, data
structure layouts and accessors, and small macros and small inline
functions (ten lines or less in length), then the use of the object
file is unrestricted, regardless of whether it is legally a derivative
work. (Executables containing this object code plus portions of the
Library will still fall under Section 6.)
Otherwise, if the work is a derivative of the Library, you may
distribute the object code for the work under the terms of Section 6.
Any executables containing that work also fall under Section 6,
whether or not they are linked directly with the Library itself.
6. As an exception to the Sections above, you may also combine or
link a "work that uses the Library" with the Library to produce a
work containing portions of the Library, and distribute that work
under terms of your choice, provided that the terms permit
modification of the work for the customer's own use and reverse
engineering for debugging such modifications.
You must give prominent notice with each copy of the work that the
Library is used in it and that the Library and its use are covered by
this License. You must supply a copy of this License. If the work
during execution displays copyright notices, you must include the
copyright notice for the Library among them, as well as a reference
directing the user to the copy of this License. Also, you must do one
of these things:
a) Accompany the work with the complete corresponding
machine-readable source code for the Library including whatever
changes were used in the work (which must be distributed under
Sections 1 and 2 above); and, if the work is an executable linked
with the Library, with the complete machine-readable "work that
uses the Library", as object code and/or source code, so that the
user can modify the Library and then relink to produce a modified
executable containing the modified Library. (It is understood
that the user who changes the contents of definitions files in the
Library will not necessarily be able to recompile the application
to use the modified definitions.)
b) Use a suitable shared library mechanism for linking with the
Library. A suitable mechanism is one that (1) uses at run time a
copy of the library already present on the user's computer system,
rather than copying library functions into the executable, and (2)
will operate properly with a modified version of the library, if
the user installs one, as long as the modified version is
interface-compatible with the version that the work was made with.
c) Accompany the work with a written offer, valid for at
least three years, to give the same user the materials
specified in Subsection 6a, above, for a charge no more
than the cost of performing this distribution.
d) If distribution of the work is made by offering access to copy
from a designated place, offer equivalent access to copy the above
specified materials from the same place.
e) Verify that the user has already received a copy of these
materials or that you have already sent this user a copy.
For an executable, the required form of the "work that uses the
Library" must include any data and utility programs needed for
reproducing the executable from it. However, as a special exception,
the materials to be distributed need not include anything that is
normally distributed (in either source or binary form) with the major
components (compiler, kernel, and so on) of the operating system on
which the executable runs, unless that component itself accompanies
the executable.
It may happen that this requirement contradicts the license
restrictions of other proprietary libraries that do not normally
accompany the operating system. Such a contradiction means you cannot
use both them and the Library together in an executable that you
distribute.
7. You may place library facilities that are a work based on the
Library side-by-side in a single library together with other library
facilities not covered by this License, and distribute such a combined
library, provided that the separate distribution of the work based on
the Library and of the other library facilities is otherwise
permitted, and provided that you do these two things:
a) Accompany the combined library with a copy of the same work
based on the Library, uncombined with any other library
facilities. This must be distributed under the terms of the
Sections above.
b) Give prominent notice with the combined library of the fact
that part of it is a work based on the Library, and explaining
where to find the accompanying uncombined form of the same work.
8. You may not copy, modify, sublicense, link with, or distribute
the Library except as expressly provided under this License. Any
attempt otherwise to copy, modify, sublicense, link with, or
distribute the Library is void, and will automatically terminate your
rights under this License. However, parties who have received copies,
or rights, from you under this License will not have their licenses
terminated so long as such parties remain in full compliance.
9. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Library or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Library (or any work based on the
Library), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Library or works based on it.
10. Each time you redistribute the Library (or any work based on the
Library), the recipient automatically receives a license from the
original licensor to copy, distribute, link with or modify the Library
subject to these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties with
this License.
11. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Library at all. For example, if a patent
license would not permit royalty-free redistribution of the Library by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Library.
If any portion of this section is held invalid or unenforceable under any
particular circumstance, the balance of the section is intended to apply,
and the section as a whole is intended to apply in other circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
12. If the distribution and/or use of the Library is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Library under this License may add
an explicit geographical distribution limitation excluding those countries,
so that distribution is permitted only in or among countries not thus
excluded. In such case, this License incorporates the limitation as if
written in the body of this License.
13. The Free Software Foundation may publish revised and/or new
versions of the Lesser General Public License from time to time.
Such new versions will be similar in spirit to the present version,
but may differ in detail to address new problems or concerns.
Each version is given a distinguishing version number. If the Library
specifies a version number of this License which applies to it and
"any later version", you have the option of following the terms and
conditions either of that version or of any later version published by
the Free Software Foundation. If the Library does not specify a
license version number, you may choose any version ever published by
the Free Software Foundation.
14. If you wish to incorporate parts of the Library into other free
programs whose distribution conditions are incompatible with these,
write to the author to ask for permission. For software which is
copyrighted by the Free Software Foundation, write to the Free
Software Foundation; we sometimes make exceptions for this. Our
decision will be guided by the two goals of preserving the free status
of all derivatives of our free software and of promoting the sharing
and reuse of software generally.
NO WARRANTY
15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO
WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW.
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR
OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY
KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE
LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME
THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY
AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU
FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR
CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE
LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING
RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A
FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF
SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Libraries
If you develop a new library, and you want it to be of the greatest
possible use to the public, we recommend making it free software that
everyone can redistribute and change. You can do so by permitting
redistribution under these terms (or, alternatively, under the terms of the
ordinary General Public License).
To apply these terms, attach the following notices to the library. It is
safest to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least the
"copyright" line and a pointer to where the full notice is found.
<one line to give the library's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Also add information on how to contact you by electronic and paper mail.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the library, if
necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the
library `Frob' (a library for tweaking knobs) written by James Random Hacker.
<signature of Ty Coon>, 1 April 1990
Ty Coon, President of Vice
That's all there is to it!


@@ -1,52 +1,51 @@
Minpack Copyright Notice (1999) University of Chicago. All rights reserved
Redistribution and use in source and binary forms, with or
without modification, are permitted provided that the
following conditions are met:
1. Redistributions of source code must retain the above
copyright notice, this list of conditions and the following
disclaimer.
2. Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following
disclaimer in the documentation and/or other materials
provided with the distribution.
3. The end-user documentation included with the
redistribution, if any, must include the following
acknowledgment:
"This product includes software developed by the
University of Chicago, as Operator of Argonne National
Laboratory.
Alternately, this acknowledgment may appear in the software
itself, if and wherever such third-party acknowledgments
normally appear.
4. WARRANTY DISCLAIMER. THE SOFTWARE IS SUPPLIED "AS IS"
WITHOUT WARRANTY OF ANY KIND. THE COPYRIGHT HOLDER, THE
UNITED STATES, THE UNITED STATES DEPARTMENT OF ENERGY, AND
THEIR EMPLOYEES: (1) DISCLAIM ANY WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY IMPLIED WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE
OR NON-INFRINGEMENT, (2) DO NOT ASSUME ANY LEGAL LIABILITY
OR RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR
USEFULNESS OF THE SOFTWARE, (3) DO NOT REPRESENT THAT USE OF
THE SOFTWARE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS, (4)
DO NOT WARRANT THAT THE SOFTWARE WILL FUNCTION
UNINTERRUPTED, THAT IT IS ERROR-FREE OR THAT ANY ERRORS WILL
BE CORRECTED.
5. LIMITATION OF LIABILITY. IN NO EVENT WILL THE COPYRIGHT
HOLDER, THE UNITED STATES, THE UNITED STATES DEPARTMENT OF
ENERGY, OR THEIR EMPLOYEES: BE LIABLE FOR ANY INDIRECT,
INCIDENTAL, CONSEQUENTIAL, SPECIAL OR PUNITIVE DAMAGES OF
ANY KIND OR NATURE, INCLUDING BUT NOT LIMITED TO LOSS OF
PROFITS OR LOSS OF DATA, FOR ANY REASON WHATSOEVER, WHETHER
SUCH LIABILITY IS ASSERTED ON THE BASIS OF CONTRACT, TORT
(INCLUDING NEGLIGENCE OR STRICT LIABILITY), OR OTHERWISE,
EVEN IF ANY OF SAID PARTIES HAS BEEN WARNED OF THE
POSSIBILITY OF SUCH LOSS OR DAMAGES.
Minpack Copyright Notice (1999) University of Chicago. All rights reserved
Redistribution and use in source and binary forms, with or
without modification, are permitted provided that the
following conditions are met:
1. Redistributions of source code must retain the above
copyright notice, this list of conditions and the following
disclaimer.
2. Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following
disclaimer in the documentation and/or other materials
provided with the distribution.
3. The end-user documentation included with the
redistribution, if any, must include the following
acknowledgment:
"This product includes software developed by the
University of Chicago, as Operator of Argonne National
Laboratory.
Alternately, this acknowledgment may appear in the software
itself, if and wherever such third-party acknowledgments
normally appear.
4. WARRANTY DISCLAIMER. THE SOFTWARE IS SUPPLIED "AS IS"
WITHOUT WARRANTY OF ANY KIND. THE COPYRIGHT HOLDER, THE
UNITED STATES, THE UNITED STATES DEPARTMENT OF ENERGY, AND
THEIR EMPLOYEES: (1) DISCLAIM ANY WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY IMPLIED WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE
OR NON-INFRINGEMENT, (2) DO NOT ASSUME ANY LEGAL LIABILITY
OR RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR
USEFULNESS OF THE SOFTWARE, (3) DO NOT REPRESENT THAT USE OF
THE SOFTWARE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS, (4)
DO NOT WARRANT THAT THE SOFTWARE WILL FUNCTION
UNINTERRUPTED, THAT IT IS ERROR-FREE OR THAT ANY ERRORS WILL
BE CORRECTED.
5. LIMITATION OF LIABILITY. IN NO EVENT WILL THE COPYRIGHT
HOLDER, THE UNITED STATES, THE UNITED STATES DEPARTMENT OF
ENERGY, OR THEIR EMPLOYEES: BE LIABLE FOR ANY INDIRECT,
INCIDENTAL, CONSEQUENTIAL, SPECIAL OR PUNITIVE DAMAGES OF
ANY KIND OR NATURE, INCLUDING BUT NOT LIMITED TO LOSS OF
PROFITS OR LOSS OF DATA, FOR ANY REASON WHATSOEVER, WHETHER
SUCH LIABILITY IS ASSERTED ON THE BASIS OF CONTRACT, TORT
(INCLUDING NEGLIGENCE OR STRICT LIABILITY), OR OTHERWISE,
EVEN IF ANY OF SAID PARTIES HAS BEEN WARNED OF THE
POSSIBILITY OF SUCH LOSS OR DAMAGES.


@@ -357,7 +357,7 @@ Exhibit A - Source Code Form License Notice
This Source Code Form is subject to the terms of the Mozilla Public
License, v. 2.0. If a copy of the MPL was not distributed with this
file, You can obtain one at http://mozilla.org/MPL/2.0/.
file, You can obtain one at https://mozilla.org/MPL/2.0/.
If it is not possible or desirable to put the notice in a particular
file, then You may include the notice in a location (such as a LICENSE


@@ -2,17 +2,10 @@ Eigen is primarily MPL2 licensed. See COPYING.MPL2 and these links:
http://www.mozilla.org/MPL/2.0/
http://www.mozilla.org/MPL/2.0/FAQ.html
Some files contain third-party code under BSD or LGPL licenses, whence the other
COPYING.* files here.
Some files contain third-party code under BSD, LGPL, Apache, or other
MPL2-compatible licenses, hence the other COPYING.* files here.
All the LGPL code is either LGPL 2.1-only, or LGPL 2.1-or-later.
For this reason, the COPYING.LGPL file contains the LGPL 2.1 text.
If you want to guarantee that the Eigen code that you are #including is licensed
under the MPL2 and possibly more permissive licenses (like BSD), #define this
preprocessor symbol:
EIGEN_MPL2_ONLY
For example, with most compilers, you could add this to your project CXXFLAGS:
-DEIGEN_MPL2_ONLY
This will cause a compilation error to be generated if you #include any code that is
LGPL licensed.
Note that some optional external dependencies (e.g. FFTW, MPFR C++)
are distributed under different licenses, including the GPL. Refer to
the individual source files and their respective COPYING files for
details.


@@ -2,16 +2,16 @@
## Then modify the CMakeLists.txt file in the root directory of your
## project to incorporate the testing dashboard.
## # The following are required to use Dart and the CDash dashboard
## ENABLE_TESTING()
## INCLUDE(CTest)
## enable_testing()
## include(CTest)
set(CTEST_PROJECT_NAME "Eigen")
set(CTEST_NIGHTLY_START_TIME "00:00:00 UTC")
set(CTEST_DROP_METHOD "http")
set(CTEST_DROP_SITE "manao.inria.fr")
set(CTEST_DROP_LOCATION "/CDash/submit.php?project=Eigen")
set(CTEST_DROP_SITE "my.cdash.org")
set(CTEST_DROP_LOCATION "/submit.php?project=Eigen")
set(CTEST_DROP_SITE_CDASH TRUE)
set(CTEST_PROJECT_SUBPROJECTS
Official
Unsupported
)
#set(CTEST_PROJECT_SUBPROJECTS
#Official
#Unsupported
#)


@@ -1,3 +1,4 @@
set(CTEST_CUSTOM_MAXIMUM_NUMBER_OF_WARNINGS "2000")
set(CTEST_CUSTOM_MAXIMUM_NUMBER_OF_ERRORS "2000")
list(APPEND CTEST_CUSTOM_ERROR_EXCEPTION @EIGEN_CTEST_ERROR_EXCEPTION@)

Eigen/AccelerateSupport

@@ -0,0 +1,52 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_ACCELERATESUPPORT_MODULE_H
#define EIGEN_ACCELERATESUPPORT_MODULE_H
#include "SparseCore"
#include "src/Core/util/DisableStupidWarnings.h"
/** \ingroup Support_modules
* \defgroup AccelerateSupport_Module AccelerateSupport module
*
* This module provides an interface to the Apple Accelerate library.
* It provides the following seven main factorization classes:
* - class AccelerateLLT: a Cholesky (LL^T) factorization.
* - class AccelerateLDLT: the default LDL^T factorization.
* - class AccelerateLDLTUnpivoted: a Cholesky-like LDL^T factorization with only 1x1 pivots and no pivoting
* - class AccelerateLDLTSBK: an LDL^T factorization with Supernode Bunch-Kaufman and static pivoting
* - class AccelerateLDLTTPP: an LDL^T factorization with full threshold partial pivoting
* - class AccelerateQR: a QR factorization
* - class AccelerateCholeskyAtA: a QR factorization without storing Q (equivalent to A^TA = R^T R)
*
* \code
* #include <Eigen/AccelerateSupport>
* \endcode
*
* In order to use this module, the Accelerate headers must be accessible from
* the include paths, and your binary must be linked to the Accelerate framework.
* The Accelerate library is only available on Apple hardware.
*
* Note that many of the algorithms can be influenced by the UpLo template
* argument. All matrices are assumed to be symmetric. For example, the following
* creates an LDLT factorization where your matrix is symmetric (implicit) and
* uses the lower triangle:
*
* \code
* AccelerateLDLT<SparseMatrix<float>, Lower> ldlt;
* \endcode
*/
// IWYU pragma: begin_exports
#include "src/AccelerateSupport/AccelerateSupport.h"
// IWYU pragma: end_exports
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_ACCELERATESUPPORT_MODULE_H


@@ -1,19 +0,0 @@
include(RegexUtils)
test_escape_string_as_regex()
file(GLOB Eigen_directory_files "*")
escape_string_as_regex(ESCAPED_CMAKE_CURRENT_SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}")
foreach(f ${Eigen_directory_files})
if(NOT f MATCHES "\\.txt" AND NOT f MATCHES "${ESCAPED_CMAKE_CURRENT_SOURCE_DIR}/[.].+" AND NOT f MATCHES "${ESCAPED_CMAKE_CURRENT_SOURCE_DIR}/src")
list(APPEND Eigen_directory_files_to_install ${f})
endif()
endforeach(f ${Eigen_directory_files})
install(FILES
${Eigen_directory_files_to_install}
DESTINATION ${INCLUDE_INSTALL_DIR}/Eigen COMPONENT Devel
)
install(DIRECTORY src DESTINATION ${INCLUDE_INSTALL_DIR}/Eigen COMPONENT Devel FILES_MATCHING PATTERN "*.h")


@@ -9,33 +9,33 @@
#define EIGEN_CHOLESKY_MODULE_H
#include "Core"
#include "Jacobi"
#include "src/Core/util/DisableStupidWarnings.h"
/** \defgroup Cholesky_Module Cholesky module
*
*
*
* This module provides two variants of the Cholesky decomposition for selfadjoint (hermitian) matrices.
* Those decompositions are also accessible via the following methods:
* - MatrixBase::llt()
* - MatrixBase::ldlt()
* - SelfAdjointView::llt()
* - SelfAdjointView::ldlt()
*
* \code
* #include <Eigen/Cholesky>
* \endcode
*/
*
* This module provides two variants of the Cholesky decomposition for selfadjoint (hermitian) matrices.
* Those decompositions are also accessible via the following methods:
* - MatrixBase::llt()
* - MatrixBase::ldlt()
* - SelfAdjointView::llt()
* - SelfAdjointView::ldlt()
*
* \code
* #include <Eigen/Cholesky>
* \endcode
*/
// IWYU pragma: begin_exports
#include "src/Cholesky/LLT.h"
#include "src/Cholesky/LDLT.h"
#ifdef EIGEN_USE_LAPACKE
#include "src/misc/lapacke.h"
#include "src/misc/lapacke_helpers.h"
#include "src/Cholesky/LLT_LAPACKE.h"
#endif
// IWYU pragma: end_exports
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_CHOLESKY_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */
#endif // EIGEN_CHOLESKY_MODULE_H


@@ -12,37 +12,37 @@
#include "src/Core/util/DisableStupidWarnings.h"
extern "C" {
#include <cholmod.h>
}
#include <cholmod.h>
/** \ingroup Support_modules
* \defgroup CholmodSupport_Module CholmodSupport module
*
* This module provides an interface to the Cholmod library which is part of the <a href="http://www.suitesparse.com">suitesparse</a> package.
* It provides the two following main factorization classes:
* - class CholmodSupernodalLLT: a supernodal LLT Cholesky factorization.
* - class CholmodDecomposiiton: a general L(D)LT Cholesky factorization with automatic or explicit runtime selection of the underlying factorization method (supernodal or simplicial).
*
* For the sake of completeness, this module also propose the two following classes:
* - class CholmodSimplicialLLT
* - class CholmodSimplicialLDLT
* Note that these classes does not bring any particular advantage compared to the built-in
* SimplicialLLT and SimplicialLDLT factorization classes.
*
* \code
* #include <Eigen/CholmodSupport>
* \endcode
*
* In order to use this module, the cholmod headers must be accessible from the include paths, and your binary must be linked to the cholmod library and its dependencies.
* The dependencies depend on how cholmod has been compiled.
* For a cmake based project, you can use our FindCholmod.cmake module to help you in this task.
*
*/
* \defgroup CholmodSupport_Module CholmodSupport module
*
* This module provides an interface to the Cholmod library which is part of the <a
* href="http://www.suitesparse.com">suitesparse</a> package. It provides the two following main factorization classes:
* - class CholmodSupernodalLLT: a supernodal LLT Cholesky factorization.
* - class CholmodDecomposition: a general L(D)LT Cholesky factorization with automatic or explicit runtime selection of
* the underlying factorization method (supernodal or simplicial).
*
* For the sake of completeness, this module also proposes the following two classes:
* - class CholmodSimplicialLLT
* - class CholmodSimplicialLDLT
* Note that these classes do not bring any particular advantage compared to the built-in
* SimplicialLLT and SimplicialLDLT factorization classes.
*
* \code
* #include <Eigen/CholmodSupport>
* \endcode
*
* In order to use this module, the cholmod headers must be accessible from the include paths, and your binary must be
* linked to the cholmod library and its dependencies. The dependencies depend on how cholmod has been compiled. For a
* cmake based project, you can use our FindCholmod.cmake module to help you in this task.
*
*/
// IWYU pragma: begin_exports
#include "src/CholmodSupport/CholmodSupport.h"
// IWYU pragma: end_exports
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_CHOLMODSUPPORT_MODULE_H
#endif // EIGEN_CHOLMODSUPPORT_MODULE_H


@@ -8,247 +8,66 @@
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_CORE_H
#define EIGEN_CORE_H
#ifndef EIGEN_CORE_MODULE_H
#define EIGEN_CORE_MODULE_H
// first thing Eigen does: stop the compiler from committing suicide
// Eigen version information.
#include "Version"
// first thing Eigen does: stop the compiler from reporting useless warnings.
#include "src/Core/util/DisableStupidWarnings.h"
// Handle NVCC/CUDA/SYCL
#if defined(__CUDACC__) || defined(__SYCL_DEVICE_ONLY__)
// Do not try asserts on CUDA and SYCL!
#ifndef EIGEN_NO_DEBUG
#define EIGEN_NO_DEBUG
#endif
// then include this file where all our macros are defined. It's really important to do it first because
// it's where we do all the compiler/OS/arch detections and define most defaults.
#include "src/Core/util/Macros.h"
#ifdef EIGEN_INTERNAL_DEBUGGING
#undef EIGEN_INTERNAL_DEBUGGING
#endif
// This detects SSE/AVX/NEON/etc. and configure alignment settings
#include "src/Core/util/ConfigureVectorization.h"
#ifdef EIGEN_EXCEPTIONS
#undef EIGEN_EXCEPTIONS
#endif
// All functions callable from CUDA code must be qualified with __device__
#ifdef __CUDACC__
// Do not try to vectorize on CUDA and SYCL!
#ifndef EIGEN_DONT_VECTORIZE
#define EIGEN_DONT_VECTORIZE
#endif
#define EIGEN_DEVICE_FUNC __host__ __device__
// We need math_functions.hpp to ensure that the EIGEN_USING_STD_MATH macro
// works properly on the device side
#include <math_functions.hpp>
#else
#define EIGEN_DEVICE_FUNC
#endif
#else
#define EIGEN_DEVICE_FUNC
#endif
// When compiling CUDA device code with NVCC, pull in math functions from the
// global namespace. In host mode, and when device code is compiled with clang, use the
// std versions.
#if defined(__CUDA_ARCH__) && defined(__NVCC__)
#define EIGEN_USING_STD_MATH(FUNC) using ::FUNC;
#else
#define EIGEN_USING_STD_MATH(FUNC) using std::FUNC;
#endif
#if (defined(_CPPUNWIND) || defined(__EXCEPTIONS)) && !defined(__CUDA_ARCH__) && !defined(EIGEN_EXCEPTIONS) && !defined(EIGEN_USE_SYCL)
#define EIGEN_EXCEPTIONS
// We need cuda_runtime.h/hip_runtime.h to ensure that
// the EIGEN_USING_STD macro works properly on the device side
#if defined(EIGEN_CUDACC)
#include <cuda_runtime.h>
#elif defined(EIGEN_HIPCC)
#include <hip/hip_runtime.h>
#endif
#ifdef EIGEN_EXCEPTIONS
#include <new>
#include <new>
#endif
// then include this file where all our macros are defined. It's really important to do it first because
// it's where we do all the alignment settings (platform detection and honoring the user's will if he
// defined e.g. EIGEN_DONT_ALIGN) so it needs to be done before we do anything with vectorization.
#include "src/Core/util/Macros.h"
// Disable the ipa-cp-clone optimization flag with MinGW 6.x or newer (enabled by default with -O3)
// See http://eigen.tuxfamily.org/bz/show_bug.cgi?id=556 for details.
#if EIGEN_COMP_MINGW && EIGEN_GNUC_AT_LEAST(4,6)
#pragma GCC optimize ("-fno-ipa-cp-clone")
// Prevent ICC from specializing std::complex operators that silently fail
// on device. This allows us to use our own device-compatible specializations
// instead.
#if EIGEN_COMP_ICC && defined(EIGEN_GPU_COMPILE_PHASE) && !defined(_OVERRIDE_COMPLEX_SPECIALIZATION_)
#define _OVERRIDE_COMPLEX_SPECIALIZATION_ 1
#endif
#include <complex>
// this include file manages BLAS and MKL related macros
// and inclusion of their respective header files
#include "src/Core/util/MKL_support.h"
#include "src/Core/util/AOCL_Support.h"
// if alignment is disabled, then disable vectorization. Note: EIGEN_MAX_ALIGN_BYTES is the proper check, it takes into
// account both the user's will (EIGEN_MAX_ALIGN_BYTES,EIGEN_DONT_ALIGN) and our own platform checks
#if EIGEN_MAX_ALIGN_BYTES==0
#ifndef EIGEN_DONT_VECTORIZE
#define EIGEN_DONT_VECTORIZE
#endif
#endif
#if EIGEN_COMP_MSVC
#include <malloc.h> // for _aligned_malloc -- need it regardless of whether vectorization is enabled
#if (EIGEN_COMP_MSVC >= 1500) // 2008 or later
// Remember that usage of defined() in a #define is undefined by the standard.
// a user reported that in 64-bit mode, MSVC doesn't care to define _M_IX86_FP.
#if (defined(_M_IX86_FP) && (_M_IX86_FP >= 2)) || EIGEN_ARCH_x86_64
#define EIGEN_SSE2_ON_MSVC_2008_OR_LATER
#endif
#endif
#else
// Remember that usage of defined() in a #define is undefined by the standard
#if (defined __SSE2__) && ( (!EIGEN_COMP_GNUC) || EIGEN_COMP_ICC || EIGEN_GNUC_AT_LEAST(4,2) )
#define EIGEN_SSE2_ON_NON_MSVC_BUT_NOT_OLD_GCC
#endif
#endif
// EIGEN_HAS_GPU_FP16 is now always true when compiling with CUDA or HIP.
// Use EIGEN_GPUCC (compile-time) or EIGEN_GPU_COMPILE_PHASE (device phase) instead.
// TODO: Remove EIGEN_HAS_GPU_BF16 similarly once HIP bf16 guards are cleaned up.
#ifndef EIGEN_DONT_VECTORIZE
#if defined (EIGEN_SSE2_ON_NON_MSVC_BUT_NOT_OLD_GCC) || defined(EIGEN_SSE2_ON_MSVC_2008_OR_LATER)
// Defines symbols for compile-time detection of which instructions are
// used.
// EIGEN_VECTORIZE_YY is defined if and only if the instruction set YY is used
#define EIGEN_VECTORIZE
#define EIGEN_VECTORIZE_SSE
#define EIGEN_VECTORIZE_SSE2
// Detect sse3/ssse3/sse4:
// gcc and icc defines __SSE3__, ...
// there is no way to know about this on msvc. You can define EIGEN_VECTORIZE_SSE* if you
// want to force the use of those instructions with msvc.
#ifdef __SSE3__
#define EIGEN_VECTORIZE_SSE3
#endif
#ifdef __SSSE3__
#define EIGEN_VECTORIZE_SSSE3
#endif
#ifdef __SSE4_1__
#define EIGEN_VECTORIZE_SSE4_1
#endif
#ifdef __SSE4_2__
#define EIGEN_VECTORIZE_SSE4_2
#endif
#ifdef __AVX__
#define EIGEN_VECTORIZE_AVX
#define EIGEN_VECTORIZE_SSE3
#define EIGEN_VECTORIZE_SSSE3
#define EIGEN_VECTORIZE_SSE4_1
#define EIGEN_VECTORIZE_SSE4_2
#endif
#ifdef __AVX2__
#define EIGEN_VECTORIZE_AVX2
#endif
#ifdef __FMA__
#define EIGEN_VECTORIZE_FMA
#endif
#if defined(__AVX512F__) && defined(EIGEN_ENABLE_AVX512)
#define EIGEN_VECTORIZE_AVX512
#define EIGEN_VECTORIZE_AVX2
#define EIGEN_VECTORIZE_AVX
#define EIGEN_VECTORIZE_FMA
#ifdef __AVX512DQ__
#define EIGEN_VECTORIZE_AVX512DQ
#endif
#endif
// include files
// This extern "C" works around a MINGW-w64 compilation issue
// https://sourceforge.net/tracker/index.php?func=detail&aid=3018394&group_id=202880&atid=983354
// In essence, intrin.h is included by windows.h and also declares intrinsics (just as emmintrin.h etc. below do).
// However, intrin.h uses an extern "C" declaration, and g++ thus complains of duplicate declarations
// with conflicting linkage. The linkage for intrinsics doesn't matter, but at that stage the compiler doesn't know;
// so, to avoid compile errors when windows.h is included after Eigen/Core, ensure intrinsics are extern "C" here too.
// notice that since these are C headers, the extern "C" is theoretically needed anyways.
extern "C" {
// In theory we should only include immintrin.h and not the other *mmintrin.h header files directly.
// Doing so triggers some issues with ICC. However old gcc versions seem to not have this file, thus:
#if EIGEN_COMP_ICC >= 1110
#include <immintrin.h>
#else
#include <mmintrin.h>
#include <emmintrin.h>
#include <xmmintrin.h>
#ifdef EIGEN_VECTORIZE_SSE3
#include <pmmintrin.h>
#endif
#ifdef EIGEN_VECTORIZE_SSSE3
#include <tmmintrin.h>
#endif
#ifdef EIGEN_VECTORIZE_SSE4_1
#include <smmintrin.h>
#endif
#ifdef EIGEN_VECTORIZE_SSE4_2
#include <nmmintrin.h>
#endif
#if defined(EIGEN_VECTORIZE_AVX) || defined(EIGEN_VECTORIZE_AVX512)
#include <immintrin.h>
#endif
#endif
} // end extern "C"
#elif defined __VSX__
#define EIGEN_VECTORIZE
#define EIGEN_VECTORIZE_VSX
#include <altivec.h>
// We need to #undef all these ugly tokens defined in <altivec.h>
// => use __vector instead of vector
#undef bool
#undef vector
#undef pixel
#elif defined __ALTIVEC__
#define EIGEN_VECTORIZE
#define EIGEN_VECTORIZE_ALTIVEC
#include <altivec.h>
// We need to #undef all these ugly tokens defined in <altivec.h>
// => use __vector instead of vector
#undef bool
#undef vector
#undef pixel
#elif (defined __ARM_NEON) || (defined __ARM_NEON__)
#define EIGEN_VECTORIZE
#define EIGEN_VECTORIZE_NEON
#include <arm_neon.h>
#elif (defined __s390x__ && defined __VEC__)
#define EIGEN_VECTORIZE
#define EIGEN_VECTORIZE_ZVECTOR
#include <vecintrin.h>
#endif
#endif
#if defined(__F16C__) && !defined(EIGEN_COMP_CLANG)
// We can use the optimized fp16 to float and float to fp16 conversion routines
#define EIGEN_HAS_FP16_C
#endif
#if defined __CUDACC__
#define EIGEN_VECTORIZE_CUDA
#include <vector_types.h>
#if defined __CUDACC_VER__ && __CUDACC_VER__ >= 70500
#define EIGEN_HAS_CUDA_FP16
#endif
#endif
#if defined EIGEN_HAS_CUDA_FP16
#include <host_defines.h>
#include <cuda_fp16.h>
#if defined(EIGEN_HAS_CUDA_BF16) || defined(EIGEN_HAS_HIP_BF16)
#define EIGEN_HAS_GPU_BF16
#endif
#if (defined _OPENMP) && (!defined EIGEN_DONT_PARALLELIZE)
#define EIGEN_HAS_OPENMP
#define EIGEN_HAS_OPENMP
#endif
#ifdef EIGEN_HAS_OPENMP
#include <atomic>
#include <omp.h>
#endif
// MSVC for windows mobile does not have the errno.h file
#if !(EIGEN_COMP_MSVC && EIGEN_OS_WINCE) && !EIGEN_COMP_ARM
#if !EIGEN_COMP_ARM
#define EIGEN_HAS_ERRNO
#endif
@@ -258,19 +77,38 @@
#include <cstddef>
#include <cstdlib>
#include <cmath>
#include <cassert>
#include <functional>
#ifndef EIGEN_NO_IO
#include <sstream>
#include <iosfwd>
#endif
#include <cstring>
#include <string>
#include <limits>
#include <climits> // for CHAR_BIT
#include <climits> // for CHAR_BIT
// for min/max:
#include <algorithm>
#include <array>
#include <memory>
#include <vector>
// for std::is_nothrow_move_assignable
#ifdef EIGEN_INCLUDE_TYPE_TRAITS
#include <type_traits>
// for std::this_thread::yield().
#if !defined(EIGEN_USE_BLAS) && (defined(EIGEN_HAS_OPENMP) || defined(EIGEN_GEMM_THREADPOOL))
#include <thread>
#endif
// for __cpp_lib feature test macros
#if defined(__has_include) && __has_include(<version>)
#include <version>
#endif
// for std::bit_cast()
#if defined(__cpp_lib_bit_cast) && __cpp_lib_bit_cast >= 201806L
#include <bit>
#endif
// for outputting debug info
@@ -279,120 +117,205 @@
#endif
// required for __cpuid, needs to be included after cmath
#if EIGEN_COMP_MSVC && EIGEN_ARCH_i386_OR_x86_64 && !EIGEN_OS_WINCE
#include <intrin.h>
// also required for _BitScanReverse on Windows on ARM
#if EIGEN_COMP_MSVC && (EIGEN_ARCH_i386_OR_x86_64 || EIGEN_ARCH_ARM64)
#include <intrin.h>
#endif
// Required for querying cache sizes on Linux and macOS.
#if EIGEN_OS_LINUX
#include <unistd.h>
#elif EIGEN_OS_MAC
#include <sys/types.h>
#include <sys/sysctl.h>
#endif
#if defined(EIGEN_USE_SYCL)
#undef min
#undef max
#undef isnan
#undef isinf
#undef isfinite
#include <CL/sycl.hpp>
#include <map>
#include <thread>
#include <utility>
#ifndef EIGEN_SYCL_LOCAL_THREAD_DIM0
#define EIGEN_SYCL_LOCAL_THREAD_DIM0 16
#endif
#ifndef EIGEN_SYCL_LOCAL_THREAD_DIM1
#define EIGEN_SYCL_LOCAL_THREAD_DIM1 16
#endif
#endif
/** \brief Namespace containing all symbols from the %Eigen library. */
namespace Eigen {
inline static const char *SimdInstructionSetsInUse(void) {
#if defined(EIGEN_VECTORIZE_AVX512)
return "AVX512, FMA, AVX2, AVX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2";
#elif defined(EIGEN_VECTORIZE_AVX)
return "AVX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2";
#elif defined(EIGEN_VECTORIZE_SSE4_2)
return "SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2";
#elif defined(EIGEN_VECTORIZE_SSE4_1)
return "SSE, SSE2, SSE3, SSSE3, SSE4.1";
#elif defined(EIGEN_VECTORIZE_SSSE3)
return "SSE, SSE2, SSE3, SSSE3";
#elif defined(EIGEN_VECTORIZE_SSE3)
return "SSE, SSE2, SSE3";
#elif defined(EIGEN_VECTORIZE_SSE2)
return "SSE, SSE2";
#elif defined(EIGEN_VECTORIZE_ALTIVEC)
return "AltiVec";
#elif defined(EIGEN_VECTORIZE_VSX)
return "VSX";
#elif defined(EIGEN_VECTORIZE_NEON)
return "ARM NEON";
#elif defined(EIGEN_VECTORIZE_ZVECTOR)
return "S390X ZVECTOR";
#else
return "None";
#endif
}
} // end namespace Eigen
#if defined EIGEN2_SUPPORT_STAGE40_FULL_EIGEN3_STRICTNESS || defined EIGEN2_SUPPORT_STAGE30_FULL_EIGEN3_API || defined EIGEN2_SUPPORT_STAGE20_RESOLVE_API_CONFLICTS || defined EIGEN2_SUPPORT_STAGE10_FULL_EIGEN2_API || defined EIGEN2_SUPPORT
// This will generate an error message:
#error Eigen2-support is only available up to version 3.2. Please go to "http://eigen.tuxfamily.org/index.php?title=Eigen2" for further information
#endif
// we use size_t frequently and we'll never remember to prepend it with std:: every time just to
// ensure QNX/QCC support
using std::size_t;
// gcc 4.6.0 wants std:: for ptrdiff_t
using std::ptrdiff_t;
/** \defgroup Core_Module Core module
* This is the main module of Eigen providing dense matrix and vector support
* (both fixed and dynamic size) with all the features corresponding to a BLAS library
* and much more...
*
* \code
* #include <Eigen/Core>
* \endcode
*/
} // namespace Eigen
/** \defgroup Core_Module Core module
* This is the main module of Eigen providing dense matrix and vector support
* (both fixed and dynamic size) with all the features corresponding to a BLAS library
* and much more...
*
* \code
* #include <Eigen/Core>
* \endcode
*/
#ifdef EIGEN_USE_LAPACKE
#ifdef EIGEN_USE_MKL
#include "mkl_lapacke.h"
#elif defined(EIGEN_LAPACKE_SYSTEM)
#include <lapacke.h>
#else
#include "src/misc/lapacke.h"
#endif
#endif
// IWYU pragma: begin_exports
#include "src/Core/util/Constants.h"
#include "src/Core/util/Meta.h"
#include "src/Core/util/Assert.h"
#include "src/Core/util/ForwardDeclarations.h"
#include "src/Core/util/StaticAssert.h"
#include "src/Core/util/XprHelper.h"
#include "src/Core/util/Memory.h"
#include "src/Core/util/IntegralConstant.h"
#include "src/Core/util/Serializer.h"
#include "src/Core/util/SymbolicIndex.h"
#include "src/Core/util/EmulateArray.h"
#include "src/Core/util/MoreMeta.h"
#include "src/Core/NumTraits.h"
#include "src/Core/MathFunctions.h"
#include "src/Core/RandomImpl.h"
#include "src/Core/GenericPacketMath.h"
#include "src/Core/MathFunctionsImpl.h"
#include "src/Core/arch/Default/ConjHelper.h"
// Generic half float support
#include "src/Core/arch/Default/Half.h"
#include "src/Core/arch/Default/BFloat16.h"
#include "src/Core/arch/Default/GenericPacketMathFunctionsFwd.h"
#if defined(EIGEN_VECTORIZE_GENERIC) && !defined(EIGEN_DONT_VECTORIZE)
#include "src/Core/arch/clang/PacketMath.h"
#include "src/Core/arch/clang/TypeCasting.h"
#include "src/Core/arch/clang/Complex.h"
#include "src/Core/arch/clang/Reductions.h"
#include "src/Core/arch/clang/MathFunctions.h"
#else
#if defined EIGEN_VECTORIZE_AVX512
#include "src/Core/arch/SSE/PacketMath.h"
#include "src/Core/arch/AVX/PacketMath.h"
#include "src/Core/arch/AVX512/PacketMath.h"
#include "src/Core/arch/AVX512/MathFunctions.h"
#include "src/Core/arch/SSE/PacketMath.h"
#include "src/Core/arch/SSE/Reductions.h"
#include "src/Core/arch/AVX/PacketMath.h"
#include "src/Core/arch/AVX/Reductions.h"
#include "src/Core/arch/AVX512/PacketMath.h"
#include "src/Core/arch/AVX512/Reductions.h"
#if defined EIGEN_VECTORIZE_AVX512FP16
#include "src/Core/arch/AVX512/PacketMathFP16.h"
#endif
#include "src/Core/arch/SSE/TypeCasting.h"
#include "src/Core/arch/AVX/TypeCasting.h"
#include "src/Core/arch/AVX512/TypeCasting.h"
#if defined EIGEN_VECTORIZE_AVX512FP16
#include "src/Core/arch/AVX512/TypeCastingFP16.h"
#endif
#include "src/Core/arch/SSE/Complex.h"
#include "src/Core/arch/AVX/Complex.h"
#include "src/Core/arch/AVX512/Complex.h"
#include "src/Core/arch/SSE/MathFunctions.h"
#include "src/Core/arch/AVX/MathFunctions.h"
#include "src/Core/arch/AVX512/MathFunctions.h"
#if defined EIGEN_VECTORIZE_AVX512FP16
#include "src/Core/arch/AVX512/MathFunctionsFP16.h"
#endif
#include "src/Core/arch/AVX512/TrsmKernel.h"
#elif defined EIGEN_VECTORIZE_AVX
// Use AVX for floats and doubles, SSE for integers
#include "src/Core/arch/SSE/PacketMath.h"
#include "src/Core/arch/SSE/Complex.h"
#include "src/Core/arch/SSE/MathFunctions.h"
#include "src/Core/arch/AVX/PacketMath.h"
#include "src/Core/arch/AVX/MathFunctions.h"
#include "src/Core/arch/AVX/Complex.h"
#include "src/Core/arch/AVX/TypeCasting.h"
// Use AVX for floats and doubles, SSE for integers
#include "src/Core/arch/SSE/PacketMath.h"
#include "src/Core/arch/SSE/Reductions.h"
#include "src/Core/arch/SSE/TypeCasting.h"
#include "src/Core/arch/SSE/Complex.h"
#include "src/Core/arch/AVX/PacketMath.h"
#include "src/Core/arch/AVX/Reductions.h"
#include "src/Core/arch/AVX/TypeCasting.h"
#include "src/Core/arch/AVX/Complex.h"
#include "src/Core/arch/SSE/MathFunctions.h"
#include "src/Core/arch/AVX/MathFunctions.h"
#elif defined EIGEN_VECTORIZE_SSE
#include "src/Core/arch/SSE/PacketMath.h"
#include "src/Core/arch/SSE/Reductions.h"
#include "src/Core/arch/SSE/TypeCasting.h"
#include "src/Core/arch/SSE/MathFunctions.h"
#include "src/Core/arch/SSE/Complex.h"
#elif defined(EIGEN_VECTORIZE_ALTIVEC) || defined(EIGEN_VECTORIZE_VSX)
#include "src/Core/arch/AltiVec/PacketMath.h"
#include "src/Core/arch/AltiVec/TypeCasting.h"
#include "src/Core/arch/AltiVec/MathFunctions.h"
#include "src/Core/arch/AltiVec/Complex.h"
#elif defined EIGEN_VECTORIZE_NEON
#include "src/Core/arch/NEON/PacketMath.h"
#include "src/Core/arch/NEON/TypeCasting.h"
#include "src/Core/arch/NEON/MathFunctions.h"
#include "src/Core/arch/NEON/Complex.h"
#elif defined EIGEN_VECTORIZE_LSX
#include "src/Core/arch/LSX/PacketMath.h"
#include "src/Core/arch/LSX/TypeCasting.h"
#include "src/Core/arch/LSX/MathFunctions.h"
#include "src/Core/arch/LSX/Complex.h"
#elif defined EIGEN_VECTORIZE_SVE
#include "src/Core/arch/SVE/PacketMath.h"
#include "src/Core/arch/SVE/TypeCasting.h"
#include "src/Core/arch/SVE/MathFunctions.h"
#elif defined EIGEN_VECTORIZE_RVV10
#include "src/Core/arch/RVV10/PacketMath.h"
#include "src/Core/arch/RVV10/PacketMath4.h"
#include "src/Core/arch/RVV10/PacketMath2.h"
#include "src/Core/arch/RVV10/TypeCasting.h"
#include "src/Core/arch/RVV10/MathFunctions.h"
#if defined EIGEN_VECTORIZE_RVV10FP16
#include "src/Core/arch/RVV10/PacketMathFP16.h"
#endif
#if defined EIGEN_VECTORIZE_RVV10BF16
#include "src/Core/arch/RVV10/PacketMathBF16.h"
#endif
#elif defined EIGEN_VECTORIZE_ZVECTOR
#include "src/Core/arch/ZVector/PacketMath.h"
#include "src/Core/arch/ZVector/MathFunctions.h"
#include "src/Core/arch/ZVector/Complex.h"
#elif defined EIGEN_VECTORIZE_MSA
#include "src/Core/arch/MSA/PacketMath.h"
#include "src/Core/arch/MSA/MathFunctions.h"
#include "src/Core/arch/MSA/Complex.h"
#elif defined EIGEN_VECTORIZE_HVX
#include "src/Core/arch/HVX/PacketMath.h"
#endif
// Half float support
#if defined EIGEN_VECTORIZE_GPU
#include "src/Core/arch/GPU/PacketMath.h"
#include "src/Core/arch/GPU/MathFunctions.h"
#include "src/Core/arch/GPU/TypeCasting.h"
#endif
#if defined(EIGEN_USE_SYCL)
#include "src/Core/arch/SYCL/InteropHeaders.h"
#if !defined(EIGEN_DONT_VECTORIZE_SYCL)
#include "src/Core/arch/SYCL/PacketMath.h"
#include "src/Core/arch/SYCL/MathFunctions.h"
#include "src/Core/arch/SYCL/TypeCasting.h"
#endif
#endif
#endif // #ifndef EIGEN_VECTORIZE_GENERIC
#include "src/Core/arch/Default/Settings.h"
// This file provides generic implementations valid for scalar as well
#include "src/Core/arch/Default/GenericPacketMathFunctions.h"
#include "src/Core/functors/TernaryFunctors.h"
#include "src/Core/functors/BinaryFunctors.h"
@@ -401,10 +324,22 @@ using std::ptrdiff_t;
#include "src/Core/functors/StlFunctors.h"
#include "src/Core/functors/AssignmentFunctors.h"
// Specialized functors for GPU.
#ifdef EIGEN_GPUCC
#include "src/Core/arch/GPU/Complex.h"
#endif
// Specializations of vectorized activation functions for NEON.
#ifdef EIGEN_VECTORIZE_NEON
#include "src/Core/arch/NEON/UnaryFunctors.h"
#endif
#include "src/Core/util/IndexedViewHelper.h"
#include "src/Core/util/ReshapedHelper.h"
#include "src/Core/ArithmeticSequence.h"
#ifndef EIGEN_NO_IO
#include "src/Core/IO.h"
#endif
#include "src/Core/DenseCoeffsBase.h"
#include "src/Core/DenseBase.h"
#include "src/Core/MatrixBase.h"
@@ -413,30 +348,27 @@ using std::ptrdiff_t;
#include "src/Core/Product.h"
#include "src/Core/CoreEvaluators.h"
#include "src/Core/AssignEvaluator.h"
#include "src/Core/RealView.h"
#include "src/Core/Assign.h"
#include "src/Core/ArrayBase.h"
#include "src/Core/util/BlasUtil.h"
#include "src/Core/DenseStorage.h"
#include "src/Core/NestByValue.h"
// #include "src/Core/ForceAlignedAccess.h"
#include "src/Core/ReturnByValue.h"
#include "src/Core/NoAlias.h"
#include "src/Core/PlainObjectBase.h"
#include "src/Core/Matrix.h"
#include "src/Core/Array.h"
#include "src/Core/Fill.h"
#include "src/Core/CwiseTernaryOp.h"
#include "src/Core/CwiseBinaryOp.h"
#include "src/Core/CwiseUnaryOp.h"
#include "src/Core/CwiseNullaryOp.h"
#include "src/Core/CwiseUnaryView.h"
#include "src/Core/SelfCwiseBinaryOp.h"
#include "src/Core/InnerProduct.h"
#include "src/Core/Dot.h"
#include "src/Core/StableNorm.h"
#include "src/Core/Stride.h"
@@ -445,14 +377,17 @@ using std::ptrdiff_t;
#include "src/Core/Ref.h"
#include "src/Core/Block.h"
#include "src/Core/VectorBlock.h"
#include "src/Core/IndexedView.h"
#include "src/Core/Reshaped.h"
#include "src/Core/Transpose.h"
#include "src/Core/DiagonalMatrix.h"
#include "src/Core/Diagonal.h"
#include "src/Core/DiagonalProduct.h"
#include "src/Core/SkewSymmetricMatrix3.h"
#include "src/Core/Redux.h"
#include "src/Core/Visitor.h"
#include "src/Core/FindCoeff.h"
#include "src/Core/Fuzzy.h"
#include "src/Core/Swap.h"
#include "src/Core/CommaInitializer.h"
#include "src/Core/GeneralProduct.h"
@@ -464,6 +399,10 @@ using std::ptrdiff_t;
#include "src/Core/TriangularMatrix.h"
#include "src/Core/SelfAdjointView.h"
#include "src/Core/products/GeneralBlockPanelKernel.h"
#include "src/Core/DeviceWrapper.h"
#ifdef EIGEN_GEMM_THREADPOOL
#include "ThreadPool"
#endif
#include "src/Core/products/Parallelizer.h"
#include "src/Core/ProductEvaluators.h"
#include "src/Core/products/GeneralMatrixVector.h"
@@ -482,13 +421,30 @@ using std::ptrdiff_t;
#include "src/Core/CoreIterators.h"
#include "src/Core/ConditionEstimator.h"
#include "src/Core/BooleanRedux.h"
#if !defined(EIGEN_VECTORIZE_GENERIC)
#if defined(EIGEN_VECTORIZE_VSX)
#include "src/Core/arch/AltiVec/MatrixProduct.h"
#elif defined EIGEN_VECTORIZE_NEON
#include "src/Core/arch/NEON/GeneralBlockPanelKernel.h"
#elif defined EIGEN_VECTORIZE_LSX
#include "src/Core/arch/LSX/GeneralBlockPanelKernel.h"
#elif defined EIGEN_VECTORIZE_RVV10
#include "src/Core/arch/RVV10/GeneralBlockPanelKernel.h"
#endif
#if defined(EIGEN_VECTORIZE_AVX512)
#include "src/Core/arch/AVX512/GemmKernel.h"
#endif
#endif
#include "src/Core/Select.h"
#include "src/Core/VectorwiseOp.h"
#include "src/Core/PartialReduxEvaluator.h"
#include "src/Core/Random.h"
#include "src/Core/Replicate.h"
#include "src/Core/Reverse.h"
#include "src/Core/ArrayWrapper.h"
#include "src/Core/StlIterators.h"
#ifdef EIGEN_USE_BLAS
#include "src/Core/products/GeneralMatrixMatrix_BLAS.h"
@@ -499,14 +455,19 @@ using std::ptrdiff_t;
#include "src/Core/products/TriangularMatrixMatrix_BLAS.h"
#include "src/Core/products/TriangularMatrixVector_BLAS.h"
#include "src/Core/products/TriangularSolverMatrix_BLAS.h"
#endif // EIGEN_USE_BLAS
#ifdef EIGEN_USE_MKL_VML
#include "src/Core/Assign_MKL.h"
#endif
#ifdef EIGEN_USE_AOCL_VML
#include "src/Core/Assign_AOCL.h"
#endif
#include "src/Core/GlobalFunctions.h"
// IWYU pragma: end_exports
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_CORE_MODULE_H


@@ -1,3 +1,13 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_DENSE_MODULE_H
#define EIGEN_DENSE_MODULE_H
#include "Core"
#include "LU"
#include "Cholesky"
@@ -5,3 +15,5 @@
#include "SVD"
#include "Geometry"
#include "Eigenvalues"
#endif // EIGEN_DENSE_MODULE_H


@@ -1,2 +1,14 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_EIGEN_MODULE_H
#define EIGEN_EIGEN_MODULE_H
#include "Dense"
#include "Sparse"
#endif // EIGEN_EIGEN_MODULE_H


@@ -10,29 +10,26 @@
#include "Core"
#include "src/Core/util/DisableStupidWarnings.h"
#include "Cholesky"
#include "Jacobi"
#include "Householder"
#include "LU"
#include "Geometry"
#include "Sparse" // Needed by ComplexQZ.
/** \defgroup Eigenvalues_Module Eigenvalues module
 *
 * This module mainly provides various eigenvalue solvers.
 * This module also provides some MatrixBase methods, including:
 *  - MatrixBase::eigenvalues(),
 *  - MatrixBase::operatorNorm()
 *
 * \code
 * #include <Eigen/Eigenvalues>
 * \endcode
 */
#include "src/misc/RealSvd2x2.h"
// IWYU pragma: begin_exports
#include "src/Eigenvalues/Tridiagonalization.h"
#include "src/Eigenvalues/RealSchur.h"
#include "src/Eigenvalues/EigenSolver.h"
@@ -42,16 +39,23 @@
#include "src/Eigenvalues/ComplexSchur.h"
#include "src/Eigenvalues/ComplexEigenSolver.h"
#include "src/Eigenvalues/RealQZ.h"
#include "src/Eigenvalues/ComplexQZ.h"
#include "src/Eigenvalues/GeneralizedEigenSolver.h"
#include "src/Eigenvalues/MatrixBaseEigenvalues.h"
#ifdef EIGEN_USE_LAPACKE
#ifdef EIGEN_USE_MKL
#include "mkl_lapacke.h"
#elif defined(EIGEN_LAPACKE_SYSTEM)
#include <lapacke.h>
#else
#include "src/misc/lapacke.h"
#endif
#include "src/Eigenvalues/RealSchur_LAPACKE.h"
#include "src/Eigenvalues/ComplexSchur_LAPACKE.h"
#include "src/Eigenvalues/SelfAdjointEigenSolver_LAPACKE.h"
#endif
// IWYU pragma: end_exports
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_EIGENVALUES_MODULE_H


@@ -10,32 +10,30 @@
#include "Core"
#include "src/Core/util/DisableStupidWarnings.h"
#include "SVD"
#include "LU"
#include <limits>
/** \defgroup Geometry_Module Geometry module
 *
 * This module provides support for:
 *  - fixed-size homogeneous transformations
 *  - translation, scaling, 2D and 3D rotations
 *  - \link Quaternion quaternions \endlink
 *  - cross products (\ref MatrixBase::cross(), \ref MatrixBase::cross3())
 *  - orthogonal vector generation (MatrixBase::unitOrthogonal)
 *  - some linear components: \link ParametrizedLine parametrized-lines \endlink and \link Hyperplane hyperplanes \endlink
 *  - \link AlignedBox axis aligned bounding boxes \endlink
 *  - \link umeyama() least-square transformation fitting \endlink
 *
 * \code
 * #include <Eigen/Geometry>
 * \endcode
 */
// IWYU pragma: begin_exports
#include "src/Geometry/OrthoMethods.h"
#include "src/Geometry/EulerAngles.h"
#include "src/Geometry/Homogeneous.h"
#include "src/Geometry/RotationBase.h"
#include "src/Geometry/Rotation2D.h"
@@ -49,14 +47,15 @@
#include "src/Geometry/AlignedBox.h"
#include "src/Geometry/Umeyama.h"
#ifndef EIGEN_VECTORIZE_GENERIC
// TODO(rmlarsen): Make these work with generic vectorization if possible.
// Use the SSE optimized version whenever possible.
#if (defined EIGEN_VECTORIZE_SSE) || (defined EIGEN_VECTORIZE_NEON)
#include "src/Geometry/arch/Geometry_SIMD.h"
#endif
#endif
// IWYU pragma: end_exports
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_GEOMETRY_MODULE_H


@@ -13,18 +13,19 @@
#include "src/Core/util/DisableStupidWarnings.h"
/** \defgroup Householder_Module Householder module
 * This module provides Householder transformations.
 *
 * \code
 * #include <Eigen/Householder>
 * \endcode
 */
// IWYU pragma: begin_exports
#include "src/Householder/Householder.h"
#include "src/Householder/HouseholderSequence.h"
#include "src/Householder/BlockHouseholder.h"
// IWYU pragma: end_exports
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_HOUSEHOLDER_MODULE_H


@@ -13,10 +13,11 @@
#include "src/Core/util/DisableStupidWarnings.h"
/**
* \defgroup IterativeLinearSolvers_Module IterativeLinearSolvers module
*
 * This module currently provides iterative methods to solve problems of the form \c A \c x = \c b, where \c A is a
 * square matrix, usually very large and sparse.
* Those solvers are accessible via the following classes:
* - ConjugateGradient for selfadjoint (hermitian) matrices,
* - LeastSquaresConjugateGradient for rectangular least-square problems,
@@ -27,13 +28,15 @@
 * - DiagonalPreconditioner - also called Jacobi preconditioner, works very well on diagonally dominant matrices.
* - IncompleteLUT - incomplete LU factorization with dual thresholding
*
 * Such problems can also be solved using the direct sparse decomposition modules: SparseCholesky, CholmodSupport,
 * UmfPackSupport, SuperLUSupport, AccelerateSupport.
*
\code
#include <Eigen/IterativeLinearSolvers>
\endcode
*/
// IWYU pragma: begin_exports
#include "src/IterativeLinearSolvers/SolveWithGuess.h"
#include "src/IterativeLinearSolvers/IterativeSolverBase.h"
#include "src/IterativeLinearSolvers/BasicPreconditioners.h"
@@ -42,7 +45,8 @@
#include "src/IterativeLinearSolvers/BiCGSTAB.h"
#include "src/IterativeLinearSolvers/IncompleteLUT.h"
#include "src/IterativeLinearSolvers/IncompleteCholesky.h"
// IWYU pragma: end_exports
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_ITERATIVELINEARSOLVERS_MODULE_H


@@ -13,21 +13,21 @@
#include "src/Core/util/DisableStupidWarnings.h"
/** \defgroup Jacobi_Module Jacobi module
 * This module provides Jacobi and Givens rotations.
 *
 * \code
 * #include <Eigen/Jacobi>
 * \endcode
 *
 * In addition to the listed classes, it defines the two following MatrixBase methods to apply a Jacobi or Givens
 * rotation:
 *  - MatrixBase::applyOnTheLeft()
 *  - MatrixBase::applyOnTheRight().
 */
// IWYU pragma: begin_exports
#include "src/Jacobi/Jacobi.h"
// IWYU pragma: end_exports
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_JACOBI_MODULE_H

Eigen/KLUSupport Normal file

@@ -0,0 +1,43 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_KLUSUPPORT_MODULE_H
#define EIGEN_KLUSUPPORT_MODULE_H
#include "SparseCore"
#include "src/Core/util/DisableStupidWarnings.h"
extern "C" {
#include <btf.h>
#include <klu.h>
}
/** \ingroup Support_modules
* \defgroup KLUSupport_Module KLUSupport module
*
* This module provides an interface to the KLU library which is part of the <a
* href="http://www.suitesparse.com">suitesparse</a> package. It provides the following factorization class:
* - class KLU: a sparse LU factorization, well-suited for circuit simulation.
*
* \code
* #include <Eigen/KLUSupport>
* \endcode
*
* In order to use this module, the klu and btf headers must be accessible from the include paths, and your binary must
* be linked to the klu library and its dependencies. The dependencies depend on how KLU has been compiled. For a
* cmake based project, you can use our FindKLU.cmake module to help you in this task.
*
*/
// IWYU pragma: begin_exports
#include "src/KLUSupport/KLUSupport.h"
// IWYU pragma: end_exports
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_KLUSUPPORT_MODULE_H


@@ -13,34 +13,37 @@
#include "src/Core/util/DisableStupidWarnings.h"
/** \defgroup LU_Module LU module
 * This module includes %LU decomposition and related notions such as matrix inversion and determinant.
 * This module defines the following MatrixBase methods:
 *  - MatrixBase::inverse()
 *  - MatrixBase::determinant()
 *
 * \code
 * #include <Eigen/LU>
 * \endcode
 */
// IWYU pragma: begin_exports
#include "src/misc/Kernel.h"
#include "src/misc/Image.h"
#include "src/misc/RankRevealingBase.h"
#include "src/LU/FullPivLU.h"
#include "src/LU/PartialPivLU.h"
#ifdef EIGEN_USE_LAPACKE
#include "src/misc/lapacke.h"
#include "src/misc/lapacke_helpers.h"
#include "src/LU/PartialPivLU_LAPACKE.h"
#endif
#include "src/LU/Determinant.h"
#include "src/LU/InverseImpl.h"
#ifndef EIGEN_VECTORIZE_GENERIC
// TODO(rmlarsen): Make these work with generic vectorization if possible.
#if defined EIGEN_VECTORIZE_SSE || defined EIGEN_VECTORIZE_NEON
#include "src/LU/arch/InverseSize4.h"
#endif
#endif
// IWYU pragma: end_exports
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_LU_MODULE_H


@@ -16,20 +16,20 @@ extern "C" {
#include <metis.h>
}
/** \ingroup Support_modules
 * \defgroup MetisSupport_Module MetisSupport module
 *
 * \code
 * #include <Eigen/MetisSupport>
 * \endcode
 * This module defines an interface to the METIS reordering package (http://glaros.dtc.umn.edu/gkhome/views/metis).
 * It can be used just as any other built-in method as explained in \link OrderingMethods_Module here. \endlink
 */
// IWYU pragma: begin_exports
#include "src/MetisSupport/MetisSupport.h"
// IWYU pragma: end_exports
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_METISSUPPORT_MODULE_H


@@ -12,62 +12,62 @@
#include "src/Core/util/DisableStupidWarnings.h"
/**
 * \defgroup OrderingMethods_Module OrderingMethods module
 *
 * This module is currently for internal use only.
 *
 * It defines various built-in and external ordering methods for sparse matrices.
 * They are typically used to reduce the number of elements during
 * the sparse matrix decomposition (LLT, LU, QR).
 * Precisely, in a preprocessing step, a permutation matrix P is computed using
 * those ordering methods and applied to the columns of the matrix.
 * Using for instance the sparse Cholesky decomposition, it is expected that
 * the nonzero elements in LLT(A*P) will be much smaller than those in LLT(A).
 *
 * Usage:
 * \code
 * #include <Eigen/OrderingMethods>
 * \endcode
 *
 * A simple usage is as a template parameter in the sparse decomposition classes:
 *
 * \code
 * SparseLU<MatrixType, COLAMDOrdering<int> > solver;
 * \endcode
 *
 * \code
 * SparseQR<MatrixType, COLAMDOrdering<int> > solver;
 * \endcode
 *
 * It is possible as well to call directly a particular ordering method for your own purpose:
 * \code
 * AMDOrdering<int> ordering;
 * PermutationMatrix<Dynamic, Dynamic, int> perm;
 * SparseMatrix<double> A;
 * // Fill the matrix ...
 *
 * ordering(A, perm); // Call AMD
 * \endcode
 *
 * \note Some of these methods (like AMD or METIS) need the sparsity pattern
 * of the input matrix to be symmetric. When the matrix is structurally unsymmetric,
 * Eigen computes internally the pattern of \f$A^T*A\f$ before calling the method.
 * If your matrix is already symmetric (at least in structure), you can avoid that
 * by calling the method with a SelfAdjointView type.
 *
 * \code
 * // Call the ordering on the pattern of the lower triangular matrix A
 * ordering(A.selfadjointView<Lower>(), perm);
 * \endcode
 */
// IWYU pragma: begin_exports
#include "src/OrderingMethods/Amd.h"
#include "src/OrderingMethods/Ordering.h"
// IWYU pragma: end_exports
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_ORDERINGMETHODS_MODULE_H


@@ -22,27 +22,30 @@ extern "C" {
#endif
/** \ingroup Support_modules
 * \defgroup PaStiXSupport_Module PaStiXSupport module
 *
 * This module provides an interface to the <a href="http://pastix.gforge.inria.fr/">PaSTiX</a> library.
 * PaSTiX is a general \b supernodal, \b parallel and \b opensource sparse solver.
 * It provides the following main factorization classes:
 *  - class PastixLLT : a supernodal, parallel LLt Cholesky factorization.
 *  - class PastixLDLT: a supernodal, parallel LDLt Cholesky factorization.
 *  - class PastixLU : a supernodal, parallel LU factorization (optimized for a symmetric pattern).
 *
 * \code
 * #include <Eigen/PaStiXSupport>
 * \endcode
 *
 * In order to use this module, the PaSTiX headers must be accessible from the include paths, and your binary must be
 * linked to the PaSTiX library and its dependencies. This wrapper requires PaStiX version 5.x compiled without MPI
 * support. The dependencies depend on how PaSTiX has been compiled. For a cmake based project, you can use our
 * FindPaSTiX.cmake module to help you in this task.
 *
 */
// IWYU pragma: begin_exports
#include "src/PaStiXSupport/PaStiXSupport.h"
// IWYU pragma: end_exports
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_PASTIXSUPPORT_MODULE_H

Eigen/PardisoSupport Executable file → Normal file

@@ -15,21 +15,24 @@
#include <mkl_pardiso.h>
/** \ingroup Support_modules
 * \defgroup PardisoSupport_Module PardisoSupport module
 *
 * This module brings support for the Intel(R) MKL PARDISO direct sparse solvers.
 *
 * \code
 * #include <Eigen/PardisoSupport>
 * \endcode
 *
 * In order to use this module, the MKL headers must be accessible from the include paths, and your binary must be
 * linked to the MKL library and its dependencies. See this \ref TopicUsingIntelMKL "page" for more information on
 * MKL-Eigen integration.
 *
 */
// IWYU pragma: begin_exports
#include "src/PardisoSupport/PardisoSupport.h"
// IWYU pragma: end_exports
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_PARDISOSUPPORT_MODULE_H
#endif // EIGEN_PARDISOSUPPORT_MODULE_H

@@ -10,38 +10,38 @@
#include "Core"
#include "src/Core/util/DisableStupidWarnings.h"
#include "Cholesky"
#include "Jacobi"
#include "Householder"
/** \defgroup QR_Module QR module
 *
 * This module provides various QR decompositions.
 * This module also provides some MatrixBase methods, including:
 *  - MatrixBase::householderQr()
 *  - MatrixBase::colPivHouseholderQr()
 *  - MatrixBase::fullPivHouseholderQr()
 *
 * \code
 * #include <Eigen/QR>
 * \endcode
 */
#include "src/misc/RankRevealingBase.h"
// IWYU pragma: begin_exports
#include "src/QR/HouseholderQR.h"
#include "src/QR/FullPivHouseholderQR.h"
#include "src/QR/ColPivHouseholderQR.h"
#include "src/QR/CompleteOrthogonalDecomposition.h"
#ifdef EIGEN_USE_LAPACKE
#include "src/misc/lapacke.h"
#include "src/misc/lapacke_helpers.h"
#include "src/QR/HouseholderQR_LAPACKE.h"
#include "src/QR/ColPivHouseholderQR_LAPACKE.h"
#endif
// IWYU pragma: end_exports
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_QR_MODULE_H


@@ -14,20 +14,13 @@
#include "src/Core/util/DisableStupidWarnings.h"
inline void *qMalloc(std::size_t size) { return Eigen::internal::aligned_malloc(size); }
inline void qFree(void *ptr) { Eigen::internal::aligned_free(ptr); }
inline void *qRealloc(void *ptr, std::size_t size) {
  void *newPtr = Eigen::internal::aligned_malloc(size);
  std::memcpy(newPtr, ptr, size);
  Eigen::internal::aligned_free(ptr);
  return newPtr;
}
@@ -36,5 +29,4 @@ void *qRealloc(void *ptr, size_t size)
#endif
#endif // EIGEN_QTMALLOC_MODULE_H


@@ -15,20 +15,27 @@
#include "SuiteSparseQR.hpp"
/** \ingroup Support_modules
* \defgroup SPQRSupport_Module SuiteSparseQR module
*
* This module provides an interface to the SPQR library, which is part of the <a href="http://www.suitesparse.com">suitesparse</a> package.
*
* \code
* #include <Eigen/SPQRSupport>
* \endcode
*
* In order to use this module, the SPQR headers must be accessible from the include paths, and your binary must be linked to the SPQR library and its dependencies (Cholmod, AMD, COLAMD,...).
* For a cmake based project, you can use our FindSPQR.cmake and FindCholmod.Cmake modules
*
*/
* \defgroup SPQRSupport_Module SuiteSparseQR module
*
* This module provides an interface to the SPQR library, which is part of the <a
* href="http://www.suitesparse.com">suitesparse</a> package.
*
* \code
* #include <Eigen/SPQRSupport>
* \endcode
*
* In order to use this module, the SPQR headers must be accessible from the include paths, and your binary must be
* linked to the SPQR library and its dependencies (Cholmod, AMD, COLAMD,...). For a cmake based project, you can use
 * our FindSPQR.cmake and FindCholmod.cmake modules
*
*/
#include "src/CholmodSupport/CholmodSupport.h"
#include "CholmodSupport"
// IWYU pragma: begin_exports
#include "src/SPQRSupport/SuiteSparseQRSupport.h"
// IWYU pragma: end_exports
#endif
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_SPQRSUPPORT_MODULE_H


@@ -9,39 +9,45 @@
#define EIGEN_SVD_MODULE_H
#include "QR"
#include "Householder"
#include "Jacobi"
#include "src/Core/util/DisableStupidWarnings.h"
/** \defgroup SVD_Module SVD module
*
*
*
* This module provides SVD decomposition for matrices (both real and complex).
* Two decomposition algorithms are provided:
* - JacobiSVD implementing two-sided Jacobi iterations is numerically very accurate, fast for small matrices, but very slow for larger ones.
* - BDCSVD implementing a recursive divide & conquer strategy on top of an upper-bidiagonalization which remains fast for large problems.
* These decompositions are accessible via the respective classes and following MatrixBase methods:
* - MatrixBase::jacobiSvd()
* - MatrixBase::bdcSvd()
*
* \code
* #include <Eigen/SVD>
* \endcode
*/
*
* This module provides SVD decomposition for matrices (both real and complex).
* Two decomposition algorithms are provided:
* - JacobiSVD implementing two-sided Jacobi iterations is numerically very accurate, fast for small matrices, but very
* slow for larger ones.
* - BDCSVD implementing a recursive divide & conquer strategy on top of an upper-bidiagonalization which remains fast
* for large problems. These decompositions are accessible via the respective classes and following MatrixBase methods:
* - MatrixBase::jacobiSvd()
* - MatrixBase::bdcSvd()
*
* \code
* #include <Eigen/SVD>
* \endcode
*/
#include "src/misc/RealSvd2x2.h"
// IWYU pragma: begin_exports
#include "src/SVD/UpperBidiagonalization.h"
#include "src/SVD/SVDBase.h"
#include "src/SVD/JacobiSVD.h"
#include "src/SVD/BDCSVD.h"
#if defined(EIGEN_USE_LAPACKE) && !defined(EIGEN_USE_LAPACKE_STRICT)
#ifdef EIGEN_USE_LAPACKE
#ifdef EIGEN_USE_MKL
#include "mkl_lapacke.h"
#elif defined(EIGEN_LAPACKE_SYSTEM)
#include <lapacke.h>
#else
#include "src/misc/lapacke.h"
#endif
#ifndef EIGEN_USE_LAPACKE_STRICT
#include "src/SVD/JacobiSVD_LAPACKE.h"
#endif
#include "src/SVD/BDCSVD_LAPACKE.h"
#endif
// IWYU pragma: end_exports
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_SVD_MODULE_H
/* vim: set filetype=cpp et sw=2 ts=2 ai: */
#endif // EIGEN_SVD_MODULE_H


@@ -30,5 +30,4 @@
#include "SparseQR"
#include "IterativeLinearSolvers"
#endif // EIGEN_SPARSE_MODULE_H
#endif // EIGEN_SPARSE_MODULE_H


@@ -15,31 +15,26 @@
#include "src/Core/util/DisableStupidWarnings.h"
/**
* \defgroup SparseCholesky_Module SparseCholesky module
*
* This module currently provides two variants of the direct sparse Cholesky decomposition for selfadjoint (hermitian) matrices.
* Those decompositions are accessible via the following classes:
* - SimplicialLLt,
* - SimplicialLDLt
*
* Such problems can also be solved using the ConjugateGradient solver from the IterativeLinearSolvers module.
*
* \code
* #include <Eigen/SparseCholesky>
* \endcode
*/
#ifdef EIGEN_MPL2_ONLY
#error The SparseCholesky module has nothing to offer in MPL2 only mode
#endif
/**
* \defgroup SparseCholesky_Module SparseCholesky module
*
* This module currently provides two variants of the direct sparse Cholesky decomposition for selfadjoint (hermitian)
* matrices. Those decompositions are accessible via the following classes:
* - SimplicialLLt,
* - SimplicialLDLt
*
* Such problems can also be solved using the ConjugateGradient solver from the IterativeLinearSolvers module.
*
* \code
* #include <Eigen/SparseCholesky>
* \endcode
*/
// IWYU pragma: begin_exports
#include "src/SparseCholesky/SimplicialCholesky.h"
#ifndef EIGEN_MPL2_ONLY
#include "src/SparseCholesky/SimplicialCholesky_impl.h"
#endif
// IWYU pragma: end_exports
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_SPARSECHOLESKY_MODULE_H
#endif // EIGEN_SPARSECHOLESKY_MODULE_H


@@ -12,27 +12,25 @@
#include "src/Core/util/DisableStupidWarnings.h"
#include <vector>
#include <map>
#include <cstdlib>
#include <cstring>
#include <algorithm>
#include <numeric>
/**
* \defgroup SparseCore_Module SparseCore module
*
* This module provides a sparse matrix representation, and basic associated matrix manipulations
* and operations.
*
* See the \ref TutorialSparse "Sparse tutorial"
*
* \code
* #include <Eigen/SparseCore>
* \endcode
*
* This module depends on: Core.
*/
/**
* \defgroup SparseCore_Module SparseCore module
*
* This module provides a sparse matrix representation, and basic associated matrix manipulations
* and operations.
*
* See the \ref TutorialSparse "Sparse tutorial"
*
* \code
* #include <Eigen/SparseCore>
* \endcode
*
* This module depends on: Core.
*/
// IWYU pragma: begin_exports
#include "src/SparseCore/SparseUtil.h"
#include "src/SparseCore/SparseMatrixBase.h"
#include "src/SparseCore/SparseAssign.h"
@@ -41,7 +39,6 @@
#include "src/SparseCore/SparseCompressedBase.h"
#include "src/SparseCore/SparseMatrix.h"
#include "src/SparseCore/SparseMap.h"
#include "src/SparseCore/MappedSparseMatrix.h"
#include "src/SparseCore/SparseVector.h"
#include "src/SparseCore/SparseRef.h"
#include "src/SparseCore/SparseCwiseUnaryOp.h"
@@ -62,8 +59,8 @@
#include "src/SparseCore/SparsePermutation.h"
#include "src/SparseCore/SparseFuzzy.h"
#include "src/SparseCore/SparseSolverBase.h"
// IWYU pragma: end_exports
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_SPARSECORE_MODULE_H
#endif // EIGEN_SPARSECORE_MODULE_H


@@ -13,18 +13,19 @@
#include "SparseCore"
/**
* \defgroup SparseLU_Module SparseLU module
* This module defines a supernodal factorization of general sparse matrices.
* The code is fully optimized for supernode-panel updates with specialized kernels.
* Please, see the documentation of the SparseLU class for more details.
*/
/**
* \defgroup SparseLU_Module SparseLU module
* This module defines a supernodal factorization of general sparse matrices.
* The code is fully optimized for supernode-panel updates with specialized kernels.
* Please, see the documentation of the SparseLU class for more details.
*/
// Ordering interface
#include "OrderingMethods"
#include "src/SparseLU/SparseLU_gemm_kernel.h"
#include "src/Core/util/DisableStupidWarnings.h"
// IWYU pragma: begin_exports
#include "src/SparseLU/SparseLU_Structs.h"
#include "src/SparseLU/SparseLU_SupernodalMatrix.h"
#include "src/SparseLU/SparseLUImpl.h"
@@ -42,5 +43,8 @@
#include "src/SparseLU/SparseLU_pruneL.h"
#include "src/SparseLU/SparseLU_Utils.h"
#include "src/SparseLU/SparseLU.h"
// IWYU pragma: end_exports
#endif // EIGEN_SPARSELU_MODULE_H
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_SPARSELU_MODULE_H


@@ -13,25 +13,26 @@
#include "src/Core/util/DisableStupidWarnings.h"
/** \defgroup SparseQR_Module SparseQR module
* \brief Provides QR decomposition for sparse matrices
*
* This module provides a simplicial version of the left-looking Sparse QR decomposition.
* The columns of the input matrix should be reordered to limit the fill-in during the
* decomposition. Built-in methods (COLAMD, AMD) or external methods (METIS) can be used to this end.
* See the \link OrderingMethods_Module OrderingMethods\endlink module for the list
* of built-in and external ordering methods.
*
* \code
* #include <Eigen/SparseQR>
* \endcode
*
*
*/
* \brief Provides QR decomposition for sparse matrices
*
* This module provides a simplicial version of the left-looking Sparse QR decomposition.
* The columns of the input matrix should be reordered to limit the fill-in during the
* decomposition. Built-in methods (COLAMD, AMD) or external methods (METIS) can be used to this end.
* See the \link OrderingMethods_Module OrderingMethods\endlink module for the list
* of built-in and external ordering methods.
*
* \code
* #include <Eigen/SparseQR>
* \endcode
*
*
*/
#include "OrderingMethods"
// IWYU pragma: begin_exports
#include "src/SparseCore/SparseColEtree.h"
#include "src/SparseQR/SparseQR.h"
// IWYU pragma: end_exports
#include "src/Core/util/ReenableStupidWarnings.h"
#endif
#endif // EIGEN_SPARSEQR_MODULE_H


@@ -14,14 +14,17 @@
#include "Core"
#include <deque>
#if EIGEN_COMP_MSVC && EIGEN_OS_WIN64 /* MSVC auto aligns in 64 bit builds */
#if EIGEN_COMP_MSVC && EIGEN_OS_WIN64 && \
(EIGEN_MAX_STATIC_ALIGN_BYTES <= 16) /* MSVC auto aligns up to 16 bytes in 64 bit builds */
#define EIGEN_DEFINE_STL_DEQUE_SPECIALIZATION(...)
#else
// IWYU pragma: begin_exports
#include "src/StlSupport/StdDeque.h"
// IWYU pragma: end_exports
#endif
#endif // EIGEN_STDDEQUE_MODULE_H
#endif // EIGEN_STDDEQUE_MODULE_H


@@ -13,14 +13,17 @@
#include "Core"
#include <list>
#if EIGEN_COMP_MSVC && EIGEN_OS_WIN64 /* MSVC auto aligns in 64 bit builds */
#if EIGEN_COMP_MSVC && EIGEN_OS_WIN64 && \
(EIGEN_MAX_STATIC_ALIGN_BYTES <= 16) /* MSVC auto aligns up to 16 bytes in 64 bit builds */
#define EIGEN_DEFINE_STL_LIST_SPECIALIZATION(...)
#else
// IWYU pragma: begin_exports
#include "src/StlSupport/StdList.h"
// IWYU pragma: end_exports
#endif
#endif // EIGEN_STDLIST_MODULE_H
#endif // EIGEN_STDLIST_MODULE_H


@@ -14,14 +14,17 @@
#include "Core"
#include <vector>
#if EIGEN_COMP_MSVC && EIGEN_OS_WIN64 /* MSVC auto aligns in 64 bit builds */
#if EIGEN_COMP_MSVC && EIGEN_OS_WIN64 && \
(EIGEN_MAX_STATIC_ALIGN_BYTES <= 16) /* MSVC auto aligns up to 16 bytes in 64 bit builds */
#define EIGEN_DEFINE_STL_VECTOR_SPECIALIZATION(...)
#else
// IWYU pragma: begin_exports
#include "src/StlSupport/StdVector.h"
// IWYU pragma: end_exports
#endif
#endif // EIGEN_STDVECTOR_MODULE_H
#endif // EIGEN_STDVECTOR_MODULE_H


@@ -16,6 +16,7 @@
#define EIGEN_EMPTY_WAS_ALREADY_DEFINED
#endif
// Required by SuperLU headers, which expect int_t to be defined as a global typedef.
typedef int int_t;
#include <slu_Cnames.h>
#include <supermatrix.h>
@@ -26,39 +27,45 @@ typedef int int_t;
// If EMPTY was already defined then we don't undef it.
#if defined(EIGEN_EMPTY_WAS_ALREADY_DEFINED)
# undef EIGEN_EMPTY_WAS_ALREADY_DEFINED
#undef EIGEN_EMPTY_WAS_ALREADY_DEFINED
#elif defined(EMPTY)
# undef EMPTY
#undef EMPTY
#endif
#define SUPERLU_EMPTY (-1)
namespace Eigen { struct SluMatrix; }
namespace Eigen {
struct SluMatrix;
}
/** \ingroup Support_modules
* \defgroup SuperLUSupport_Module SuperLUSupport module
*
* This module provides an interface to the <a href="http://crd-legacy.lbl.gov/~xiaoye/SuperLU/">SuperLU</a> library.
* It provides the following factorization class:
* - class SuperLU: a supernodal sequential LU factorization.
* - class SuperILU: a supernodal sequential incomplete LU factorization (to be used as a preconditioner for iterative methods).
*
* \warning This wrapper requires at least versions 4.0 of SuperLU. The 3.x versions are not supported.
*
* \warning When including this module, you have to use SUPERLU_EMPTY instead of EMPTY which is no longer defined because it is too polluting.
*
* \code
* #include <Eigen/SuperLUSupport>
* \endcode
*
* In order to use this module, the superlu headers must be accessible from the include paths, and your binary must be linked to the superlu library and its dependencies.
* The dependencies depend on how superlu has been compiled.
* For a cmake based project, you can use our FindSuperLU.cmake module to help you in this task.
*
*/
* \defgroup SuperLUSupport_Module SuperLUSupport module
*
* This module provides an interface to the <a href="http://crd-legacy.lbl.gov/~xiaoye/SuperLU/">SuperLU</a> library.
* It provides the following factorization class:
* - class SuperLU: a supernodal sequential LU factorization.
* - class SuperILU: a supernodal sequential incomplete LU factorization (to be used as a preconditioner for iterative
* methods).
*
 * \warning This wrapper requires at least version 4.0 of SuperLU. The 3.x versions are not supported.
*
* \warning When including this module, you have to use SUPERLU_EMPTY instead of EMPTY which is no longer defined
* because it is too polluting.
*
* \code
* #include <Eigen/SuperLUSupport>
* \endcode
*
* In order to use this module, the superlu headers must be accessible from the include paths, and your binary must be
* linked to the superlu library and its dependencies. The dependencies depend on how superlu has been compiled. For a
* cmake based project, you can use our FindSuperLU.cmake module to help you in this task.
*
*/
// IWYU pragma: begin_exports
#include "src/SuperLUSupport/SuperLUSupport.h"
// IWYU pragma: end_exports
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_SUPERLUSUPPORT_MODULE_H
#endif // EIGEN_SUPERLUSUPPORT_MODULE_H

Eigen/ThreadPool (new file, 80 lines)

@@ -0,0 +1,80 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2016 Benoit Steiner <benoit.steiner.goog@gmail.com>
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_THREADPOOL_MODULE_H
#define EIGEN_THREADPOOL_MODULE_H
#include "Core"
#include "src/Core/util/DisableStupidWarnings.h"
/** \defgroup ThreadPool_Module ThreadPool Module
*
 * This module provides two thread pool implementations:
 *  - a simple reference implementation
 *  - a faster non-blocking implementation
*
* \code
* #include <Eigen/ThreadPool>
* \endcode
*/
#include <cstddef>
#include <cstring>
#include <ctime>
#include <vector>
#include <atomic>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <functional>
#include <memory>
#include <utility>
// There are non-parenthesized calls to "max" in the <unordered_map> header,
// which trigger a check in test/main.h causing compilation to fail. We work
// around this by undefining any "max" macro before including <unordered_map>,
// in the case where we have to emulate thread_local.
#ifdef max
#undef max
#endif
#include <unordered_map>
#include "src/Core/util/Meta.h"
#include "src/Core/util/MaxSizeVector.h"
#ifndef EIGEN_MUTEX
#define EIGEN_MUTEX std::mutex
#endif
#ifndef EIGEN_MUTEX_LOCK
#define EIGEN_MUTEX_LOCK std::unique_lock<std::mutex>
#endif
#ifndef EIGEN_CONDVAR
#define EIGEN_CONDVAR std::condition_variable
#endif
// IWYU pragma: begin_exports
#include "src/ThreadPool/ThreadLocal.h"
#include "src/ThreadPool/ThreadYield.h"
#include "src/ThreadPool/ThreadCancel.h"
#include "src/ThreadPool/EventCount.h"
#include "src/ThreadPool/RunQueue.h"
#include "src/ThreadPool/ThreadPoolInterface.h"
#include "src/ThreadPool/ThreadEnvironment.h"
#include "src/ThreadPool/Barrier.h"
#include "src/ThreadPool/NonBlockingThreadPool.h"
#include "src/ThreadPool/CoreThreadPoolDevice.h"
#include "src/ThreadPool/ForkJoin.h"
// IWYU pragma: end_exports
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_THREADPOOL_MODULE_H


@@ -17,24 +17,26 @@ extern "C" {
}
/** \ingroup Support_modules
* \defgroup UmfPackSupport_Module UmfPackSupport module
*
* This module provides an interface to the UmfPack library which is part of the <a href="http://www.suitesparse.com">suitesparse</a> package.
* It provides the following factorization class:
* - class UmfPackLU: a multifrontal sequential LU factorization.
*
* \code
* #include <Eigen/UmfPackSupport>
* \endcode
*
* In order to use this module, the umfpack headers must be accessible from the include paths, and your binary must be linked to the umfpack library and its dependencies.
* The dependencies depend on how umfpack has been compiled.
* For a cmake based project, you can use our FindUmfPack.cmake module to help you in this task.
*
*/
* \defgroup UmfPackSupport_Module UmfPackSupport module
*
* This module provides an interface to the UmfPack library which is part of the <a
* href="http://www.suitesparse.com">suitesparse</a> package. It provides the following factorization class:
* - class UmfPackLU: a multifrontal sequential LU factorization.
*
* \code
* #include <Eigen/UmfPackSupport>
* \endcode
*
* In order to use this module, the umfpack headers must be accessible from the include paths, and your binary must be
* linked to the umfpack library and its dependencies. The dependencies depend on how umfpack has been compiled. For a
* cmake based project, you can use our FindUmfPack.cmake module to help you in this task.
*
*/
// IWYU pragma: begin_exports
#include "src/UmfPackSupport/UmfPackSupport.h"
// IWYU pragma: end_exports
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_UMFPACKSUPPORT_MODULE_H
#endif // EIGEN_UMFPACKSUPPORT_MODULE_H

Eigen/Version (new file, 21 lines)

@@ -0,0 +1,21 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_VERSION_H
#define EIGEN_VERSION_H
// The "WORLD" version will forever remain "3" for the "Eigen3" library.
#define EIGEN_WORLD_VERSION 3
// As of Eigen3 5.0.0, we have moved to Semantic Versioning (semver.org).
#define EIGEN_MAJOR_VERSION 5
#define EIGEN_MINOR_VERSION 0
#define EIGEN_PATCH_VERSION 1
#define EIGEN_PRERELEASE_VERSION "dev"
#define EIGEN_BUILD_VERSION "master"
#define EIGEN_VERSION_STRING "5.0.1-dev+master"
#endif // EIGEN_VERSION_H


@@ -0,0 +1,423 @@
#ifndef EIGEN_ACCELERATESUPPORT_H
#define EIGEN_ACCELERATESUPPORT_H
#include <Accelerate/Accelerate.h>
#include <Eigen/Sparse>
namespace Eigen {
template <typename MatrixType_, int UpLo_, SparseFactorization_t Solver_, bool EnforceSquare_>
class AccelerateImpl;
/** \ingroup AccelerateSupport_Module
* \typedef AccelerateLLT
* \brief A direct Cholesky (LLT) factorization and solver based on Accelerate
*
* \warning Only single and double precision real scalar types are supported by Accelerate
*
* \tparam MatrixType_ the type of the sparse matrix A, it must be a SparseMatrix<>
* \tparam UpLo_ additional information about the matrix structure. Default is Lower.
*
* \sa \ref TutorialSparseSolverConcept, class AccelerateLLT
*/
template <typename MatrixType, int UpLo = Lower>
using AccelerateLLT = AccelerateImpl<MatrixType, UpLo | Symmetric, SparseFactorizationCholesky, true>;
/** \ingroup AccelerateSupport_Module
* \typedef AccelerateLDLT
* \brief The default Cholesky (LDLT) factorization and solver based on Accelerate
*
* \warning Only single and double precision real scalar types are supported by Accelerate
*
* \tparam MatrixType_ the type of the sparse matrix A, it must be a SparseMatrix<>
* \tparam UpLo_ additional information about the matrix structure. Default is Lower.
*
* \sa \ref TutorialSparseSolverConcept, class AccelerateLDLT
*/
template <typename MatrixType, int UpLo = Lower>
using AccelerateLDLT = AccelerateImpl<MatrixType, UpLo | Symmetric, SparseFactorizationLDLT, true>;
/** \ingroup AccelerateSupport_Module
* \typedef AccelerateLDLTUnpivoted
* \brief A direct Cholesky-like LDL^T factorization and solver based on Accelerate with only 1x1 pivots and no pivoting
*
* \warning Only single and double precision real scalar types are supported by Accelerate
*
* \tparam MatrixType_ the type of the sparse matrix A, it must be a SparseMatrix<>
* \tparam UpLo_ additional information about the matrix structure. Default is Lower.
*
* \sa \ref TutorialSparseSolverConcept, class AccelerateLDLTUnpivoted
*/
template <typename MatrixType, int UpLo = Lower>
using AccelerateLDLTUnpivoted = AccelerateImpl<MatrixType, UpLo | Symmetric, SparseFactorizationLDLTUnpivoted, true>;
/** \ingroup AccelerateSupport_Module
* \typedef AccelerateLDLTSBK
* \brief A direct Cholesky (LDLT) factorization and solver based on Accelerate with Supernode Bunch-Kaufman and static
* pivoting
*
* \warning Only single and double precision real scalar types are supported by Accelerate
*
* \tparam MatrixType_ the type of the sparse matrix A, it must be a SparseMatrix<>
* \tparam UpLo_ additional information about the matrix structure. Default is Lower.
*
* \sa \ref TutorialSparseSolverConcept, class AccelerateLDLTSBK
*/
template <typename MatrixType, int UpLo = Lower>
using AccelerateLDLTSBK = AccelerateImpl<MatrixType, UpLo | Symmetric, SparseFactorizationLDLTSBK, true>;
/** \ingroup AccelerateSupport_Module
* \typedef AccelerateLDLTTPP
* \brief A direct Cholesky (LDLT) factorization and solver based on Accelerate with full threshold partial pivoting
*
* \warning Only single and double precision real scalar types are supported by Accelerate
*
* \tparam MatrixType_ the type of the sparse matrix A, it must be a SparseMatrix<>
* \tparam UpLo_ additional information about the matrix structure. Default is Lower.
*
* \sa \ref TutorialSparseSolverConcept, class AccelerateLDLTTPP
*/
template <typename MatrixType, int UpLo = Lower>
using AccelerateLDLTTPP = AccelerateImpl<MatrixType, UpLo | Symmetric, SparseFactorizationLDLTTPP, true>;
/** \ingroup AccelerateSupport_Module
* \typedef AccelerateQR
* \brief A QR factorization and solver based on Accelerate
*
* \warning Only single and double precision real scalar types are supported by Accelerate
*
* \tparam MatrixType_ the type of the sparse matrix A, it must be a SparseMatrix<>
*
* \sa \ref TutorialSparseSolverConcept, class AccelerateQR
*/
template <typename MatrixType>
using AccelerateQR = AccelerateImpl<MatrixType, 0, SparseFactorizationQR, false>;
/** \ingroup AccelerateSupport_Module
* \typedef AccelerateCholeskyAtA
* \brief A QR factorization and solver based on Accelerate without storing Q (equivalent to A^TA = R^T R)
*
* \warning Only single and double precision real scalar types are supported by Accelerate
*
* \tparam MatrixType_ the type of the sparse matrix A, it must be a SparseMatrix<>
*
* \sa \ref TutorialSparseSolverConcept, class AccelerateCholeskyAtA
*/
template <typename MatrixType>
using AccelerateCholeskyAtA = AccelerateImpl<MatrixType, 0, SparseFactorizationCholeskyAtA, false>;
namespace internal {
template <typename T>
struct AccelFactorizationDeleter {
void operator()(T* sym) const {
if (sym) {
SparseCleanup(*sym);
delete sym;
sym = nullptr;
}
}
};
template <typename DenseVecT, typename DenseMatT, typename SparseMatT, typename NumFactT>
struct SparseTypesTraitBase {
typedef DenseVecT AccelDenseVector;
typedef DenseMatT AccelDenseMatrix;
typedef SparseMatT AccelSparseMatrix;
typedef SparseOpaqueSymbolicFactorization SymbolicFactorization;
typedef NumFactT NumericFactorization;
typedef AccelFactorizationDeleter<SymbolicFactorization> SymbolicFactorizationDeleter;
typedef AccelFactorizationDeleter<NumericFactorization> NumericFactorizationDeleter;
};
template <typename Scalar>
struct SparseTypesTrait {};
template <>
struct SparseTypesTrait<double> : SparseTypesTraitBase<DenseVector_Double, DenseMatrix_Double, SparseMatrix_Double,
SparseOpaqueFactorization_Double> {};
template <>
struct SparseTypesTrait<float>
: SparseTypesTraitBase<DenseVector_Float, DenseMatrix_Float, SparseMatrix_Float, SparseOpaqueFactorization_Float> {
};
} // end namespace internal
template <typename MatrixType_, int UpLo_, SparseFactorization_t Solver_, bool EnforceSquare_>
class AccelerateImpl : public SparseSolverBase<AccelerateImpl<MatrixType_, UpLo_, Solver_, EnforceSquare_> > {
protected:
using Base = SparseSolverBase<AccelerateImpl>;
using Base::derived;
using Base::m_isInitialized;
public:
using Base::_solve_impl;
typedef MatrixType_ MatrixType;
typedef typename MatrixType::Scalar Scalar;
typedef typename MatrixType::StorageIndex StorageIndex;
enum { ColsAtCompileTime = Dynamic, MaxColsAtCompileTime = Dynamic };
enum { UpLo = UpLo_ };
using AccelDenseVector = typename internal::SparseTypesTrait<Scalar>::AccelDenseVector;
using AccelDenseMatrix = typename internal::SparseTypesTrait<Scalar>::AccelDenseMatrix;
using AccelSparseMatrix = typename internal::SparseTypesTrait<Scalar>::AccelSparseMatrix;
using SymbolicFactorization = typename internal::SparseTypesTrait<Scalar>::SymbolicFactorization;
using NumericFactorization = typename internal::SparseTypesTrait<Scalar>::NumericFactorization;
using SymbolicFactorizationDeleter = typename internal::SparseTypesTrait<Scalar>::SymbolicFactorizationDeleter;
using NumericFactorizationDeleter = typename internal::SparseTypesTrait<Scalar>::NumericFactorizationDeleter;
AccelerateImpl() {
m_isInitialized = false;
auto check_flag_set = [](int value, int flag) { return ((value & flag) == flag); };
if (check_flag_set(UpLo_, Symmetric)) {
m_sparseKind = SparseSymmetric;
m_triType = (UpLo_ & Lower) ? SparseLowerTriangle : SparseUpperTriangle;
} else if (check_flag_set(UpLo_, UnitLower)) {
m_sparseKind = SparseUnitTriangular;
m_triType = SparseLowerTriangle;
} else if (check_flag_set(UpLo_, UnitUpper)) {
m_sparseKind = SparseUnitTriangular;
m_triType = SparseUpperTriangle;
} else if (check_flag_set(UpLo_, StrictlyLower)) {
m_sparseKind = SparseTriangular;
m_triType = SparseLowerTriangle;
} else if (check_flag_set(UpLo_, StrictlyUpper)) {
m_sparseKind = SparseTriangular;
m_triType = SparseUpperTriangle;
} else if (check_flag_set(UpLo_, Lower)) {
m_sparseKind = SparseTriangular;
m_triType = SparseLowerTriangle;
} else if (check_flag_set(UpLo_, Upper)) {
m_sparseKind = SparseTriangular;
m_triType = SparseUpperTriangle;
} else {
m_sparseKind = SparseOrdinary;
m_triType = (UpLo_ & Lower) ? SparseLowerTriangle : SparseUpperTriangle;
}
m_order = SparseOrderDefault;
}
explicit AccelerateImpl(const MatrixType& matrix) : AccelerateImpl() { compute(matrix); }
~AccelerateImpl() {}
inline Index cols() const { return m_nCols; }
inline Index rows() const { return m_nRows; }
ComputationInfo info() const {
eigen_assert(m_isInitialized && "Decomposition is not initialized.");
return m_info;
}
void analyzePattern(const MatrixType& matrix);
void factorize(const MatrixType& matrix);
void compute(const MatrixType& matrix);
template <typename Rhs, typename Dest>
void _solve_impl(const MatrixBase<Rhs>& b, MatrixBase<Dest>& dest) const;
/** Sets the ordering algorithm to use. */
void setOrder(SparseOrder_t order) { m_order = order; }
private:
template <typename T>
void buildAccelSparseMatrix(const SparseMatrix<T>& a, AccelSparseMatrix& A, std::vector<long>& columnStarts) {
const Index nColumnsStarts = a.cols() + 1;
columnStarts.resize(nColumnsStarts);
for (Index i = 0; i < nColumnsStarts; i++) columnStarts[i] = a.outerIndexPtr()[i];
SparseAttributes_t attributes{};
attributes.transpose = false;
attributes.triangle = m_triType;
attributes.kind = m_sparseKind;
SparseMatrixStructure structure{};
structure.attributes = attributes;
structure.rowCount = static_cast<int>(a.rows());
structure.columnCount = static_cast<int>(a.cols());
structure.blockSize = 1;
structure.columnStarts = columnStarts.data();
structure.rowIndices = const_cast<int*>(a.innerIndexPtr());
A.structure = structure;
A.data = const_cast<T*>(a.valuePtr());
}
void doAnalysis(AccelSparseMatrix& A) {
m_numericFactorization.reset(nullptr);
SparseSymbolicFactorOptions opts{};
opts.control = SparseDefaultControl;
opts.orderMethod = m_order;
opts.order = nullptr;
opts.ignoreRowsAndColumns = nullptr;
opts.malloc = malloc;
opts.free = free;
opts.reportError = nullptr;
m_symbolicFactorization.reset(new SymbolicFactorization(SparseFactor(Solver_, A.structure, opts)));
SparseStatus_t status = m_symbolicFactorization->status;
updateInfoStatus(status);
if (status != SparseStatusOK) m_symbolicFactorization.reset(nullptr);
}
void doFactorization(AccelSparseMatrix& A) {
SparseStatus_t status = SparseStatusReleased;
if (m_symbolicFactorization) {
m_numericFactorization.reset(new NumericFactorization(SparseFactor(*m_symbolicFactorization, A)));
status = m_numericFactorization->status;
if (status != SparseStatusOK) m_numericFactorization.reset(nullptr);
}
updateInfoStatus(status);
}
protected:
void updateInfoStatus(SparseStatus_t status) const {
switch (status) {
case SparseStatusOK:
m_info = Success;
break;
case SparseFactorizationFailed:
case SparseMatrixIsSingular:
m_info = NumericalIssue;
break;
case SparseInternalError:
case SparseParameterError:
case SparseStatusReleased:
default:
m_info = InvalidInput;
break;
}
}
mutable ComputationInfo m_info;
Index m_nRows, m_nCols;
std::unique_ptr<SymbolicFactorization, SymbolicFactorizationDeleter> m_symbolicFactorization;
std::unique_ptr<NumericFactorization, NumericFactorizationDeleter> m_numericFactorization;
SparseKind_t m_sparseKind;
SparseTriangle_t m_triType;
SparseOrder_t m_order;
};
/** Computes the symbolic and numeric decomposition of matrix \a a */
template <typename MatrixType_, int UpLo_, SparseFactorization_t Solver_, bool EnforceSquare_>
void AccelerateImpl<MatrixType_, UpLo_, Solver_, EnforceSquare_>::compute(const MatrixType& a) {
if (EnforceSquare_) eigen_assert(a.rows() == a.cols());
m_nRows = a.rows();
m_nCols = a.cols();
AccelSparseMatrix A{};
std::vector<long> columnStarts;
buildAccelSparseMatrix(a, A, columnStarts);
doAnalysis(A);
if (m_symbolicFactorization) doFactorization(A);
m_isInitialized = true;
}
/** Performs a symbolic decomposition on the sparsity pattern of matrix \a a.
*
* This function is particularly useful when solving for several problems having the same structure.
*
* \sa factorize()
*/
template <typename MatrixType_, int UpLo_, SparseFactorization_t Solver_, bool EnforceSquare_>
void AccelerateImpl<MatrixType_, UpLo_, Solver_, EnforceSquare_>::analyzePattern(const MatrixType& a) {
if (EnforceSquare_) eigen_assert(a.rows() == a.cols());
m_nRows = a.rows();
m_nCols = a.cols();
AccelSparseMatrix A{};
std::vector<long> columnStarts;
buildAccelSparseMatrix(a, A, columnStarts);
doAnalysis(A);
m_isInitialized = true;
}
/** Performs a numeric decomposition of matrix \a a.
*
* The given matrix must have the same sparsity pattern as the matrix on which the symbolic decomposition has been
* performed.
*
* \sa analyzePattern()
*/
template <typename MatrixType_, int UpLo_, SparseFactorization_t Solver_, bool EnforceSquare_>
void AccelerateImpl<MatrixType_, UpLo_, Solver_, EnforceSquare_>::factorize(const MatrixType& a) {
eigen_assert(m_symbolicFactorization && "You must first call analyzePattern()");
eigen_assert(m_nRows == a.rows() && m_nCols == a.cols());
if (EnforceSquare_) eigen_assert(a.rows() == a.cols());
AccelSparseMatrix A{};
std::vector<long> columnStarts;
buildAccelSparseMatrix(a, A, columnStarts);
doFactorization(A);
}
template <typename MatrixType_, int UpLo_, SparseFactorization_t Solver_, bool EnforceSquare_>
template <typename Rhs, typename Dest>
void AccelerateImpl<MatrixType_, UpLo_, Solver_, EnforceSquare_>::_solve_impl(const MatrixBase<Rhs>& b,
MatrixBase<Dest>& x) const {
if (!m_numericFactorization) {
m_info = InvalidInput;
return;
}
eigen_assert(m_nRows == b.rows());
eigen_assert(((b.cols() == 1) || b.outerStride() == b.rows()));
SparseStatus_t status = SparseStatusOK;
Scalar* b_ptr = const_cast<Scalar*>(b.derived().data());
Scalar* x_ptr = const_cast<Scalar*>(x.derived().data());
AccelDenseMatrix xmat{};
xmat.attributes = SparseAttributes_t();
xmat.columnCount = static_cast<int>(x.cols());
xmat.rowCount = static_cast<int>(x.rows());
xmat.columnStride = xmat.rowCount;
xmat.data = x_ptr;
AccelDenseMatrix bmat{};
bmat.attributes = SparseAttributes_t();
bmat.columnCount = static_cast<int>(b.cols());
bmat.rowCount = static_cast<int>(b.rows());
bmat.columnStride = bmat.rowCount;
bmat.data = b_ptr;
SparseSolve(*m_numericFactorization, bmat, xmat);
updateInfoStatus(status);
}
} // end namespace Eigen
#endif // EIGEN_ACCELERATESUPPORT_H


@@ -0,0 +1,3 @@
#ifndef EIGEN_ACCELERATESUPPORT_MODULE_H
#error "Please include Eigen/AccelerateSupport instead of including headers inside the src directory directly."
#endif


@@ -0,0 +1,3 @@
#ifndef EIGEN_CHOLESKY_MODULE_H
#error "Please include Eigen/Cholesky instead of including headers inside the src directory directly."
#endif


@@ -10,434 +10,412 @@
#ifndef EIGEN_LLT_H
#define EIGEN_LLT_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
namespace internal {
template <typename MatrixType_, int UpLo_>
struct traits<LLT<MatrixType_, UpLo_> > : traits<MatrixType_> {
  typedef MatrixXpr XprKind;
  typedef SolverStorage StorageKind;
  typedef int StorageIndex;
  enum { Flags = 0 };
};

template <typename MatrixType, int UpLo>
struct LLT_Traits;
}  // namespace internal
/** \ingroup Cholesky_Module
 *
 * \class LLT
 *
 * \brief Standard Cholesky decomposition (LL^T) of a matrix and associated features
 *
 * \tparam MatrixType_ the type of the matrix of which we are computing the LL^T Cholesky decomposition
 * \tparam UpLo_ the triangular part that will be used for the decomposition: Lower (default) or Upper.
 *               The other triangular part won't be read.
 *
 * This class performs a LL^T Cholesky decomposition of a symmetric, positive definite
 * matrix A such that A = LL^* = U^*U, where L is lower triangular.
 *
 * While the Cholesky decomposition is particularly useful to solve selfadjoint problems like D^*D x = b,
 * for that purpose, we recommend the Cholesky decomposition without square root which is more stable
 * and even faster. Nevertheless, this standard Cholesky decomposition remains useful in many other
 * situations like generalised eigen problems with hermitian matrices.
 *
 * Remember that Cholesky decompositions are not rank-revealing. This LLT decomposition is only stable on positive
 * definite matrices, use LDLT instead for the semidefinite case. Also, do not use a Cholesky decomposition to determine
 * whether a system of equations has a solution.
 *
 * Example: \include LLT_example.cpp
 * Output: \verbinclude LLT_example.out
 *
 * \b Performance: for best performance, it is recommended to use a column-major storage format
 * with the Lower triangular part (the default), or, equivalently, a row-major storage format
 * with the Upper triangular part. Otherwise, you might get a 20% slowdown for the full factorization
 * step, and rank-updates can be up to 3 times slower.
 *
 * This class supports the \link InplaceDecomposition inplace decomposition \endlink mechanism.
 *
 * Note that during the decomposition, only the lower (or upper, as defined by UpLo_) triangular part of A is
 * considered. Therefore, the strict lower part does not have to store correct values.
 *
 * \sa MatrixBase::llt(), SelfAdjointView::llt(), class LDLT
 */
template <typename MatrixType_, int UpLo_>
class LLT : public SolverBase<LLT<MatrixType_, UpLo_> > {
 public:
  typedef MatrixType_ MatrixType;
  typedef SolverBase<LLT> Base;
  friend class SolverBase<LLT>;

  EIGEN_GENERIC_PUBLIC_INTERFACE(LLT)
  enum { MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime };
  enum { PacketSize = internal::packet_traits<Scalar>::size, AlignmentMask = int(PacketSize) - 1, UpLo = UpLo_ };

  typedef internal::LLT_Traits<MatrixType, UpLo> Traits;

  /**
   * \brief Default Constructor.
   *
   * The default constructor is useful in cases in which the user intends to
   * perform decompositions via LLT::compute(const MatrixType&).
   */
  LLT() : m_matrix(), m_l1_norm(0), m_isInitialized(false), m_info(InvalidInput) {}

  /** \brief Default Constructor with memory preallocation
   *
   * Like the default constructor but with preallocation of the internal data
   * according to the specified problem \a size.
   * \sa LLT()
   */
  explicit LLT(Index size) : m_matrix(size, size), m_l1_norm(0), m_isInitialized(false), m_info(InvalidInput) {}

  template <typename InputType>
  explicit LLT(const EigenBase<InputType>& matrix)
      : m_matrix(matrix.rows(), matrix.cols()), m_l1_norm(0), m_isInitialized(false), m_info(InvalidInput) {
    compute(matrix.derived());
  }

  /** \brief Constructs a LLT factorization from a given matrix
   *
   * This overloaded constructor is provided for \link InplaceDecomposition inplace decomposition \endlink when
   * \c MatrixType is a Eigen::Ref.
   *
   * \sa LLT(const EigenBase&)
   */
  template <typename InputType>
  explicit LLT(EigenBase<InputType>& matrix)
      : m_matrix(matrix.derived()), m_l1_norm(0), m_isInitialized(false), m_info(InvalidInput) {
    compute(matrix.derived());
  }

  /** \returns a view of the upper triangular matrix U */
  inline typename Traits::MatrixU matrixU() const {
    eigen_assert(m_isInitialized && "LLT is not initialized.");
    return Traits::getU(m_matrix);
  }

  /** \returns a view of the lower triangular matrix L */
  inline typename Traits::MatrixL matrixL() const {
    eigen_assert(m_isInitialized && "LLT is not initialized.");
    return Traits::getL(m_matrix);
  }

#ifdef EIGEN_PARSED_BY_DOXYGEN
  /** \returns the solution x of \f$ A x = b \f$ using the current decomposition of A.
   *
   * Since this LLT class assumes anyway that the matrix A is invertible, the solution
   * theoretically exists and is unique regardless of b.
   *
   * Example: \include LLT_solve.cpp
   * Output: \verbinclude LLT_solve.out
   *
   * \sa solveInPlace(), MatrixBase::llt(), SelfAdjointView::llt()
   */
  template <typename Rhs>
  inline Solve<LLT, Rhs> solve(const MatrixBase<Rhs>& b) const;
#endif

  template <typename Derived>
  void solveInPlace(const MatrixBase<Derived>& bAndX) const;

  template <typename InputType>
  LLT& compute(const EigenBase<InputType>& matrix);

  /** \returns an estimate of the reciprocal condition number of the matrix of
   * which \c *this is the Cholesky decomposition.
   */
  RealScalar rcond() const {
    eigen_assert(m_isInitialized && "LLT is not initialized.");
    eigen_assert(m_info == Success && "LLT failed because matrix appears to be negative");
    return internal::rcond_estimate_helper(m_l1_norm, *this);
  }

  /** \returns the LLT decomposition matrix
   *
   * TODO: document the storage layout
   */
  inline const MatrixType& matrixLLT() const {
    eigen_assert(m_isInitialized && "LLT is not initialized.");
    return m_matrix;
  }

  MatrixType reconstructedMatrix() const;

  /** \brief Reports whether previous computation was successful.
   *
   * \returns \c Success if computation was successful,
   *          \c NumericalIssue if the matrix appears not to be positive definite.
   */
  ComputationInfo info() const {
    eigen_assert(m_isInitialized && "LLT is not initialized.");
    return m_info;
  }

  /** \returns the adjoint of \c *this, that is, a const reference to the decomposition itself as the underlying matrix
   * is self-adjoint.
   *
   * This method is provided for compatibility with other matrix decompositions, thus enabling generic code such as:
   * \code x = decomposition.adjoint().solve(b) \endcode
   */
  const LLT& adjoint() const noexcept { return *this; }

  constexpr Index rows() const noexcept { return m_matrix.rows(); }
  constexpr Index cols() const noexcept { return m_matrix.cols(); }

  template <typename VectorType>
  LLT& rankUpdate(const VectorType& vec, const RealScalar& sigma = 1);

#ifndef EIGEN_PARSED_BY_DOXYGEN
  template <typename RhsType, typename DstType>
  void _solve_impl(const RhsType& rhs, DstType& dst) const;

  template <bool Conjugate, typename RhsType, typename DstType>
  void _solve_impl_transposed(const RhsType& rhs, DstType& dst) const;
#endif

 protected:
  EIGEN_STATIC_ASSERT_NON_INTEGER(Scalar)

  /** \internal
   * Used to compute and store L
   * The strict upper part is not used and even not initialized.
   */
  MatrixType m_matrix;
  RealScalar m_l1_norm;
  bool m_isInitialized;
  ComputationInfo m_info;
};
namespace internal {
template<typename Scalar, int UpLo> struct llt_inplace;
template <typename Scalar, int UpLo>
struct llt_inplace;
template <typename MatrixType, typename VectorType>
static Index llt_rank_update_lower(MatrixType& mat, const VectorType& vec,
                                   const typename MatrixType::RealScalar& sigma) {
  using std::sqrt;
  typedef typename MatrixType::Scalar Scalar;
  typedef typename MatrixType::RealScalar RealScalar;
  typedef typename MatrixType::ColXpr ColXpr;
  typedef internal::remove_all_t<ColXpr> ColXprCleaned;
  typedef typename ColXprCleaned::SegmentReturnType ColXprSegment;
  typedef Matrix<Scalar, Dynamic, 1> TempVectorType;
  typedef typename TempVectorType::SegmentReturnType TempVecSegment;

  Index n = mat.cols();
  eigen_assert(mat.rows() == n && vec.size() == n);

  TempVectorType temp;

  if (sigma > 0) {
    // This version is based on Givens rotations.
    // It is faster than the other one below, but only works for updates,
    // i.e., for sigma > 0
    temp = sqrt(sigma) * vec;

    for (Index i = 0; i < n; ++i) {
      JacobiRotation<Scalar> g;
      g.makeGivens(mat(i, i), -temp(i), &mat(i, i));

      Index rs = n - i - 1;
      if (rs > 0) {
        ColXprSegment x(mat.col(i).tail(rs));
        TempVecSegment y(temp.tail(rs));
        apply_rotation_in_the_plane(x, y, g);
      }
    }
  } else {
    temp = vec;
    RealScalar beta = 1;
    for (Index j = 0; j < n; ++j) {
      RealScalar Ljj = numext::real(mat.coeff(j, j));
      RealScalar dj = numext::abs2(Ljj);
      Scalar wj = temp.coeff(j);
      RealScalar swj2 = sigma * numext::abs2(wj);
      RealScalar gamma = dj * beta + swj2;

      RealScalar x = dj + swj2 / beta;
      if (x <= RealScalar(0)) return j;
      RealScalar nLjj = sqrt(x);
      mat.coeffRef(j, j) = nLjj;
      beta += swj2 / dj;

      // Update the terms of L
      Index rs = n - j - 1;
      if (rs) {
        temp.tail(rs) -= (wj / Ljj) * mat.col(j).tail(rs);
        if (!numext::is_exactly_zero(gamma))
          mat.col(j).tail(rs) =
              (nLjj / Ljj) * mat.col(j).tail(rs) + (nLjj * sigma * numext::conj(wj) / gamma) * temp.tail(rs);
      }
    }
  }
  return -1;
}
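The `sigma <= 0` branch above updates the factor column by column with a scalar recurrence instead of Givens rotations. A minimal real-valued plain-C++ sketch of that recurrence (the name `rank_update_lower` and the dense `vector<vector<double>>` storage are illustrative, not Eigen API; the real code also handles complex scalars):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Rank-1 update of a lower Cholesky factor: after the call, L*L^T equals the
// old L*L^T + sigma * v v^T. Returns -1 on success, or the column index at
// which the updated matrix stopped being positive definite, mirroring the
// return convention of llt_rank_update_lower above.
int rank_update_lower(std::vector<std::vector<double>>& L, std::vector<double> temp, double sigma) {
  const int n = static_cast<int>(L.size());
  double beta = 1.0;
  for (int j = 0; j < n; ++j) {
    const double Ljj = L[j][j];
    const double dj = Ljj * Ljj;
    const double wj = temp[j];
    const double swj2 = sigma * wj * wj;
    const double gamma = dj * beta + swj2;
    const double x = dj + swj2 / beta;
    if (x <= 0.0) return j;  // update/downdate would break positive definiteness
    const double nLjj = std::sqrt(x);
    L[j][j] = nLjj;
    beta += swj2 / dj;
    for (int i = j + 1; i < n; ++i) {
      temp[i] -= (wj / Ljj) * L[i][j];
      if (gamma != 0.0) L[i][j] = (nLjj / Ljj) * L[i][j] + (nLjj * sigma * wj / gamma) * temp[i];
    }
  }
  return -1;
}
```

For example, starting from the factor L = [[2, 0], [1, sqrt(2)]] of A = [[4, 2], [2, 3]], updating with v = (1, 1) and sigma = 0.5 produces exactly the Cholesky factor of A + 0.5 v v^T = [[4.5, 2.5], [2.5, 3.5]].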
template <typename Scalar>
struct llt_inplace<Scalar, Lower> {
  typedef typename NumTraits<Scalar>::Real RealScalar;
  template <typename MatrixType>
  static Index unblocked(MatrixType& mat) {
    using std::sqrt;

    eigen_assert(mat.rows() == mat.cols());
    const Index size = mat.rows();
    for (Index k = 0; k < size; ++k) {
      Index rs = size - k - 1;  // remaining size

      Block<MatrixType, Dynamic, 1> A21(mat, k + 1, k, rs, 1);
      Block<MatrixType, 1, Dynamic> A10(mat, k, 0, 1, k);
      Block<MatrixType, Dynamic, Dynamic> A20(mat, k + 1, 0, rs, k);

      RealScalar x = numext::real(mat.coeff(k, k));
      if (k > 0) x -= A10.squaredNorm();
      if (x <= RealScalar(0)) return k;
      mat.coeffRef(k, k) = x = sqrt(x);
      if (k > 0 && rs > 0) A21.noalias() -= A20 * A10.adjoint();
      if (rs > 0) A21 /= x;
    }
    return -1;
  }

  template <typename MatrixType>
  static Index blocked(MatrixType& m) {
    eigen_assert(m.rows() == m.cols());
    Index size = m.rows();
    if (size < 32) return unblocked(m);

    Index blockSize = size / 8;
    blockSize = (blockSize / 16) * 16;
    blockSize = (std::min)((std::max)(blockSize, Index(8)), Index(128));

    for (Index k = 0; k < size; k += blockSize) {
      // partition the matrix:
      //       A00 |  -  |  -
      // lu  = A10 | A11 |  -
      //       A20 | A21 | A22
      Index bs = (std::min)(blockSize, size - k);
      Index rs = size - k - bs;
      Block<MatrixType, Dynamic, Dynamic> A11(m, k, k, bs, bs);
      Block<MatrixType, Dynamic, Dynamic> A21(m, k + bs, k, rs, bs);
      Block<MatrixType, Dynamic, Dynamic> A22(m, k + bs, k + bs, rs, rs);

      Index ret;
      if ((ret = unblocked(A11)) >= 0) return k + ret;
      if (rs > 0) A11.adjoint().template triangularView<Upper>().template solveInPlace<OnTheRight>(A21);
      if (rs > 0)
        A22.template selfadjointView<Lower>().rankUpdate(A21,
                                                         typename NumTraits<RealScalar>::Literal(-1));  // bottleneck
    }
    return -1;
  }

  template <typename MatrixType, typename VectorType>
  static Index rankUpdate(MatrixType& mat, const VectorType& vec, const RealScalar& sigma) {
    return Eigen::internal::llt_rank_update_lower(mat, vec, sigma);
  }
};
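The unblocked kernel above is the classic column-by-column Cholesky. A minimal real-valued sketch with the same in-place layout and return convention (-1 on success, index of the first non-positive pivot otherwise; `cholesky_unblocked` is an illustrative name, not Eigen API):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// In-place lower Cholesky of a symmetric positive definite matrix; only the
// lower triangle is read and written. Returns -1 on success, or the index k
// of the first non-positive pivot, mirroring llt_inplace::unblocked above.
int cholesky_unblocked(std::vector<std::vector<double>>& a) {
  const int n = static_cast<int>(a.size());
  for (int k = 0; k < n; ++k) {
    double x = a[k][k];
    for (int j = 0; j < k; ++j) x -= a[k][j] * a[k][j];  // x -= A10.squaredNorm()
    if (x <= 0.0) return k;                              // not positive definite
    const double pivot = std::sqrt(x);
    a[k][k] = pivot;
    for (int i = k + 1; i < n; ++i) {
      double s = a[i][k];
      for (int j = 0; j < k; ++j) s -= a[i][j] * a[k][j];  // A21 -= A20 * A10^T
      a[i][k] = s / pivot;                                 // A21 /= x
    }
  }
  return -1;
}
```

The blocked variant above applies the same kernel to a diagonal block A11, then clears A21 with a triangular solve and A22 with a rank update, which shifts most of the work into cache-friendly matrix-matrix products.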
template <typename Scalar>
struct llt_inplace<Scalar, Upper> {
  typedef typename NumTraits<Scalar>::Real RealScalar;

  template <typename MatrixType>
  static EIGEN_STRONG_INLINE Index unblocked(MatrixType& mat) {
    Transpose<MatrixType> matt(mat);
    return llt_inplace<Scalar, Lower>::unblocked(matt);
  }
  template <typename MatrixType>
  static EIGEN_STRONG_INLINE Index blocked(MatrixType& mat) {
    Transpose<MatrixType> matt(mat);
    return llt_inplace<Scalar, Lower>::blocked(matt);
  }
  template <typename MatrixType, typename VectorType>
  static Index rankUpdate(MatrixType& mat, const VectorType& vec, const RealScalar& sigma) {
    Transpose<MatrixType> matt(mat);
    return llt_inplace<Scalar, Lower>::rankUpdate(matt, vec.conjugate(), sigma);
  }
};
template <typename MatrixType>
struct LLT_Traits<MatrixType, Lower> {
  typedef const TriangularView<const MatrixType, Lower> MatrixL;
  typedef const TriangularView<const typename MatrixType::AdjointReturnType, Upper> MatrixU;
  static inline MatrixL getL(const MatrixType& m) { return MatrixL(m); }
  static inline MatrixU getU(const MatrixType& m) { return MatrixU(m.adjoint()); }
  static bool inplace_decomposition(MatrixType& m) {
    return llt_inplace<typename MatrixType::Scalar, Lower>::blocked(m) == -1;
  }
};

template <typename MatrixType>
struct LLT_Traits<MatrixType, Upper> {
  typedef const TriangularView<const typename MatrixType::AdjointReturnType, Lower> MatrixL;
  typedef const TriangularView<const MatrixType, Upper> MatrixU;
  static inline MatrixL getL(const MatrixType& m) { return MatrixL(m.adjoint()); }
  static inline MatrixU getU(const MatrixType& m) { return MatrixU(m); }
  static bool inplace_decomposition(MatrixType& m) {
    return llt_inplace<typename MatrixType::Scalar, Upper>::blocked(m) == -1;
  }
};

}  // end namespace internal
/** Computes / recomputes the Cholesky decomposition A = LL^* = U^*U of \a matrix
 *
 * \returns a reference to *this
 *
 * Example: \include TutorialLinAlgComputeTwice.cpp
 * Output: \verbinclude TutorialLinAlgComputeTwice.out
 */
template <typename MatrixType, int UpLo_>
template <typename InputType>
LLT<MatrixType, UpLo_>& LLT<MatrixType, UpLo_>::compute(const EigenBase<InputType>& a) {
  eigen_assert(a.rows() == a.cols());
  const Index size = a.rows();
  m_matrix.resize(size, size);
  if (!internal::is_same_dense(m_matrix, a.derived())) m_matrix = a.derived();

  // Compute matrix L1 norm = max abs column sum.
  m_l1_norm = RealScalar(0);
  // TODO: move this code to SelfAdjointView
  for (Index col = 0; col < size; ++col) {
    RealScalar abs_col_sum;
    if (UpLo_ == Lower)
      abs_col_sum =
          m_matrix.col(col).tail(size - col).template lpNorm<1>() + m_matrix.row(col).head(col).template lpNorm<1>();
    else
      abs_col_sum =
          m_matrix.col(col).head(col).template lpNorm<1>() + m_matrix.row(col).tail(size - col).template lpNorm<1>();
    if (abs_col_sum > m_l1_norm) m_l1_norm = abs_col_sum;
  }

  m_isInitialized = true;
@@ -448,18 +426,17 @@ LLT<MatrixType,_UpLo>& LLT<MatrixType,_UpLo>::compute(const EigenBase<InputType>
}
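compute() also caches the L1 operator norm (maximum absolute column sum) for later use by rcond(). Since only one triangle is stored, each column sum combines the stored part of the column with the mirrored part of the corresponding row. A sketch of the Lower case for real scalars (`l1_norm_lower` is an illustrative name):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Max absolute column sum (L1 operator norm) of a symmetric matrix of which
// only the lower triangle is stored: column `col` is the stored tail of
// column `col` plus the mirrored head of row `col`, as in LLT::compute().
double l1_norm_lower(const std::vector<std::vector<double>>& m) {
  const int n = static_cast<int>(m.size());
  double l1 = 0.0;
  for (int col = 0; col < n; ++col) {
    double s = 0.0;
    for (int row = col; row < n; ++row) s += std::abs(m[row][col]);  // stored tail
    for (int j = 0; j < col; ++j) s += std::abs(m[col][j]);          // mirrored head
    if (s > l1) l1 = s;
  }
  return l1;
}
```

For the symmetric matrix [[4, 2], [2, 3]] stored as its lower triangle {{4, 0}, {2, 3}}, the column sums are 6 and 5, so the norm is 6.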
/** Performs a rank one update (or downdate) of the current decomposition.
 * If A = LL^* before the rank one update,
 * then after it we have LL^* = A + sigma * v v^* where \a v must be a vector
 * of same dimension.
 */
template <typename MatrixType_, int UpLo_>
template <typename VectorType>
LLT<MatrixType_, UpLo_>& LLT<MatrixType_, UpLo_>::rankUpdate(const VectorType& v, const RealScalar& sigma) {
  EIGEN_STATIC_ASSERT_VECTOR_ONLY(VectorType);
  eigen_assert(v.size() == m_matrix.cols());
  eigen_assert(m_isInitialized);
  if (internal::llt_inplace<typename MatrixType::Scalar, UpLo>::rankUpdate(m_matrix, v, sigma) >= 0)
    m_info = NumericalIssue;
  else
    m_info = Success;
@@ -468,31 +445,40 @@ LLT<_MatrixType,_UpLo> LLT<_MatrixType,_UpLo>::rankUpdate(const VectorType& v, c
}
#ifndef EIGEN_PARSED_BY_DOXYGEN
template <typename MatrixType_, int UpLo_>
template <typename RhsType, typename DstType>
void LLT<MatrixType_, UpLo_>::_solve_impl(const RhsType& rhs, DstType& dst) const {
  _solve_impl_transposed<true>(rhs, dst);
}

template <typename MatrixType_, int UpLo_>
template <bool Conjugate, typename RhsType, typename DstType>
void LLT<MatrixType_, UpLo_>::_solve_impl_transposed(const RhsType& rhs, DstType& dst) const {
  dst = rhs;

  matrixL().template conjugateIf<!Conjugate>().solveInPlace(dst);
  matrixU().template conjugateIf<!Conjugate>().solveInPlace(dst);
}
#endif
/** \internal use x = llt_object.solve(x);
 *
 * This is the \em in-place version of solve().
 *
 * \param bAndX represents both the right-hand side matrix b and result x.
 *
 * This version avoids a copy when the right hand side matrix b is not needed anymore.
 *
 * \warning The parameter is only marked 'const' to make the C++ compiler accept a temporary expression here.
 * This function will const_cast it, so constness isn't honored here.
 *
 * \sa LLT::solve(), MatrixBase::llt()
 */
template <typename MatrixType, int UpLo_>
template <typename Derived>
void LLT<MatrixType, UpLo_>::solveInPlace(const MatrixBase<Derived>& bAndX) const {
  eigen_assert(m_isInitialized && "LLT is not initialized.");
  eigen_assert(m_matrix.rows() == bAndX.rows());
  matrixL().solveInPlace(bAndX);
  matrixU().solveInPlace(bAndX);
}
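Both _solve_impl and solveInPlace reduce A x = b to two triangular solves: forward substitution with L, then back substitution with L^T. A minimal real-valued sketch of those two steps on plain arrays (`cholesky_solve_in_place` is an illustrative name, not Eigen API):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Solve A x = b in place, given the lower Cholesky factor L of A = L L^T:
// first L y = b (forward substitution), then L^T x = y (back substitution),
// the same two steps LLT::solveInPlace() performs via matrixL()/matrixU().
void cholesky_solve_in_place(const std::vector<std::vector<double>>& L, std::vector<double>& b) {
  const int n = static_cast<int>(L.size());
  for (int i = 0; i < n; ++i) {  // forward: L y = b
    double s = b[i];
    for (int j = 0; j < i; ++j) s -= L[i][j] * b[j];
    b[i] = s / L[i][i];
  }
  for (int i = n - 1; i >= 0; --i) {  // backward: L^T x = y
    double s = b[i];
    for (int j = i + 1; j < n; ++j) s -= L[j][i] * b[j];
    b[i] = s / L[i][i];
  }
}
```

For A = [[4, 2], [2, 3]] with factor L = [[2, 0], [1, sqrt(2)]] and b = A * (1, 1) = (6, 5), the result overwriting b is (1, 1).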
@@ -500,35 +486,31 @@ void LLT<MatrixType,_UpLo>::solveInPlace(MatrixBase<Derived> &bAndX) const
/** \returns the matrix represented by the decomposition,
 * i.e., it returns the product: L L^*.
 * This function is provided for debug purpose. */
template <typename MatrixType, int UpLo_>
MatrixType LLT<MatrixType, UpLo_>::reconstructedMatrix() const {
  eigen_assert(m_isInitialized && "LLT is not initialized.");
  return matrixL() * matrixL().adjoint().toDenseMatrix();
}

/** \cholesky_module
 * \returns the LLT decomposition of \c *this
 * \sa SelfAdjointView::llt()
 */
template <typename Derived>
inline LLT<typename MatrixBase<Derived>::PlainObject> MatrixBase<Derived>::llt() const {
  return LLT<PlainObject>(derived());
}

/** \cholesky_module
 * \returns the LLT decomposition of \c *this
 * \sa SelfAdjointView::llt()
 */
template <typename MatrixType, unsigned int UpLo>
inline LLT<typename SelfAdjointView<MatrixType, UpLo>::PlainObject, UpLo> SelfAdjointView<MatrixType, UpLo>::llt()
    const {
  return LLT<PlainObject, UpLo>(m_matrix);
}

}  // end namespace Eigen

#endif  // EIGEN_LLT_H


@@ -33,67 +33,92 @@
#ifndef EIGEN_LLT_LAPACKE_H
#define EIGEN_LLT_LAPACKE_H
namespace Eigen {
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
namespace internal {
namespace lapacke_helpers {
// -------------------------------------------------------------------------------------------------------------------
// Dispatch for rank update handling upper and lower parts
// -------------------------------------------------------------------------------------------------------------------
template <UpLoType Mode>
struct rank_update {};
template <>
struct rank_update<Lower> {
template <typename MatrixType, typename VectorType>
static Index run(MatrixType &mat, const VectorType &vec, const typename MatrixType::RealScalar &sigma) {
return Eigen::internal::llt_rank_update_lower(mat, vec, sigma);
}
};
template <>
struct rank_update<Upper> {
template <typename MatrixType, typename VectorType>
static Index run(MatrixType &mat, const VectorType &vec, const typename MatrixType::RealScalar &sigma) {
Transpose<MatrixType> matt(mat);
return Eigen::internal::llt_rank_update_lower(matt, vec.conjugate(), sigma);
}
};
// -------------------------------------------------------------------------------------------------------------------
// Generic lapacke llt implementation that hands off to the dispatches
// -------------------------------------------------------------------------------------------------------------------
template <typename Scalar, UpLoType Mode>
struct lapacke_llt {
EIGEN_STATIC_ASSERT(((Mode == Lower) || (Mode == Upper)), MODE_MUST_BE_UPPER_OR_LOWER)
template <typename MatrixType>
static Index blocked(MatrixType &m) {
eigen_assert(m.rows() == m.cols());
if (m.rows() == 0) {
return -1;
}
/* Set up parameters for ?potrf */
lapack_int size = to_lapack(m.rows());
lapack_int matrix_order = lapack_storage_of(m);
constexpr char uplo = Mode == Upper ? 'U' : 'L';
Scalar *a = &(m.coeffRef(0, 0));
lapack_int lda = to_lapack(m.outerStride());
lapack_int info = potrf(matrix_order, uplo, size, to_lapack(a), lda);
info = (info == 0) ? -1 : info > 0 ? info - 1 : size;
return info;
}
template <typename MatrixType, typename VectorType>
static Index rankUpdate(MatrixType &mat, const VectorType &vec, const typename MatrixType::RealScalar &sigma) {
return rank_update<Mode>::run(mat, vec, sigma);
}
};
} // namespace lapacke_helpers
/*
* Here, we just put the generic implementation from lapacke_llt into a full specialization of the llt_inplace
* type. By being a full specialization, the versions defined here thus get precedence over the generic implementation
* in LLT.h for double, float and complex double, complex float types.
*/
#define EIGEN_LAPACKE_LLT(EIGTYPE) \
template <> \
struct llt_inplace<EIGTYPE, Lower> : public lapacke_helpers::lapacke_llt<EIGTYPE, Lower> {}; \
template <> \
struct llt_inplace<EIGTYPE, Upper> : public lapacke_helpers::lapacke_llt<EIGTYPE, Upper> {};
EIGEN_LAPACKE_LLT(double)
EIGEN_LAPACKE_LLT(float)
EIGEN_LAPACKE_LLT(std::complex<double>)
EIGEN_LAPACKE_LLT(std::complex<float>)
#undef EIGEN_LAPACKE_LLT
} // end namespace internal
} // end namespace Eigen
#endif // EIGEN_LLT_LAPACKE_H

File diff suppressed because it is too large


@@ -0,0 +1,3 @@
#ifndef EIGEN_CHOLMODSUPPORT_MODULE_H
#error "Please include Eigen/CholmodSupport instead of including headers inside the src directory directly."
#endif


@@ -0,0 +1,239 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2017 Gael Guennebaud <gael.guennebaud@inria.fr>
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_ARITHMETIC_SEQUENCE_H
#define EIGEN_ARITHMETIC_SEQUENCE_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
namespace internal {
// Helper to cleanup the type of the increment:
template <typename T>
struct cleanup_seq_incr {
typedef typename cleanup_index_type<T, DynamicIndex>::type type;
};
} // namespace internal
//--------------------------------------------------------------------------------
// seq(first,last,incr) and seqN(first,size,incr)
//--------------------------------------------------------------------------------
template <typename FirstType = Index, typename SizeType = Index, typename IncrType = internal::FixedInt<1> >
class ArithmeticSequence;
template <typename FirstType, typename SizeType, typename IncrType>
ArithmeticSequence<typename internal::cleanup_index_type<FirstType>::type,
typename internal::cleanup_index_type<SizeType>::type,
typename internal::cleanup_seq_incr<IncrType>::type>
seqN(FirstType first, SizeType size, IncrType incr);
/** \class ArithmeticSequence
* \ingroup Core_Module
*
* This class represents an arithmetic progression \f$ a_0, a_1, a_2, ..., a_{n-1}\f$ defined by
* its \em first value \f$ a_0 \f$, its \em size (aka length) \em n, and the \em increment (aka stride)
* that is equal to \f$ a_{i+1}-a_{i}\f$ for any \em i.
*
* It is internally used as the return type of the Eigen::seq and Eigen::seqN functions, and as the input arguments
* of DenseBase::operator()(const RowIndices&, const ColIndices&), and most of the time this is the
* only way it is used.
*
* \tparam FirstType type of the first element, usually an Index,
* but internally it can be a symbolic expression
* \tparam SizeType type representing the size of the sequence, usually an Index
* or a compile time integral constant. Internally, it can also be a symbolic expression
* \tparam IncrType type of the increment, can be a runtime Index, or a compile time integral constant (default is
* compile-time 1)
*
* \sa Eigen::seq, Eigen::seqN, DenseBase::operator()(const RowIndices&, const ColIndices&), class IndexedView
*/
template <typename FirstType, typename SizeType, typename IncrType>
class ArithmeticSequence {
public:
constexpr ArithmeticSequence() = default;
constexpr ArithmeticSequence(FirstType first, SizeType size) : m_first(first), m_size(size) {}
constexpr ArithmeticSequence(FirstType first, SizeType size, IncrType incr)
: m_first(first), m_size(size), m_incr(incr) {}
enum {
// SizeAtCompileTime = internal::get_fixed_value<SizeType>::value,
IncrAtCompileTime = internal::get_fixed_value<IncrType, DynamicIndex>::value
};
/** \returns the size, i.e., number of elements, of the sequence */
constexpr Index size() const { return m_size; }
/** \returns the first element \f$ a_0 \f$ in the sequence */
constexpr Index first() const { return m_first; }
/** \returns the value \f$ a_i \f$ at index \a i in the sequence. */
constexpr Index operator[](Index i) const { return m_first + i * m_incr; }
constexpr const FirstType& firstObject() const { return m_first; }
constexpr const SizeType& sizeObject() const { return m_size; }
constexpr const IncrType& incrObject() const { return m_incr; }
protected:
FirstType m_first;
SizeType m_size;
IncrType m_incr;
public:
constexpr auto reverse() const -> decltype(Eigen::seqN(m_first + (m_size + fix<-1>()) * m_incr, m_size, -m_incr)) {
return seqN(m_first + (m_size + fix<-1>()) * m_incr, m_size, -m_incr);
}
};
/** \returns an ArithmeticSequence starting at \a first, of length \a size, and increment \a incr
*
* \sa seqN(FirstType,SizeType), seq(FirstType,LastType,IncrType) */
template <typename FirstType, typename SizeType, typename IncrType>
ArithmeticSequence<typename internal::cleanup_index_type<FirstType>::type,
typename internal::cleanup_index_type<SizeType>::type,
typename internal::cleanup_seq_incr<IncrType>::type>
seqN(FirstType first, SizeType size, IncrType incr) {
return ArithmeticSequence<typename internal::cleanup_index_type<FirstType>::type,
typename internal::cleanup_index_type<SizeType>::type,
typename internal::cleanup_seq_incr<IncrType>::type>(first, size, incr);
}
/** \returns an ArithmeticSequence starting at \a first, of length \a size, and unit increment
*
* \sa seqN(FirstType,SizeType,IncrType), seq(FirstType,LastType) */
template <typename FirstType, typename SizeType>
ArithmeticSequence<typename internal::cleanup_index_type<FirstType>::type,
typename internal::cleanup_index_type<SizeType>::type>
seqN(FirstType first, SizeType size) {
return ArithmeticSequence<typename internal::cleanup_index_type<FirstType>::type,
typename internal::cleanup_index_type<SizeType>::type>(first, size);
}
#ifdef EIGEN_PARSED_BY_DOXYGEN
/** \returns an ArithmeticSequence starting at \a f, up (or down) to \a l, and with positive (or negative) increment \a
* incr
*
* It is essentially an alias to:
* \code
* seqN(f, (l-f+incr)/incr, incr);
* \endcode
*
* \sa seqN(FirstType,SizeType,IncrType), seq(FirstType,LastType)
*/
template <typename FirstType, typename LastType, typename IncrType>
auto seq(FirstType f, LastType l, IncrType incr);
/** \returns an ArithmeticSequence starting at \a f, up (or down) to \a l, and unit increment
*
* It is essentially an alias to:
* \code
* seqN(f,l-f+1);
* \endcode
*
* \sa seqN(FirstType,SizeType), seq(FirstType,LastType,IncrType)
*/
template <typename FirstType, typename LastType>
auto seq(FirstType f, LastType l);
#else // EIGEN_PARSED_BY_DOXYGEN
template <typename FirstType, typename LastType>
auto seq(FirstType f, LastType l)
-> decltype(seqN(typename internal::cleanup_index_type<FirstType>::type(f),
(typename internal::cleanup_index_type<LastType>::type(l) -
typename internal::cleanup_index_type<FirstType>::type(f) + fix<1>()))) {
return seqN(typename internal::cleanup_index_type<FirstType>::type(f),
(typename internal::cleanup_index_type<LastType>::type(l) -
typename internal::cleanup_index_type<FirstType>::type(f) + fix<1>()));
}
template <typename FirstType, typename LastType, typename IncrType>
auto seq(FirstType f, LastType l, IncrType incr)
-> decltype(seqN(typename internal::cleanup_index_type<FirstType>::type(f),
(typename internal::cleanup_index_type<LastType>::type(l) -
typename internal::cleanup_index_type<FirstType>::type(f) +
typename internal::cleanup_seq_incr<IncrType>::type(incr)) /
typename internal::cleanup_seq_incr<IncrType>::type(incr),
typename internal::cleanup_seq_incr<IncrType>::type(incr))) {
typedef typename internal::cleanup_seq_incr<IncrType>::type CleanedIncrType;
return seqN(typename internal::cleanup_index_type<FirstType>::type(f),
(typename internal::cleanup_index_type<LastType>::type(l) -
typename internal::cleanup_index_type<FirstType>::type(f) + CleanedIncrType(incr)) /
CleanedIncrType(incr),
CleanedIncrType(incr));
}
#endif // EIGEN_PARSED_BY_DOXYGEN
namespace placeholders {
/** \cpp11
* \returns a symbolic ArithmeticSequence representing the last \a size elements with increment \a incr.
*
* It is a shortcut for: \code seqN(last-(size-fix<1>)*incr, size, incr) \endcode
* \anchor Eigen_placeholders_lastN
* \sa lastN(SizeType), seqN(FirstType,SizeType), seq(FirstType,LastType,IncrType) */
template <typename SizeType, typename IncrType>
auto lastN(SizeType size, IncrType incr)
-> decltype(seqN(Eigen::placeholders::last - (size - fix<1>()) * incr, size, incr)) {
return seqN(Eigen::placeholders::last - (size - fix<1>()) * incr, size, incr);
}
/** \cpp11
* \returns a symbolic ArithmeticSequence representing the last \a size elements with a unit increment.
*
* It is a shortcut for: \code seq(last+fix<1>-size, last) \endcode
*
* \sa lastN(SizeType,IncrType), seqN(FirstType,SizeType), seq(FirstType,LastType) */
template <typename SizeType>
auto lastN(SizeType size) -> decltype(seqN(Eigen::placeholders::last + fix<1>() - size, size)) {
return seqN(Eigen::placeholders::last + fix<1>() - size, size);
}
} // namespace placeholders
/** \namespace Eigen::indexing
* \ingroup Core_Module
*
* The sole purpose of this namespace is to be able to import all functions
* and symbols that are expected to be used within operator() for indexing
* and slicing. If you already imported the whole Eigen namespace:
* \code using namespace Eigen; \endcode
* then you are already all set. Otherwise, if you don't want/cannot import
* the whole Eigen namespace, the following line:
* \code using namespace Eigen::indexing; \endcode
* is equivalent to:
* \code
using Eigen::fix;
using Eigen::seq;
using Eigen::seqN;
using Eigen::placeholders::all;
using Eigen::placeholders::last;
using Eigen::placeholders::lastN; // c++11 only
using Eigen::placeholders::lastp1;
\endcode
*/
namespace indexing {
using Eigen::fix;
using Eigen::seq;
using Eigen::seqN;
using Eigen::placeholders::all;
using Eigen::placeholders::last;
using Eigen::placeholders::lastN;
using Eigen::placeholders::lastp1;
} // namespace indexing
} // end namespace Eigen
#endif // EIGEN_ARITHMETIC_SEQUENCE_H


@@ -10,316 +10,365 @@
#ifndef EIGEN_ARRAY_H
#define EIGEN_ARRAY_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
namespace internal {
template <typename Scalar_, int Rows_, int Cols_, int Options_, int MaxRows_, int MaxCols_>
struct traits<Array<Scalar_, Rows_, Cols_, Options_, MaxRows_, MaxCols_>>
: traits<Matrix<Scalar_, Rows_, Cols_, Options_, MaxRows_, MaxCols_>> {
typedef ArrayXpr XprKind;
typedef ArrayBase<Array<Scalar_, Rows_, Cols_, Options_, MaxRows_, MaxCols_>> XprBase;
};
} // namespace internal
/** \class Array
* \ingroup Core_Module
*
* \brief General-purpose arrays with easy API for coefficient-wise operations
*
* The %Array class is very similar to the Matrix class. It provides
* general-purpose one- and two-dimensional arrays. The difference between the
* %Array and the %Matrix class is primarily in the API: the API for the
* %Array class provides easy access to coefficient-wise operations, while the
* API for the %Matrix class provides easy access to linear-algebra
* operations.
*
* See documentation of class Matrix for detailed information on the template parameters and
* storage layout.
*
* This class can be extended with the help of the plugin mechanism described on the page
* \ref TopicCustomizing_Plugins by defining the preprocessor symbol \c EIGEN_ARRAY_PLUGIN.
*
* \sa \blank \ref TutorialArrayClass, \ref TopicClassHierarchy
*/
template <typename Scalar_, int Rows_, int Cols_, int Options_, int MaxRows_, int MaxCols_>
class Array : public PlainObjectBase<Array<Scalar_, Rows_, Cols_, Options_, MaxRows_, MaxCols_>> {
public:
typedef PlainObjectBase<Array> Base;
EIGEN_DENSE_PUBLIC_INTERFACE(Array)
enum { Options = Options_ };
typedef typename Base::PlainObject PlainObject;
protected:
template <typename Derived, typename OtherDerived, bool IsVector>
friend struct internal::conservative_resize_like_impl;
using Base::m_storage;
public:
using Base::base;
using Base::coeff;
using Base::coeffRef;
/**
* The usage of
* using Base::operator=;
* fails on MSVC. Since the code below is working with GCC and MSVC, we skipped
* the usage of 'using'. This should be done only for operator=.
*/
template <typename OtherDerived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Array& operator=(const EigenBase<OtherDerived>& other) {
return Base::operator=(other);
}
/** Set all the entries to \a value.
* \sa DenseBase::setConstant(), DenseBase::fill()
*/
/* This overload is needed because the usage of
* using Base::operator=;
* fails on MSVC. Since the code below is working with GCC and MSVC, we skipped
* the usage of 'using'. This should be done only for operator=.
*/
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Array& operator=(const Scalar& value) {
Base::setConstant(value);
return *this;
}
/** Copies the value of the expression \a other into \c *this with automatic resizing.
*
* *this might be resized to match the dimensions of \a other. If *this was a null matrix (not already initialized),
* it will be initialized.
*
* Note that copying a row-vector into a vector (and conversely) is allowed.
* The resizing, if any, is then done in the appropriate way so that row-vectors
* remain row-vectors and vectors remain vectors.
*/
template <typename OtherDerived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Array& operator=(const DenseBase<OtherDerived>& other) {
return Base::_set(other);
}
/**
* \brief Assigns arrays to each other.
*
* \note This is a special case of the templated operator=. Its purpose is
* to prevent a default operator= from hiding the templated operator=.
*
* \callgraph
*/
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Array& operator=(const Array& other) { return Base::_set(other); }
/** Default constructor.
*
* For fixed-size matrices, does nothing.
*
* For dynamic-size matrices, creates an empty matrix of size 0. Does not allocate any array. Such a matrix
* is called a null matrix. This constructor is the unique way to create null matrices: resizing
* a matrix to 0 is not supported.
*
* \sa resize(Index,Index)
*/
#ifdef EIGEN_INITIALIZE_COEFFS
EIGEN_DEVICE_FUNC constexpr Array() : Base() { EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED }
#else
EIGEN_DEVICE_FUNC constexpr Array() = default;
#endif
/** \brief Move constructor */
EIGEN_DEVICE_FUNC constexpr Array(Array&&) = default;
EIGEN_DEVICE_FUNC Array& operator=(Array&& other) noexcept(std::is_nothrow_move_assignable<Scalar>::value) {
Base::operator=(std::move(other));
return *this;
}
/** \brief Construct a row of column vector with fixed size from an arbitrary number of coefficients.
*
* \only_for_vectors
*
* This constructor is for 1D array or vectors with more than 4 coefficients.
*
* \warning To construct a column (resp. row) vector of fixed length, the number of values passed to this
* constructor must match the fixed number of rows (resp. columns) of \c *this.
*
*
* Example: \include Array_variadic_ctor_cxx11.cpp
* Output: \verbinclude Array_variadic_ctor_cxx11.out
*
* \sa Array(const std::initializer_list<std::initializer_list<Scalar>>&)
* \sa Array(const Scalar&), Array(const Scalar&,const Scalar&)
*/
template <typename... ArgTypes>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Array(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3,
const ArgTypes&... args)
: Base(a0, a1, a2, a3, args...) {}
/** \brief Constructs an array and initializes it from the coefficients given as initializer-lists grouped by row.
* \cpp11
*
* In the general case, the constructor takes a list of rows, each row being represented as a list of coefficients:
*
* Example: \include Array_initializer_list_23_cxx11.cpp
* Output: \verbinclude Array_initializer_list_23_cxx11.out
*
* Each of the inner initializer lists must contain the exact same number of elements, otherwise an assertion is
* triggered.
*
* In the case of a compile-time column 1D array, implicit transposition from a single row is allowed.
* Therefore <code> Array<int,Dynamic,1>{{1,2,3,4,5}}</code> is legal and the more verbose syntax
* <code>Array<int,Dynamic,1>{{1},{2},{3},{4},{5}}</code> can be avoided:
*
* Example: \include Array_initializer_list_vector_cxx11.cpp
* Output: \verbinclude Array_initializer_list_vector_cxx11.out
*
* In the case of fixed-sized arrays, the initializer list sizes must exactly match the array sizes,
* and implicit transposition is allowed for compile-time 1D arrays only.
*
* \sa Array(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args)
*/
EIGEN_DEVICE_FUNC constexpr Array(const std::initializer_list<std::initializer_list<Scalar>>& list) : Base(list) {}
#ifndef EIGEN_PARSED_BY_DOXYGEN
// FIXME is it still needed ??
/** \internal */
EIGEN_DEVICE_FUNC Array(internal::constructor_without_unaligned_array_assert)
: Base(internal::constructor_without_unaligned_array_assert()) {
EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED
}
template <typename T>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit Array(const T& x) {
Base::template _init1<T>(x);
}
template <typename T0, typename T1>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Array(const T0& val0, const T1& val1) {
this->template _init2<T0, T1>(val0, val1);
}
#else
/** \brief Constructs a fixed-sized array initialized with coefficients starting at \a data */
EIGEN_DEVICE_FUNC explicit Array(const Scalar* data);
/** Constructs a vector or row-vector with given dimension. \only_for_vectors
*
* Note that this is only useful for dynamic-size vectors. For fixed-size vectors,
* it is redundant to pass the dimension here, so it makes more sense to use the default
* constructor Array() instead.
*/
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE explicit Array(Index dim);
/** constructs an initialized 1x1 Array with the given coefficient
* \sa const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args */
Array(const Scalar& value);
/** constructs an uninitialized array with \a rows rows and \a cols columns.
*
* This is useful for dynamic-size arrays. For fixed-size arrays,
* it is redundant to pass these parameters, so one should use the default constructor
* Array() instead. */
Array(Index rows, Index cols);
/** constructs an initialized 2D vector with given coefficients
* \sa Array(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args) */
Array(const Scalar& val0, const Scalar& val1);
#endif // end EIGEN_PARSED_BY_DOXYGEN
/** constructs an initialized 3D vector with given coefficients
* \sa Array(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args)
*/
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Array(const Scalar& val0, const Scalar& val1, const Scalar& val2) {
EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(Array, 3)
m_storage.data()[0] = val0;
m_storage.data()[1] = val1;
m_storage.data()[2] = val2;
}
/** constructs an initialized 4D vector with given coefficients
* \sa Array(const Scalar& a0, const Scalar& a1, const Scalar& a2, const Scalar& a3, const ArgTypes&... args)
*/
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Array(const Scalar& val0, const Scalar& val1, const Scalar& val2,
const Scalar& val3) {
EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(Array, 4)
m_storage.data()[0] = val0;
m_storage.data()[1] = val1;
m_storage.data()[2] = val2;
m_storage.data()[3] = val3;
}
/** Copy constructor */
EIGEN_DEVICE_FUNC constexpr Array(const Array&) = default;
private:
struct PrivateType {};
public:
/** \sa MatrixBase::operator=(const EigenBase<OtherDerived>&) */
template <typename OtherDerived>
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE Array(
const EigenBase<OtherDerived>& other,
std::enable_if_t<internal::is_convertible<typename OtherDerived::Scalar, Scalar>::value, PrivateType> =
PrivateType())
: Base(other.derived()) {}
EIGEN_DEVICE_FUNC constexpr Index innerStride() const noexcept { return 1; }
EIGEN_DEVICE_FUNC constexpr Index outerStride() const noexcept { return this->innerSize(); }
#ifdef EIGEN_ARRAY_PLUGIN
#include EIGEN_ARRAY_PLUGIN
#endif
private:
template <typename MatrixType, typename OtherDerived, bool SwapPointers>
friend struct internal::matrix_swap_impl;
};
/** \defgroup arraytypedefs Global array typedefs
* \ingroup Core_Module
*
* %Eigen defines several typedef shortcuts for most common 1D and 2D array types.
*
* The general patterns are the following:
*
* \c ArrayRowsColsType where \c Rows and \c Cols can be \c 2,\c 3,\c 4 for fixed size square matrices or \c X for
* dynamic size, and where \c Type can be \c i for integer, \c f for float, \c d for double, \c cf for complex float, \c
* cd for complex double.
*
* For example, \c Array33d is a fixed-size 3x3 array type of doubles, and \c ArrayXXf is a dynamic-size matrix of
* floats.
*
* There are also \c ArraySizeType which are self-explanatory. For example, \c Array4cf is
* a fixed-size 1D array of 4 complex floats.
*
* With \cpp11, template aliases are also defined for common sizes.
* They follow the same pattern as above except that the scalar type suffix is replaced by a
* template parameter, i.e.:
*  - `ArrayRowsCols<Type>` where `Rows` and `Cols` can be \c 2,\c 3,\c 4, or \c X for fixed or dynamic size.
*  - `ArraySize<Type>` where `Size` can be \c 2,\c 3,\c 4 or \c X for fixed or dynamic size 1D arrays.
*
* \sa class Array
*/
#define EIGEN_MAKE_ARRAY_TYPEDEFS(Type, TypeSuffix, Size, SizeSuffix) \
/** \ingroup arraytypedefs */ \
typedef Array<Type, Size, Size> Array##SizeSuffix##SizeSuffix##TypeSuffix; \
/** \ingroup arraytypedefs */ \
typedef Array<Type, Size, 1> Array##SizeSuffix##TypeSuffix;
#define EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS(Type, TypeSuffix, Size) \
/** \ingroup arraytypedefs */ \
typedef Array<Type, Size, Dynamic> Array##Size##X##TypeSuffix; \
/** \ingroup arraytypedefs */ \
typedef Array<Type, Dynamic, Size> Array##X##Size##TypeSuffix;
#define EIGEN_MAKE_ARRAY_TYPEDEFS_ALL_SIZES(Type, TypeSuffix) \
EIGEN_MAKE_ARRAY_TYPEDEFS(Type, TypeSuffix, 2, 2) \
EIGEN_MAKE_ARRAY_TYPEDEFS(Type, TypeSuffix, 3, 3) \
EIGEN_MAKE_ARRAY_TYPEDEFS(Type, TypeSuffix, 4, 4) \
EIGEN_MAKE_ARRAY_TYPEDEFS(Type, TypeSuffix, Dynamic, X) \
EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS(Type, TypeSuffix, 2) \
EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS(Type, TypeSuffix, 3) \
EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS(Type, TypeSuffix, 4)
EIGEN_MAKE_ARRAY_TYPEDEFS_ALL_SIZES(int, i)
EIGEN_MAKE_ARRAY_TYPEDEFS_ALL_SIZES(float, f)
EIGEN_MAKE_ARRAY_TYPEDEFS_ALL_SIZES(double, d)
EIGEN_MAKE_ARRAY_TYPEDEFS_ALL_SIZES(std::complex<float>, cf)
EIGEN_MAKE_ARRAY_TYPEDEFS_ALL_SIZES(std::complex<double>, cd)
#undef EIGEN_MAKE_ARRAY_TYPEDEFS_ALL_SIZES
#undef EIGEN_MAKE_ARRAY_TYPEDEFS
#undef EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS
#define EIGEN_MAKE_ARRAY_TYPEDEFS(Size, SizeSuffix) \
/** \ingroup arraytypedefs */ \
/** \brief \cpp11 */ \
template <typename Type> \
using Array##SizeSuffix##SizeSuffix = Array<Type, Size, Size>; \
/** \ingroup arraytypedefs */ \
/** \brief \cpp11 */ \
template <typename Type> \
using Array##SizeSuffix = Array<Type, Size, 1>;
#define EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS(Size) \
/** \ingroup arraytypedefs */ \
/** \brief \cpp11 */ \
template <typename Type> \
using Array##Size##X = Array<Type, Size, Dynamic>; \
/** \ingroup arraytypedefs */ \
/** \brief \cpp11 */ \
template <typename Type> \
using Array##X##Size = Array<Type, Dynamic, Size>;
EIGEN_MAKE_ARRAY_TYPEDEFS(2, 2)
EIGEN_MAKE_ARRAY_TYPEDEFS(3, 3)
EIGEN_MAKE_ARRAY_TYPEDEFS(4, 4)
EIGEN_MAKE_ARRAY_TYPEDEFS(Dynamic, X)
EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS(2)
EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS(3)
EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS(4)
#undef EIGEN_MAKE_ARRAY_TYPEDEFS
#undef EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS
#define EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE_AND_SIZE(TypeSuffix, SizeSuffix) \
using Eigen::Matrix##SizeSuffix##TypeSuffix; \
using Eigen::Vector##SizeSuffix##TypeSuffix; \
using Eigen::RowVector##SizeSuffix##TypeSuffix;
#define EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE(TypeSuffix) \
EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE_AND_SIZE(TypeSuffix, 2) \
EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE_AND_SIZE(TypeSuffix, 3) \
EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE_AND_SIZE(TypeSuffix, 4) \
EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE_AND_SIZE(TypeSuffix, X)
#define EIGEN_USING_ARRAY_TYPEDEFS \
EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE(i) \
EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE(f) \
EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE(d) \
EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE(cf) \
EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE(cd)
} // end namespace Eigen
#endif // EIGEN_ARRAY_H


@@ -10,217 +10,201 @@
#ifndef EIGEN_ARRAYBASE_H
#define EIGEN_ARRAYBASE_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
template <typename ExpressionType>
class MatrixWrapper;
/** \class ArrayBase
* \ingroup Core_Module
*
* \brief Base class for all 1D and 2D array, and related expressions
*
* An array is similar to a dense vector or matrix. While matrices are mathematical
* objects with well defined linear algebra operators, an array is just a collection
* of scalar values arranged in a one or two dimensional fashion. As the main consequence,
* all operations applied to an array are performed coefficient wise. Furthermore,
* arrays support scalar math functions of the c++ standard library (e.g., std::sin(x)), and convenient
* constructors allowing to easily write generic code working for both scalar values
* and arrays.
*
* This class is the base that is inherited by all array expression types.
*
* \tparam Derived is the derived type, e.g., an array or an expression type.
*
* This class can be extended with the help of the plugin mechanism described on the page
* \ref TopicCustomizing_Plugins by defining the preprocessor symbol \c EIGEN_ARRAYBASE_PLUGIN.
*
* \sa class MatrixBase, \ref TopicClassHierarchy
*/
template <typename Derived>
class ArrayBase : public DenseBase<Derived> {
public:
#ifndef EIGEN_PARSED_BY_DOXYGEN
/** The base class for a given storage type. */
typedef ArrayBase StorageBaseType;
typedef ArrayBase Eigen_BaseClassForSpecializationOfGlobalMathFuncImpl;
typedef typename internal::traits<Derived>::StorageKind StorageKind;
typedef typename internal::traits<Derived>::Scalar Scalar;
typedef typename internal::packet_traits<Scalar>::type PacketScalar;
typedef typename NumTraits<Scalar>::Real RealScalar;
typedef DenseBase<Derived> Base;
using Base::ColsAtCompileTime;
using Base::Flags;
using Base::IsVectorAtCompileTime;
using Base::MaxColsAtCompileTime;
using Base::MaxRowsAtCompileTime;
using Base::MaxSizeAtCompileTime;
using Base::RowsAtCompileTime;
using Base::SizeAtCompileTime;
typedef typename Base::CoeffReturnType CoeffReturnType;
using Base::coeff;
using Base::coeffRef;
using Base::cols;
using Base::const_cast_derived;
using Base::derived;
using Base::lazyAssign;
using Base::rows;
using Base::size;
using Base::operator-;
using Base::operator=;
using Base::operator+=;
using Base::operator-=;
using Base::operator*=;
using Base::operator/=;
#endif // not EIGEN_PARSED_BY_DOXYGEN
#ifndef EIGEN_PARSED_BY_DOXYGEN
typedef typename Base::PlainObject PlainObject;
/** \internal Represents a matrix with all coefficients equal to one another*/
typedef CwiseNullaryOp<internal::scalar_constant_op<Scalar>, PlainObject> ConstantReturnType;
#endif // not EIGEN_PARSED_BY_DOXYGEN
#define EIGEN_CURRENT_STORAGE_BASE_CLASS Eigen::ArrayBase
#define EIGEN_DOC_UNARY_ADDONS(X, Y)
#include "../plugins/CommonCwiseUnaryOps.inc"
#include "../plugins/MatrixCwiseUnaryOps.inc"
#include "../plugins/ArrayCwiseUnaryOps.inc"
#include "../plugins/CommonCwiseBinaryOps.inc"
#include "../plugins/MatrixCwiseBinaryOps.inc"
#include "../plugins/ArrayCwiseBinaryOps.inc"
#ifdef EIGEN_ARRAYBASE_PLUGIN
#include EIGEN_ARRAYBASE_PLUGIN
#endif
#undef EIGEN_CURRENT_STORAGE_BASE_CLASS
#undef EIGEN_DOC_UNARY_ADDONS
/** Special case of the template operator=, in order to prevent the compiler
* from generating a default operator= (issue hit with g++ 4.1)
*/
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator=(const ArrayBase& other) {
internal::call_assignment(derived(), other.derived());
return derived();
}
/** Set all the entries to \a value.
* \sa DenseBase::setConstant(), DenseBase::fill() */
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator=(const Scalar& value) {
Base::setConstant(value);
return derived();
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator+=(const Scalar& other) {
internal::call_assignment(this->derived(), PlainObject::Constant(rows(), cols(), other),
internal::add_assign_op<Scalar, Scalar>());
return derived();
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator-=(const Scalar& other) {
internal::call_assignment(this->derived(), PlainObject::Constant(rows(), cols(), other),
internal::sub_assign_op<Scalar, Scalar>());
return derived();
}
/** replaces \c *this by \c *this + \a other.
*
* \returns a reference to \c *this
*/
template <typename OtherDerived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator+=(const ArrayBase<OtherDerived>& other) {
call_assignment(derived(), other.derived(), internal::add_assign_op<Scalar, typename OtherDerived::Scalar>());
return derived();
}
/** replaces \c *this by \c *this - \a other.
*
* \returns a reference to \c *this
*/
template <typename OtherDerived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator-=(const ArrayBase<OtherDerived>& other) {
call_assignment(derived(), other.derived(), internal::sub_assign_op<Scalar, typename OtherDerived::Scalar>());
return derived();
}
/** replaces \c *this by \c *this * \a other coefficient wise.
*
* \returns a reference to \c *this
*/
template <typename OtherDerived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator*=(const ArrayBase<OtherDerived>& other) {
call_assignment(derived(), other.derived(), internal::mul_assign_op<Scalar, typename OtherDerived::Scalar>());
return derived();
}
// template<typename Dest>
// inline void evalTo(Dest& dst) const { dst = matrix(); }
/** replaces \c *this by \c *this / \a other coefficient wise.
*
* \returns a reference to \c *this
*/
template <typename OtherDerived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator/=(const ArrayBase<OtherDerived>& other) {
call_assignment(derived(), other.derived(), internal::div_assign_op<Scalar, typename OtherDerived::Scalar>());
return derived();
}
public:
EIGEN_DEVICE_FUNC constexpr ArrayBase<Derived>& array() { return *this; }
EIGEN_DEVICE_FUNC constexpr const ArrayBase<Derived>& array() const { return *this; }
/** \returns an \link Eigen::MatrixBase Matrix \endlink expression of this array
* \sa MatrixBase::array() */
EIGEN_DEVICE_FUNC constexpr MatrixWrapper<Derived> matrix() { return MatrixWrapper<Derived>(derived()); }
EIGEN_DEVICE_FUNC constexpr const MatrixWrapper<const Derived> matrix() const {
return MatrixWrapper<const Derived>(derived());
}
protected:
EIGEN_DEFAULT_COPY_CONSTRUCTOR(ArrayBase)
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(ArrayBase)
private:
explicit ArrayBase(Index);
ArrayBase(Index, Index);
template <typename OtherDerived>
explicit ArrayBase(const ArrayBase<OtherDerived>&);
protected:
// mixing arrays and matrices is not legal
template <typename OtherDerived>
Derived& operator+=(const MatrixBase<OtherDerived>&) {
EIGEN_STATIC_ASSERT(std::ptrdiff_t(sizeof(typename OtherDerived::Scalar)) == -1,
YOU_CANNOT_MIX_ARRAYS_AND_MATRICES);
return *this;
}
// mixing arrays and matrices is not legal
template <typename OtherDerived>
Derived& operator-=(const MatrixBase<OtherDerived>&) {
EIGEN_STATIC_ASSERT(std::ptrdiff_t(sizeof(typename OtherDerived::Scalar)) == -1,
YOU_CANNOT_MIX_ARRAYS_AND_MATRICES);
return *this;
}
};
} // end namespace Eigen
#endif // EIGEN_ARRAYBASE_H


@@ -10,198 +10,157 @@
#ifndef EIGEN_ARRAYWRAPPER_H
#define EIGEN_ARRAYWRAPPER_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
/** \class ArrayWrapper
* \ingroup Core_Module
*
* \brief Expression of a mathematical vector or matrix as an array object
*
* This class is the return type of MatrixBase::array(), and most of the time
* this is the only way it is used.
*
* \sa MatrixBase::array(), class MatrixWrapper
*/
namespace internal {
template <typename ExpressionType>
struct traits<ArrayWrapper<ExpressionType> > : public traits<remove_all_t<typename ExpressionType::Nested> > {
typedef ArrayXpr XprKind;
// Let's remove NestByRefBit
enum {
Flags0 = traits<remove_all_t<typename ExpressionType::Nested> >::Flags,
LvalueBitFlag = is_lvalue<ExpressionType>::value ? LvalueBit : 0,
Flags = (Flags0 & ~(NestByRefBit | LvalueBit)) | LvalueBitFlag
};
};
} // namespace internal
template <typename ExpressionType>
class ArrayWrapper : public ArrayBase<ArrayWrapper<ExpressionType> > {
public:
typedef ArrayBase<ArrayWrapper> Base;
EIGEN_DENSE_PUBLIC_INTERFACE(ArrayWrapper)
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(ArrayWrapper)
typedef internal::remove_all_t<ExpressionType> NestedExpression;
typedef std::conditional_t<internal::is_lvalue<ExpressionType>::value, Scalar, const Scalar>
ScalarWithConstIfNotLvalue;
typedef typename internal::ref_selector<ExpressionType>::non_const_type NestedExpressionType;
using Base::coeffRef;
EIGEN_DEVICE_FUNC constexpr explicit EIGEN_STRONG_INLINE ArrayWrapper(ExpressionType& matrix)
: m_expression(matrix) {}
EIGEN_DEVICE_FUNC constexpr Index rows() const noexcept { return m_expression.rows(); }
EIGEN_DEVICE_FUNC constexpr Index cols() const noexcept { return m_expression.cols(); }
EIGEN_DEVICE_FUNC constexpr Index outerStride() const noexcept { return m_expression.outerStride(); }
EIGEN_DEVICE_FUNC constexpr Index innerStride() const noexcept { return m_expression.innerStride(); }
EIGEN_DEVICE_FUNC constexpr ScalarWithConstIfNotLvalue* data() { return m_expression.data(); }
EIGEN_DEVICE_FUNC constexpr const Scalar* data() const { return m_expression.data(); }
EIGEN_DEVICE_FUNC inline const Scalar& coeffRef(Index rowId, Index colId) const {
return m_expression.coeffRef(rowId, colId);
}
EIGEN_DEVICE_FUNC inline const Scalar& coeffRef(Index index) const { return m_expression.coeffRef(index); }
template <typename Dest>
EIGEN_DEVICE_FUNC inline void evalTo(Dest& dst) const {
dst = m_expression;
}
EIGEN_DEVICE_FUNC constexpr const internal::remove_all_t<NestedExpressionType>& nestedExpression() const {
return m_expression;
}
/** Forwards the resizing request to the nested expression
* \sa DenseBase::resize(Index) */
EIGEN_DEVICE_FUNC void resize(Index newSize) { m_expression.resize(newSize); }
/** Forwards the resizing request to the nested expression
* \sa DenseBase::resize(Index,Index)*/
EIGEN_DEVICE_FUNC void resize(Index rows, Index cols) { m_expression.resize(rows, cols); }
protected:
NestedExpressionType m_expression;
};
/** \class MatrixWrapper
* \ingroup Core_Module
*
* \brief Expression of an array as a mathematical vector or matrix
*
* This class is the return type of ArrayBase::matrix(), and most of the time
* this is the only way it is used.
*
* \sa MatrixBase::matrix(), class ArrayWrapper
*/
namespace internal {
template <typename ExpressionType>
struct traits<MatrixWrapper<ExpressionType> > : public traits<remove_all_t<typename ExpressionType::Nested> > {
typedef MatrixXpr XprKind;
// Let's remove NestByRefBit
enum {
Flags0 = traits<remove_all_t<typename ExpressionType::Nested> >::Flags,
LvalueBitFlag = is_lvalue<ExpressionType>::value ? LvalueBit : 0,
Flags = (Flags0 & ~(NestByRefBit | LvalueBit)) | LvalueBitFlag
};
};
} // namespace internal
template <typename ExpressionType>
class MatrixWrapper : public MatrixBase<MatrixWrapper<ExpressionType> > {
public:
typedef MatrixBase<MatrixWrapper<ExpressionType> > Base;
EIGEN_DENSE_PUBLIC_INTERFACE(MatrixWrapper)
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(MatrixWrapper)
typedef internal::remove_all_t<ExpressionType> NestedExpression;
typedef std::conditional_t<internal::is_lvalue<ExpressionType>::value, Scalar, const Scalar>
ScalarWithConstIfNotLvalue;
typedef typename internal::ref_selector<ExpressionType>::non_const_type NestedExpressionType;
using Base::coeffRef;
EIGEN_DEVICE_FUNC constexpr explicit inline MatrixWrapper(ExpressionType& matrix) : m_expression(matrix) {}
EIGEN_DEVICE_FUNC constexpr Index rows() const noexcept { return m_expression.rows(); }
EIGEN_DEVICE_FUNC constexpr Index cols() const noexcept { return m_expression.cols(); }
EIGEN_DEVICE_FUNC constexpr Index outerStride() const noexcept { return m_expression.outerStride(); }
EIGEN_DEVICE_FUNC constexpr Index innerStride() const noexcept { return m_expression.innerStride(); }
EIGEN_DEVICE_FUNC constexpr ScalarWithConstIfNotLvalue* data() { return m_expression.data(); }
EIGEN_DEVICE_FUNC constexpr const Scalar* data() const { return m_expression.data(); }
EIGEN_DEVICE_FUNC inline const Scalar& coeffRef(Index rowId, Index colId) const {
return m_expression.derived().coeffRef(rowId, colId);
}
EIGEN_DEVICE_FUNC inline const Scalar& coeffRef(Index index) const { return m_expression.coeffRef(index); }
EIGEN_DEVICE_FUNC constexpr const internal::remove_all_t<NestedExpressionType>& nestedExpression() const {
return m_expression;
}
/** Forwards the resizing request to the nested expression
* \sa DenseBase::resize(Index) */
EIGEN_DEVICE_FUNC void resize(Index newSize) { m_expression.resize(newSize); }
/** Forwards the resizing request to the nested expression
* \sa DenseBase::resize(Index,Index)*/
EIGEN_DEVICE_FUNC void resize(Index rows, Index cols) { m_expression.resize(rows, cols); }
protected:
NestedExpressionType m_expression;
};
} // end namespace Eigen
#endif // EIGEN_ARRAYWRAPPER_H


@@ -12,79 +12,73 @@
#ifndef EIGEN_ASSIGN_H
#define EIGEN_ASSIGN_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
template <typename Derived>
template <typename OtherDerived>
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE Derived& DenseBase<Derived>::lazyAssign(
const DenseBase<OtherDerived>& other) {
enum { SameType = internal::is_same<typename Derived::Scalar, typename OtherDerived::Scalar>::value };
EIGEN_STATIC_ASSERT_LVALUE(Derived)
EIGEN_STATIC_ASSERT_SAME_MATRIX_SIZE(Derived, OtherDerived)
EIGEN_STATIC_ASSERT(
SameType,
YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY)
eigen_assert(rows() == other.rows() && cols() == other.cols());
internal::call_assignment_no_alias(derived(), other.derived());
return derived();
}
template <typename Derived>
template <typename OtherDerived>
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE Derived& DenseBase<Derived>::operator=(
const DenseBase<OtherDerived>& other) {
internal::call_assignment(derived(), other.derived());
return derived();
}
template<typename Derived>
template <typename Derived>
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE Derived& DenseBase<Derived>::operator=(const DenseBase& other) {
internal::call_assignment(derived(), other.derived());
return derived();
}
template <typename Derived>
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE Derived& MatrixBase<Derived>::operator=(const MatrixBase& other) {
internal::call_assignment(derived(), other.derived());
return derived();
}
template <typename Derived>
template <typename OtherDerived>
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Derived& MatrixBase<Derived>::operator=(const EigenBase<OtherDerived>& other)
{
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE Derived& MatrixBase<Derived>::operator=(
const DenseBase<OtherDerived>& other) {
internal::call_assignment(derived(), other.derived());
return derived();
}
template<typename Derived>
template<typename OtherDerived>
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Derived& MatrixBase<Derived>::operator=(const ReturnByValue<OtherDerived>& other)
{
template <typename Derived>
template <typename OtherDerived>
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE Derived& MatrixBase<Derived>::operator=(
const EigenBase<OtherDerived>& other) {
internal::call_assignment(derived(), other.derived());
return derived();
}
template <typename Derived>
template <typename OtherDerived>
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE Derived& MatrixBase<Derived>::operator=(
const ReturnByValue<OtherDerived>& other) {
other.derived().evalTo(derived());
return derived();
}
} // end namespace Eigen
} // end namespace Eigen
#endif // EIGEN_ASSIGN_H
#endif // EIGEN_ASSIGN_H

File diff suppressed because it is too large

@@ -0,0 +1,301 @@
/*
* This Source Code Form is subject to the terms of the Mozilla Public
* License, v. 2.0. If a copy of the MPL was not distributed with this
* file, You can obtain one at https://mozilla.org/MPL/2.0/.
*
* Assign_AOCL.h - AOCL Vectorized Math Dispatch Layer for Eigen
*
* Copyright (c) 2025, Advanced Micro Devices, Inc. All rights reserved.
*
* Description:
* ------------
* This file implements a high-performance dispatch layer that automatically
* routes Eigen's element-wise mathematical operations to AMD Optimizing CPU
* Libraries (AOCL) Vector Math Library (VML) functions when beneficial for
* performance.
*
* The dispatch system uses C++ template specialization to intercept Eigen's
* assignment operations and redirect them to AOCL's VRDA functions, which
* provide optimized implementations for AMD Zen architectures.
*
* Key Features:
* -------------
* 1. Automatic Dispatch: Seamlessly routes supported operations to AOCL without
* requiring code changes in user applications
*
* 2. Performance Optimization: Uses AOCL VRDA functions optimized for Zen
* family processors with automatic SIMD instruction selection (AVX2, AVX-512)
*
* 3. Threshold-Based Activation: Only activates for vectors larger than
* EIGEN_AOCL_VML_THRESHOLD (default: 128 elements) to avoid overhead on
* small vectors
*
* 4. Precision-Specific Handling:
* - Double precision: AOCL VRDA vectorized functions
* - Single precision: Scalar fallback (preserves correctness)
*
* 5. Memory Layout Compatibility: Ensures direct memory access and compatible
* storage orders between source and destination for optimal performance
*
* Supported Operations:
* ---------------------
* UNARY OPERATIONS (vector → vector):
* - Transcendental: exp(), sin(), cos(), sqrt(), log(), log10(), log2()
*
* BINARY OPERATIONS (vector op vector → vector):
* - Arithmetic: +, *, pow()
*
* Template Specialization Mechanism:
* -----------------------------------
* The system works by specializing Eigen's Assignment template for:
* 1. CwiseUnaryOp with scalar_*_op functors (unary operations)
* 2. CwiseBinaryOp with scalar_*_op functors (binary operations)
* 3. Dense2Dense assignment context with AOCL-compatible traits
*
* Dispatch conditions (all must be true):
* - Source and destination have DirectAccessBit (contiguous memory)
* - Compatible storage orders (both row-major or both column-major)
* - Vector size ≥ EIGEN_AOCL_VML_THRESHOLD or Dynamic size
* - Supported data type (currently double precision for VRDA)
*
* Integration Example:
* --------------------
* // Standard Eigen code - no changes required
* VectorXd x = VectorXd::Random(10000);
* VectorXd y = VectorXd::Random(10000);
* VectorXd result;
*
* // These operations are automatically dispatched to AOCL:
* result = x.array().exp(); // → amd_vrda_exp()
* result = x.array().sin(); // → amd_vrda_sin()
* result = x.array() + y.array(); // → amd_vrda_add()
* result = x.array().pow(y.array()); // → amd_vrda_pow()
*
* Configuration:
* --------------
* Required preprocessor definitions:
* - EIGEN_USE_AOCL_ALL or EIGEN_USE_AOCL_MT: Enable AOCL integration
* - EIGEN_USE_AOCL_VML: Enable Vector Math Library dispatch
*
* Compilation Requirements:
* -------------------------
* Include paths:
* - AOCL headers: -I${AOCL_ROOT}/include
* - Eigen headers: -I/path/to/eigen
*
* Link libraries:
* - AOCL MathLib: -lamdlibm
* - Standard math: -lm
*
* Compiler flags:
* - Optimization: -O3 (required for inlining)
* - Architecture: -march=znver5 or -march=native
* - Vectorization: -mfma -mavx512f (if supported)
*
* Platform Support:
* ------------------
* - Primary: Linux x86_64 with AMD Zen family processors
* - Compilers: GCC 8+, Clang 10+, AOCC (recommended)
* - AOCL Version: 4.0+ (with VRDA support)
*
* Error Handling:
* ---------------
* - Graceful fallback to scalar operations for unsupported configurations
* - Compile-time detection of AOCL availability
* - Runtime size and alignment validation with eigen_assert()
*
* Developer:
* ----------
* Name: Sharad Saurabh Bhaskar
* Email: shbhaska@amd.com
* Organization: Advanced Micro Devices, Inc.
*/
#ifndef EIGEN_ASSIGN_AOCL_H
#define EIGEN_ASSIGN_AOCL_H
namespace Eigen {
namespace internal {
// Traits for unary operations.
template <typename Dst, typename Src> class aocl_assign_traits {
private:
enum {
DstHasDirectAccess = !!(Dst::Flags & DirectAccessBit),
SrcHasDirectAccess = !!(Src::Flags & DirectAccessBit),
StorageOrdersAgree = (int(Dst::IsRowMajor) == int(Src::IsRowMajor)),
InnerSize = Dst::IsVectorAtCompileTime ? int(Dst::SizeAtCompileTime)
: (Dst::Flags & RowMajorBit) ? int(Dst::ColsAtCompileTime)
: int(Dst::RowsAtCompileTime),
LargeEnough =
(InnerSize == Dynamic) || (InnerSize >= EIGEN_AOCL_VML_THRESHOLD)
};
public:
enum {
EnableAoclVML = DstHasDirectAccess && SrcHasDirectAccess &&
StorageOrdersAgree && LargeEnough,
Traversal = LinearTraversal
};
};
// Traits for binary operations (e.g., add, pow).
template <typename Dst, typename Lhs, typename Rhs>
class aocl_assign_binary_traits {
private:
enum {
DstHasDirectAccess = !!(Dst::Flags & DirectAccessBit),
LhsHasDirectAccess = !!(Lhs::Flags & DirectAccessBit),
RhsHasDirectAccess = !!(Rhs::Flags & DirectAccessBit),
StorageOrdersAgree = (int(Dst::IsRowMajor) == int(Lhs::IsRowMajor)) &&
(int(Dst::IsRowMajor) == int(Rhs::IsRowMajor)),
InnerSize = Dst::IsVectorAtCompileTime ? int(Dst::SizeAtCompileTime)
: (Dst::Flags & RowMajorBit) ? int(Dst::ColsAtCompileTime)
: int(Dst::RowsAtCompileTime),
LargeEnough =
(InnerSize == Dynamic) || (InnerSize >= EIGEN_AOCL_VML_THRESHOLD)
};
public:
enum {
EnableAoclVML = DstHasDirectAccess && LhsHasDirectAccess &&
RhsHasDirectAccess && StorageOrdersAgree && LargeEnough
};
};
// Unary operation dispatch for float (scalar fallback).
#define EIGEN_AOCL_VML_UNARY_CALL_FLOAT(EIGENOP) \
template <typename DstXprType, typename SrcXprNested> \
struct Assignment< \
DstXprType, CwiseUnaryOp<scalar_##EIGENOP##_op<float>, SrcXprNested>, \
assign_op<float, float>, Dense2Dense, \
std::enable_if_t< \
aocl_assign_traits<DstXprType, SrcXprNested>::EnableAoclVML>> { \
typedef CwiseUnaryOp<scalar_##EIGENOP##_op<float>, SrcXprNested> \
SrcXprType; \
static void run(DstXprType &dst, const SrcXprType &src, \
const assign_op<float, float> &) { \
eigen_assert(dst.rows() == src.rows() && dst.cols() == src.cols()); \
Eigen::Index n = dst.size(); \
if (n <= 0) \
return; \
const float *input = \
reinterpret_cast<const float *>(src.nestedExpression().data()); \
float *output = reinterpret_cast<float *>(dst.data()); \
for (Eigen::Index i = 0; i < n; ++i) { \
output[i] = std::EIGENOP(input[i]); \
} \
} \
};
// Unary operation dispatch for double (AOCL vectorized).
#define EIGEN_AOCL_VML_UNARY_CALL_DOUBLE(EIGENOP, AOCLOP) \
template <typename DstXprType, typename SrcXprNested> \
struct Assignment< \
DstXprType, CwiseUnaryOp<scalar_##EIGENOP##_op<double>, SrcXprNested>, \
assign_op<double, double>, Dense2Dense, \
std::enable_if_t< \
aocl_assign_traits<DstXprType, SrcXprNested>::EnableAoclVML>> { \
typedef CwiseUnaryOp<scalar_##EIGENOP##_op<double>, SrcXprNested> \
SrcXprType; \
static void run(DstXprType &dst, const SrcXprType &src, \
const assign_op<double, double> &) { \
eigen_assert(dst.rows() == src.rows() && dst.cols() == src.cols()); \
Eigen::Index n = dst.size(); \
eigen_assert(n <= INT_MAX && "AOCL does not support arrays larger than INT_MAX"); \
if (n <= 0) \
return; \
const double *input = \
reinterpret_cast<const double *>(src.nestedExpression().data()); \
double *output = reinterpret_cast<double *>(dst.data()); \
int aocl_n = internal::convert_index<int>(n); \
AOCLOP(aocl_n, const_cast<double *>(input), output); \
} \
};
// Instantiate unary calls for float (scalar).
// EIGEN_AOCL_VML_UNARY_CALL_FLOAT(exp)
// Instantiate unary calls for double (AOCL vectorized).
EIGEN_AOCL_VML_UNARY_CALL_DOUBLE(exp2, amd_vrda_exp2)
EIGEN_AOCL_VML_UNARY_CALL_DOUBLE(exp, amd_vrda_exp)
EIGEN_AOCL_VML_UNARY_CALL_DOUBLE(sin, amd_vrda_sin)
EIGEN_AOCL_VML_UNARY_CALL_DOUBLE(cos, amd_vrda_cos)
EIGEN_AOCL_VML_UNARY_CALL_DOUBLE(sqrt, amd_vrda_sqrt)
EIGEN_AOCL_VML_UNARY_CALL_DOUBLE(cbrt, amd_vrda_cbrt)
EIGEN_AOCL_VML_UNARY_CALL_DOUBLE(abs, amd_vrda_fabs)
EIGEN_AOCL_VML_UNARY_CALL_DOUBLE(log, amd_vrda_log)
EIGEN_AOCL_VML_UNARY_CALL_DOUBLE(log10, amd_vrda_log10)
EIGEN_AOCL_VML_UNARY_CALL_DOUBLE(log2, amd_vrda_log2)
// Binary operation dispatch for float (scalar fallback).
#define EIGEN_AOCL_VML_BINARY_CALL_FLOAT(EIGENOP, STDFUNC) \
template <typename DstXprType, typename LhsXprNested, typename RhsXprNested> \
struct Assignment< \
DstXprType, \
CwiseBinaryOp<scalar_##EIGENOP##_op<float, float>, LhsXprNested, \
RhsXprNested>, \
assign_op<float, float>, Dense2Dense, \
std::enable_if_t<aocl_assign_binary_traits< \
DstXprType, LhsXprNested, RhsXprNested>::EnableAoclVML>> { \
typedef CwiseBinaryOp<scalar_##EIGENOP##_op<float, float>, LhsXprNested, \
RhsXprNested> \
SrcXprType; \
static void run(DstXprType &dst, const SrcXprType &src, \
const assign_op<float, float> &) { \
eigen_assert(dst.rows() == src.rows() && dst.cols() == src.cols()); \
Eigen::Index n = dst.size(); \
if (n <= 0) \
return; \
const float *lhs = reinterpret_cast<const float *>(src.lhs().data()); \
const float *rhs = reinterpret_cast<const float *>(src.rhs().data()); \
float *output = reinterpret_cast<float *>(dst.data()); \
for (Eigen::Index i = 0; i < n; ++i) { \
output[i] = STDFUNC(lhs[i], rhs[i]); \
} \
} \
};
// Binary operation dispatch for double (AOCL vectorized).
#define EIGEN_AOCL_VML_BINARY_CALL_DOUBLE(EIGENOP, AOCLOP) \
template <typename DstXprType, typename LhsXprNested, typename RhsXprNested> \
struct Assignment< \
DstXprType, \
CwiseBinaryOp<scalar_##EIGENOP##_op<double, double>, LhsXprNested, \
RhsXprNested>, \
assign_op<double, double>, Dense2Dense, \
std::enable_if_t<aocl_assign_binary_traits< \
DstXprType, LhsXprNested, RhsXprNested>::EnableAoclVML>> { \
typedef CwiseBinaryOp<scalar_##EIGENOP##_op<double, double>, LhsXprNested, \
RhsXprNested> \
SrcXprType; \
static void run(DstXprType &dst, const SrcXprType &src, \
const assign_op<double, double> &) { \
eigen_assert(dst.rows() == src.rows() && dst.cols() == src.cols()); \
Eigen::Index n = dst.size(); \
eigen_assert(n <= INT_MAX && "AOCL does not support arrays larger than INT_MAX"); \
if (n <= 0) \
return; \
const double *lhs = reinterpret_cast<const double *>(src.lhs().data()); \
const double *rhs = reinterpret_cast<const double *>(src.rhs().data()); \
double *output = reinterpret_cast<double *>(dst.data()); \
int aocl_n = internal::convert_index<int>(n); \
AOCLOP(aocl_n, const_cast<double *>(lhs), const_cast<double *>(rhs), output); \
} \
};
// Instantiate binary calls for float (scalar).
// EIGEN_AOCL_VML_BINARY_CALL_FLOAT(sum, std::plus<float>)  // using scalar_sum_op for addition
// EIGEN_AOCL_VML_BINARY_CALL_FLOAT(pow, std::pow)
// Instantiate binary calls for double (AOCL vectorized).
EIGEN_AOCL_VML_BINARY_CALL_DOUBLE(sum, amd_vrda_add) // Using scalar_sum_op for addition
EIGEN_AOCL_VML_BINARY_CALL_DOUBLE(pow, amd_vrda_pow)
EIGEN_AOCL_VML_BINARY_CALL_DOUBLE(max, amd_vrda_fmax)
EIGEN_AOCL_VML_BINARY_CALL_DOUBLE(min, amd_vrda_fmin)
} // namespace internal
} // namespace Eigen
#endif // EIGEN_ASSIGN_AOCL_H

Eigen/src/Core/Assign_MKL.h Executable file → Normal file

@@ -1,7 +1,7 @@
/*
Copyright (c) 2011, Intel Corporation. All rights reserved.
Copyright (C) 2015 Gael Guennebaud <gael.guennebaud@inria.fr>
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
@@ -34,143 +34,150 @@
#ifndef EIGEN_ASSIGN_VML_H
#define EIGEN_ASSIGN_VML_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
namespace internal {
template <typename Dst, typename Src>
class vml_assign_traits {
 private:
  enum {
    DstHasDirectAccess = Dst::Flags & DirectAccessBit,
    SrcHasDirectAccess = Src::Flags & DirectAccessBit,
    StorageOrdersAgree = (int(Dst::IsRowMajor) == int(Src::IsRowMajor)),
    InnerSize = int(Dst::IsVectorAtCompileTime) ? int(Dst::SizeAtCompileTime)
                : int(Dst::Flags) & RowMajorBit ? int(Dst::ColsAtCompileTime)
                                                : int(Dst::RowsAtCompileTime),
    InnerMaxSize = int(Dst::IsVectorAtCompileTime) ? int(Dst::MaxSizeAtCompileTime)
                   : int(Dst::Flags) & RowMajorBit ? int(Dst::MaxColsAtCompileTime)
                                                   : int(Dst::MaxRowsAtCompileTime),
    MaxSizeAtCompileTime = Dst::SizeAtCompileTime,
    MightEnableVml = bool(StorageOrdersAgree) && bool(DstHasDirectAccess) && bool(SrcHasDirectAccess) &&
                     Src::InnerStrideAtCompileTime == 1 && Dst::InnerStrideAtCompileTime == 1,
    MightLinearize = bool(MightEnableVml) && (int(Dst::Flags) & int(Src::Flags) & LinearAccessBit),
    VmlSize = bool(MightLinearize) ? MaxSizeAtCompileTime : InnerMaxSize,
    LargeEnough = (VmlSize == Dynamic) || VmlSize >= EIGEN_MKL_VML_THRESHOLD
  };

 public:
  enum { EnableVml = MightEnableVml && LargeEnough, Traversal = MightLinearize ? LinearTraversal : DefaultTraversal };
};
#define EIGEN_PP_EXPAND(ARG) ARG
#if !defined(EIGEN_FAST_MATH) || (EIGEN_FAST_MATH != 1)
#define EIGEN_VMLMODE_EXPAND_xLA , VML_HA
#else
#define EIGEN_VMLMODE_EXPAND_xLA , VML_LA
#endif
#define EIGEN_VMLMODE_EXPAND_x_
#define EIGEN_VMLMODE_PREFIX_xLA vm
#define EIGEN_VMLMODE_PREFIX_x_ v
#define EIGEN_VMLMODE_PREFIX(VMLMODE) EIGEN_CAT(EIGEN_VMLMODE_PREFIX_x, VMLMODE)
#define EIGEN_MKL_VML_DECLARE_UNARY_CALL(EIGENOP, VMLOP, EIGENTYPE, VMLTYPE, VMLMODE) \
template <typename DstXprType, typename SrcXprNested> \
struct Assignment<DstXprType, CwiseUnaryOp<scalar_##EIGENOP##_op<EIGENTYPE>, SrcXprNested>, \
assign_op<EIGENTYPE, EIGENTYPE>, Dense2Dense, \
std::enable_if_t<vml_assign_traits<DstXprType, SrcXprNested>::EnableVml>> { \
typedef CwiseUnaryOp<scalar_##EIGENOP##_op<EIGENTYPE>, SrcXprNested> SrcXprType; \
static void run(DstXprType &dst, const SrcXprType &src, const assign_op<EIGENTYPE, EIGENTYPE> &func) { \
resize_if_allowed(dst, src, func); \
eigen_assert(dst.rows() == src.rows() && dst.cols() == src.cols()); \
if (vml_assign_traits<DstXprType, SrcXprNested>::Traversal == (int)LinearTraversal) { \
VMLOP(dst.size(), (const VMLTYPE *)src.nestedExpression().data(), \
(VMLTYPE *)dst.data() EIGEN_PP_EXPAND(EIGEN_VMLMODE_EXPAND_x##VMLMODE)); \
} else { \
const Index outerSize = dst.outerSize(); \
for (Index outer = 0; outer < outerSize; ++outer) { \
const EIGENTYPE *src_ptr = src.IsRowMajor ? &(src.nestedExpression().coeffRef(outer, 0)) \
: &(src.nestedExpression().coeffRef(0, outer)); \
EIGENTYPE *dst_ptr = dst.IsRowMajor ? &(dst.coeffRef(outer, 0)) : &(dst.coeffRef(0, outer)); \
VMLOP(dst.innerSize(), (const VMLTYPE *)src_ptr, \
(VMLTYPE *)dst_ptr EIGEN_PP_EXPAND(EIGEN_VMLMODE_EXPAND_x##VMLMODE)); \
} \
} \
} \
};
#define EIGEN_MKL_VML_DECLARE_UNARY_CALLS_REAL(EIGENOP, VMLOP, VMLMODE)                                                \
  EIGEN_MKL_VML_DECLARE_UNARY_CALL(EIGENOP, EIGEN_CAT(EIGEN_VMLMODE_PREFIX(VMLMODE), s##VMLOP), float, float, VMLMODE) \
  EIGEN_MKL_VML_DECLARE_UNARY_CALL(EIGENOP, EIGEN_CAT(EIGEN_VMLMODE_PREFIX(VMLMODE), d##VMLOP), double, double, VMLMODE)
#define EIGEN_MKL_VML_DECLARE_UNARY_CALLS_CPLX(EIGENOP, VMLOP, VMLMODE)                                   \
  EIGEN_MKL_VML_DECLARE_UNARY_CALL(EIGENOP, EIGEN_CAT(EIGEN_VMLMODE_PREFIX(VMLMODE), c##VMLOP), scomplex, \
                                   MKL_Complex8, VMLMODE)                                                 \
  EIGEN_MKL_VML_DECLARE_UNARY_CALL(EIGENOP, EIGEN_CAT(EIGEN_VMLMODE_PREFIX(VMLMODE), z##VMLOP), dcomplex, \
                                   MKL_Complex16, VMLMODE)
#define EIGEN_MKL_VML_DECLARE_UNARY_CALLS(EIGENOP, VMLOP, VMLMODE) \
  EIGEN_MKL_VML_DECLARE_UNARY_CALLS_REAL(EIGENOP, VMLOP, VMLMODE)  \
  EIGEN_MKL_VML_DECLARE_UNARY_CALLS_CPLX(EIGENOP, VMLOP, VMLMODE)
EIGEN_MKL_VML_DECLARE_UNARY_CALLS(sin, Sin, LA)
EIGEN_MKL_VML_DECLARE_UNARY_CALLS(asin, Asin, LA)
EIGEN_MKL_VML_DECLARE_UNARY_CALLS(sinh, Sinh, LA)
EIGEN_MKL_VML_DECLARE_UNARY_CALLS(cos, Cos, LA)
EIGEN_MKL_VML_DECLARE_UNARY_CALLS(acos, Acos, LA)
EIGEN_MKL_VML_DECLARE_UNARY_CALLS(cosh, Cosh, LA)
EIGEN_MKL_VML_DECLARE_UNARY_CALLS(tan, Tan, LA)
EIGEN_MKL_VML_DECLARE_UNARY_CALLS(atan, Atan, LA)
EIGEN_MKL_VML_DECLARE_UNARY_CALLS(tanh, Tanh, LA)
// EIGEN_MKL_VML_DECLARE_UNARY_CALLS(abs, Abs, _)
EIGEN_MKL_VML_DECLARE_UNARY_CALLS(exp, Exp, LA)
EIGEN_MKL_VML_DECLARE_UNARY_CALLS(log, Ln, LA)
EIGEN_MKL_VML_DECLARE_UNARY_CALLS(log10, Log10, LA)
EIGEN_MKL_VML_DECLARE_UNARY_CALLS(sqrt, Sqrt, _)
EIGEN_MKL_VML_DECLARE_UNARY_CALLS_REAL(square, Sqr, _)
EIGEN_MKL_VML_DECLARE_UNARY_CALLS_CPLX(arg, Arg, _)
EIGEN_MKL_VML_DECLARE_UNARY_CALLS_REAL(round, Round, _)
EIGEN_MKL_VML_DECLARE_UNARY_CALLS_REAL(floor, Floor, _)
EIGEN_MKL_VML_DECLARE_UNARY_CALLS_REAL(ceil, Ceil, _)
EIGEN_MKL_VML_DECLARE_UNARY_CALLS_REAL(cbrt, Cbrt, _)
#define EIGEN_MKL_VML_DECLARE_POW_CALL(EIGENOP, VMLOP, EIGENTYPE, VMLTYPE, VMLMODE) \
template <typename DstXprType, typename SrcXprNested, typename Plain> \
struct Assignment<DstXprType, \
CwiseBinaryOp<scalar_##EIGENOP##_op<EIGENTYPE, EIGENTYPE>, SrcXprNested, \
const CwiseNullaryOp<internal::scalar_constant_op<EIGENTYPE>, Plain>>, \
assign_op<EIGENTYPE, EIGENTYPE>, Dense2Dense, \
std::enable_if_t<vml_assign_traits<DstXprType, SrcXprNested>::EnableVml>> { \
typedef CwiseBinaryOp<scalar_##EIGENOP##_op<EIGENTYPE, EIGENTYPE>, SrcXprNested, \
const CwiseNullaryOp<internal::scalar_constant_op<EIGENTYPE>, Plain>> \
SrcXprType; \
static void run(DstXprType &dst, const SrcXprType &src, const assign_op<EIGENTYPE, EIGENTYPE> &func) { \
resize_if_allowed(dst, src, func); \
eigen_assert(dst.rows() == src.rows() && dst.cols() == src.cols()); \
VMLTYPE exponent = reinterpret_cast<const VMLTYPE &>(src.rhs().functor().m_other); \
if (vml_assign_traits<DstXprType, SrcXprNested>::Traversal == LinearTraversal) { \
VMLOP(dst.size(), (const VMLTYPE *)src.lhs().data(), exponent, \
(VMLTYPE *)dst.data() EIGEN_PP_EXPAND(EIGEN_VMLMODE_EXPAND_x##VMLMODE)); \
} else { \
const Index outerSize = dst.outerSize(); \
for (Index outer = 0; outer < outerSize; ++outer) { \
const EIGENTYPE *src_ptr = \
src.IsRowMajor ? &(src.lhs().coeffRef(outer, 0)) : &(src.lhs().coeffRef(0, outer)); \
EIGENTYPE *dst_ptr = dst.IsRowMajor ? &(dst.coeffRef(outer, 0)) : &(dst.coeffRef(0, outer)); \
VMLOP(dst.innerSize(), (const VMLTYPE *)src_ptr, exponent, \
(VMLTYPE *)dst_ptr EIGEN_PP_EXPAND(EIGEN_VMLMODE_EXPAND_x##VMLMODE)); \
} \
} \
} \
};
EIGEN_MKL_VML_DECLARE_POW_CALL(pow, vmsPowx, float, float, LA)
EIGEN_MKL_VML_DECLARE_POW_CALL(pow, vmdPowx, double, double, LA)
EIGEN_MKL_VML_DECLARE_POW_CALL(pow, vmcPowx, scomplex, MKL_Complex8, LA)
EIGEN_MKL_VML_DECLARE_POW_CALL(pow, vmzPowx, dcomplex, MKL_Complex16, LA)
}  // end namespace internal
}  // end namespace Eigen
#endif  // EIGEN_ASSIGN_VML_H


@@ -10,344 +10,329 @@
#ifndef EIGEN_BANDMATRIX_H
#define EIGEN_BANDMATRIX_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
namespace internal {
template <typename Derived>
class BandMatrixBase : public EigenBase<Derived> {
public:
enum {
Flags = internal::traits<Derived>::Flags,
CoeffReadCost = internal::traits<Derived>::CoeffReadCost,
RowsAtCompileTime = internal::traits<Derived>::RowsAtCompileTime,
ColsAtCompileTime = internal::traits<Derived>::ColsAtCompileTime,
MaxRowsAtCompileTime = internal::traits<Derived>::MaxRowsAtCompileTime,
MaxColsAtCompileTime = internal::traits<Derived>::MaxColsAtCompileTime,
Supers = internal::traits<Derived>::Supers,
Subs = internal::traits<Derived>::Subs,
Options = internal::traits<Derived>::Options
};
typedef typename internal::traits<Derived>::Scalar Scalar;
typedef Matrix<Scalar, RowsAtCompileTime, ColsAtCompileTime> DenseMatrixType;
typedef typename DenseMatrixType::StorageIndex StorageIndex;
typedef typename internal::traits<Derived>::CoefficientsType CoefficientsType;
typedef EigenBase<Derived> Base;
protected:
enum {
DataRowsAtCompileTime = ((Supers != Dynamic) && (Subs != Dynamic)) ? 1 + Supers + Subs : Dynamic,
SizeAtCompileTime = min_size_prefer_dynamic(RowsAtCompileTime, ColsAtCompileTime)
};
public:
using Base::cols;
using Base::derived;
using Base::rows;
/** \returns the number of super diagonals */
inline Index supers() const { return derived().supers(); }
/** \returns the number of sub diagonals */
inline Index subs() const { return derived().subs(); }
/** \returns an expression of the underlying coefficient matrix */
inline const CoefficientsType& coeffs() const { return derived().coeffs(); }
/** \returns an expression of the underlying coefficient matrix */
inline CoefficientsType& coeffs() { return derived().coeffs(); }
/** \returns a vector expression of the \a i -th column,
* only the meaningful part is returned.
* \warning the internal storage must be column major. */
inline Block<CoefficientsType, Dynamic, 1> col(Index i) {
EIGEN_STATIC_ASSERT((int(Options) & int(RowMajor)) == 0, THIS_METHOD_IS_ONLY_FOR_COLUMN_MAJOR_MATRICES);
Index start = 0;
Index len = coeffs().rows();
if (i <= supers()) {
start = supers() - i;
len = (std::min)(rows(), std::max<Index>(0, coeffs().rows() - (supers() - i)));
} else if (i >= rows() - subs())
len = std::max<Index>(0, coeffs().rows() - (i + 1 - rows() + subs()));
return Block<CoefficientsType, Dynamic, 1>(coeffs(), start, i, len, 1);
}
/** \returns a vector expression of the main diagonal */
inline Block<CoefficientsType, 1, SizeAtCompileTime> diagonal() {
return Block<CoefficientsType, 1, SizeAtCompileTime>(coeffs(), supers(), 0, 1, (std::min)(rows(), cols()));
}
/** \returns a vector expression of the main diagonal (const version) */
inline const Block<const CoefficientsType, 1, SizeAtCompileTime> diagonal() const {
return Block<const CoefficientsType, 1, SizeAtCompileTime>(coeffs(), supers(), 0, 1, (std::min)(rows(), cols()));
}
template <int Index>
struct DiagonalIntReturnType {
enum {
ReturnOpposite =
(int(Options) & int(SelfAdjoint)) && (((Index) > 0 && Supers == 0) || ((Index) < 0 && Subs == 0)),
Conjugate = ReturnOpposite && NumTraits<Scalar>::IsComplex,
ActualIndex = ReturnOpposite ? -Index : Index,
DiagonalSize =
(RowsAtCompileTime == Dynamic || ColsAtCompileTime == Dynamic)
? Dynamic
: (ActualIndex < 0 ? min_size_prefer_dynamic(ColsAtCompileTime, RowsAtCompileTime + ActualIndex)
: min_size_prefer_dynamic(RowsAtCompileTime, ColsAtCompileTime - ActualIndex))
};
typedef Block<CoefficientsType, 1, DiagonalSize> BuildType;
typedef std::conditional_t<Conjugate, CwiseUnaryOp<internal::scalar_conjugate_op<Scalar>, BuildType>, BuildType>
Type;
};
/** \returns a vector expression of the \a N -th sub or super diagonal */
template <int N>
inline typename DiagonalIntReturnType<N>::Type diagonal() {
return typename DiagonalIntReturnType<N>::BuildType(coeffs(), supers() - N, (std::max)(0, N), 1, diagonalLength(N));
}
/** \returns a vector expression of the \a N -th sub or super diagonal */
template <int N>
inline const typename DiagonalIntReturnType<N>::Type diagonal() const {
return typename DiagonalIntReturnType<N>::BuildType(coeffs(), supers() - N, (std::max)(0, N), 1, diagonalLength(N));
}
/** \returns the number of super diagonals */
inline Index supers() const { return derived().supers(); }
/** \returns a vector expression of the \a i -th sub or super diagonal */
inline Block<CoefficientsType, 1, Dynamic> diagonal(Index i) {
eigen_assert((i < 0 && -i <= subs()) || (i >= 0 && i <= supers()));
return Block<CoefficientsType, 1, Dynamic>(coeffs(), supers() - i, std::max<Index>(0, i), 1, diagonalLength(i));
}
/** \returns the number of sub diagonals */
inline Index subs() const { return derived().subs(); }
/** \returns an expression of the underlying coefficient matrix */
inline const CoefficientsType& coeffs() const { return derived().coeffs(); }
/** \returns an expression of the underlying coefficient matrix */
inline CoefficientsType& coeffs() { return derived().coeffs(); }
/** \returns a vector expression of the \a i -th sub or super diagonal */
inline const Block<const CoefficientsType, 1, Dynamic> diagonal(Index i) const {
eigen_assert((i < 0 && -i <= subs()) || (i >= 0 && i <= supers()));
return Block<const CoefficientsType, 1, Dynamic>(coeffs(), supers() - i, std::max<Index>(0, i), 1,
diagonalLength(i));
}
/** \returns a vector expression of the \a i -th column,
* only the meaningful part is returned.
* \warning the internal storage must be column major. */
inline Block<CoefficientsType,Dynamic,1> col(Index i)
{
EIGEN_STATIC_ASSERT((Options&RowMajor)==0,THIS_METHOD_IS_ONLY_FOR_COLUMN_MAJOR_MATRICES);
Index start = 0;
Index len = coeffs().rows();
if (i<=supers())
{
start = supers()-i;
len = (std::min)(rows(),std::max<Index>(0,coeffs().rows() - (supers()-i)));
}
else if (i>=rows()-subs())
len = std::max<Index>(0,coeffs().rows() - (i + 1 - rows() + subs()));
return Block<CoefficientsType,Dynamic,1>(coeffs(), start, i, len, 1);
}
template <typename Dest>
inline void evalTo(Dest& dst) const {
dst.resize(rows(), cols());
dst.setZero();
dst.diagonal() = diagonal();
for (Index i = 1; i <= supers(); ++i) dst.diagonal(i) = diagonal(i);
for (Index i = 1; i <= subs(); ++i) dst.diagonal(-i) = diagonal(-i);
}
/** \returns a vector expression of the main diagonal */
inline Block<CoefficientsType,1,SizeAtCompileTime> diagonal()
{ return Block<CoefficientsType,1,SizeAtCompileTime>(coeffs(),supers(),0,1,(std::min)(rows(),cols())); }
DenseMatrixType toDenseMatrix() const {
DenseMatrixType res(rows(), cols());
evalTo(res);
return res;
}
/** \returns a vector expression of the main diagonal (const version) */
inline const Block<const CoefficientsType,1,SizeAtCompileTime> diagonal() const
{ return Block<const CoefficientsType,1,SizeAtCompileTime>(coeffs(),supers(),0,1,(std::min)(rows(),cols())); }
template<int Index> struct DiagonalIntReturnType {
enum {
ReturnOpposite = (Options&SelfAdjoint) && (((Index)>0 && Supers==0) || ((Index)<0 && Subs==0)),
Conjugate = ReturnOpposite && NumTraits<Scalar>::IsComplex,
ActualIndex = ReturnOpposite ? -Index : Index,
DiagonalSize = (RowsAtCompileTime==Dynamic || ColsAtCompileTime==Dynamic)
? Dynamic
: (ActualIndex<0
? EIGEN_SIZE_MIN_PREFER_DYNAMIC(ColsAtCompileTime, RowsAtCompileTime + ActualIndex)
: EIGEN_SIZE_MIN_PREFER_DYNAMIC(RowsAtCompileTime, ColsAtCompileTime - ActualIndex))
};
typedef Block<CoefficientsType,1, DiagonalSize> BuildType;
typedef typename internal::conditional<Conjugate,
CwiseUnaryOp<internal::scalar_conjugate_op<Scalar>,BuildType >,
BuildType>::type Type;
};
/** \returns a vector expression of the \a N -th sub or super diagonal */
template<int N> inline typename DiagonalIntReturnType<N>::Type diagonal()
{
return typename DiagonalIntReturnType<N>::BuildType(coeffs(), supers()-N, (std::max)(0,N), 1, diagonalLength(N));
}
/** \returns a vector expression of the \a N -th sub or super diagonal */
template<int N> inline const typename DiagonalIntReturnType<N>::Type diagonal() const
{
return typename DiagonalIntReturnType<N>::BuildType(coeffs(), supers()-N, (std::max)(0,N), 1, diagonalLength(N));
}
/** \returns a vector expression of the \a i -th sub or super diagonal */
inline Block<CoefficientsType,1,Dynamic> diagonal(Index i)
{
eigen_assert((i<0 && -i<=subs()) || (i>=0 && i<=supers()));
return Block<CoefficientsType,1,Dynamic>(coeffs(), supers()-i, std::max<Index>(0,i), 1, diagonalLength(i));
}
/** \returns a vector expression of the \a i -th sub or super diagonal */
inline const Block<const CoefficientsType,1,Dynamic> diagonal(Index i) const
{
eigen_assert((i<0 && -i<=subs()) || (i>=0 && i<=supers()));
return Block<const CoefficientsType,1,Dynamic>(coeffs(), supers()-i, std::max<Index>(0,i), 1, diagonalLength(i));
}
template<typename Dest> inline void evalTo(Dest& dst) const
{
dst.resize(rows(),cols());
dst.setZero();
dst.diagonal() = diagonal();
for (Index i=1; i<=supers();++i)
dst.diagonal(i) = diagonal(i);
for (Index i=1; i<=subs();++i)
dst.diagonal(-i) = diagonal(-i);
}
DenseMatrixType toDenseMatrix() const
{
DenseMatrixType res(rows(),cols());
evalTo(res);
return res;
}
protected:
inline Index diagonalLength(Index i) const
{ return i<0 ? (std::min)(cols(),rows()+i) : (std::min)(rows(),cols()-i); }
protected:
inline Index diagonalLength(Index i) const {
return i < 0 ? (std::min)(cols(), rows() + i) : (std::min)(rows(), cols() - i);
}
};
/**
 * \class BandMatrix
 * \ingroup Core_Module
 *
 * \brief Represents a rectangular matrix with a banded storage
 *
 * \tparam Scalar_ Numeric type, i.e. float, double, int
 * \tparam Rows_ Number of rows, or \b Dynamic
 * \tparam Cols_ Number of columns, or \b Dynamic
 * \tparam Supers_ Number of super diagonals
 * \tparam Subs_ Number of sub diagonals
 * \tparam Options_ A combination of either \b #RowMajor or \b #ColMajor, and of \b #SelfAdjoint
 *                  The former controls \ref TopicStorageOrders "storage order", and defaults to
 *                  column-major. The latter controls whether the matrix represents a selfadjoint
 *                  matrix in which case either Supers_ or Subs_ has to be null.
 *
 * \sa class TridiagonalMatrix
 */
template <typename Scalar_, int Rows_, int Cols_, int Supers_, int Subs_, int Options_>
struct traits<BandMatrix<Scalar_, Rows_, Cols_, Supers_, Subs_, Options_> > {
  typedef Scalar_ Scalar;
  typedef Dense StorageKind;
  typedef Eigen::Index StorageIndex;
  enum {
    CoeffReadCost = NumTraits<Scalar>::ReadCost,
    RowsAtCompileTime = Rows_,
    ColsAtCompileTime = Cols_,
    MaxRowsAtCompileTime = Rows_,
    MaxColsAtCompileTime = Cols_,
    Flags = LvalueBit,
    Supers = Supers_,
    Subs = Subs_,
    Options = Options_,
    DataRowsAtCompileTime = ((Supers != Dynamic) && (Subs != Dynamic)) ? 1 + Supers + Subs : Dynamic
  };
  typedef Matrix<Scalar, DataRowsAtCompileTime, ColsAtCompileTime, int(Options) & int(RowMajor) ? RowMajor : ColMajor>
      CoefficientsType;
};
template <typename Scalar_, int Rows, int Cols, int Supers, int Subs, int Options>
class BandMatrix : public BandMatrixBase<BandMatrix<Scalar_, Rows, Cols, Supers, Subs, Options> > {
 public:
  typedef typename internal::traits<BandMatrix>::Scalar Scalar;
  typedef typename internal::traits<BandMatrix>::StorageIndex StorageIndex;
  typedef typename internal::traits<BandMatrix>::CoefficientsType CoefficientsType;

  explicit inline BandMatrix(Index rows = Rows, Index cols = Cols, Index supers = Supers, Index subs = Subs)
      : m_coeffs(1 + supers + subs, cols), m_rows(rows), m_supers(supers), m_subs(subs) {}

  /** \returns the number of rows */
  constexpr Index rows() const { return m_rows.value(); }

  /** \returns the number of columns */
  constexpr Index cols() const { return m_coeffs.cols(); }

  /** \returns the number of super diagonals */
  constexpr Index supers() const { return m_supers.value(); }

  /** \returns the number of sub diagonals */
  constexpr Index subs() const { return m_subs.value(); }

  inline const CoefficientsType& coeffs() const { return m_coeffs; }
  inline CoefficientsType& coeffs() { return m_coeffs; }

 protected:
  CoefficientsType m_coeffs;
  internal::variable_if_dynamic<Index, Rows> m_rows;
  internal::variable_if_dynamic<Index, Supers> m_supers;
  internal::variable_if_dynamic<Index, Subs> m_subs;
};
template <typename CoefficientsType_, int Rows_, int Cols_, int Supers_, int Subs_, int Options_>
class BandMatrixWrapper;
template <typename CoefficientsType_, int Rows_, int Cols_, int Supers_, int Subs_, int Options_>
struct traits<BandMatrixWrapper<CoefficientsType_, Rows_, Cols_, Supers_, Subs_, Options_> > {
  typedef typename CoefficientsType_::Scalar Scalar;
  typedef typename CoefficientsType_::StorageKind StorageKind;
  typedef typename CoefficientsType_::StorageIndex StorageIndex;
  enum {
    CoeffReadCost = internal::traits<CoefficientsType_>::CoeffReadCost,
    RowsAtCompileTime = Rows_,
    ColsAtCompileTime = Cols_,
    MaxRowsAtCompileTime = Rows_,
    MaxColsAtCompileTime = Cols_,
    Flags = LvalueBit,
    Supers = Supers_,
    Subs = Subs_,
    Options = Options_,
    DataRowsAtCompileTime = ((Supers != Dynamic) && (Subs != Dynamic)) ? 1 + Supers + Subs : Dynamic
  };
  typedef CoefficientsType_ CoefficientsType;
};
template <typename CoefficientsType_, int Rows_, int Cols_, int Supers_, int Subs_, int Options_>
class BandMatrixWrapper
    : public BandMatrixBase<BandMatrixWrapper<CoefficientsType_, Rows_, Cols_, Supers_, Subs_, Options_> > {
 public:
  typedef typename internal::traits<BandMatrixWrapper>::Scalar Scalar;
  typedef typename internal::traits<BandMatrixWrapper>::CoefficientsType CoefficientsType;
  typedef typename internal::traits<BandMatrixWrapper>::StorageIndex StorageIndex;

  explicit inline BandMatrixWrapper(const CoefficientsType& coeffs, Index rows = Rows_, Index cols = Cols_,
                                    Index supers = Supers_, Index subs = Subs_)
      : m_coeffs(coeffs), m_rows(rows), m_supers(supers), m_subs(subs) {
    EIGEN_UNUSED_VARIABLE(cols);
    // eigen_assert(coeffs.cols()==cols() && (supers()+subs()+1)==coeffs.rows());
  }

  /** \returns the number of rows */
  constexpr Index rows() const { return m_rows.value(); }

  /** \returns the number of columns */
  constexpr Index cols() const { return m_coeffs.cols(); }

  /** \returns the number of super diagonals */
  constexpr Index supers() const { return m_supers.value(); }

  /** \returns the number of sub diagonals */
  constexpr Index subs() const { return m_subs.value(); }

  inline const CoefficientsType& coeffs() const { return m_coeffs; }

 protected:
  const CoefficientsType& m_coeffs;
  internal::variable_if_dynamic<Index, Rows_> m_rows;
  internal::variable_if_dynamic<Index, Supers_> m_supers;
  internal::variable_if_dynamic<Index, Subs_> m_subs;
};
/**
 * \class TridiagonalMatrix
 * \ingroup Core_Module
 *
 * \brief Represents a tridiagonal matrix with a compact banded storage
 *
 * \tparam Scalar Numeric type, i.e. float, double, int
 * \tparam Size Number of rows and cols, or \b Dynamic
 * \tparam Options Can be 0 or \b SelfAdjoint
 *
 * \sa class BandMatrix
 */
template <typename Scalar, int Size, int Options>
class TridiagonalMatrix : public BandMatrix<Scalar, Size, Size, Options & SelfAdjoint ? 0 : 1, 1, Options | RowMajor> {
  typedef BandMatrix<Scalar, Size, Size, Options & SelfAdjoint ? 0 : 1, 1, Options | RowMajor> Base;
  typedef typename Base::StorageIndex StorageIndex;

 public:
  explicit TridiagonalMatrix(Index size = Size) : Base(size, size, Options & SelfAdjoint ? 0 : 1, 1) {}

  inline typename Base::template DiagonalIntReturnType<1>::Type super() { return Base::template diagonal<1>(); }
  inline const typename Base::template DiagonalIntReturnType<1>::Type super() const {
    return Base::template diagonal<1>();
  }
  inline typename Base::template DiagonalIntReturnType<-1>::Type sub() { return Base::template diagonal<-1>(); }
  inline const typename Base::template DiagonalIntReturnType<-1>::Type sub() const {
    return Base::template diagonal<-1>();
  }

 protected:
};
struct BandShape {};
template <typename Scalar_, int Rows_, int Cols_, int Supers_, int Subs_, int Options_>
struct evaluator_traits<BandMatrix<Scalar_, Rows_, Cols_, Supers_, Subs_, Options_> >
: public evaluator_traits_base<BandMatrix<Scalar_, Rows_, Cols_, Supers_, Subs_, Options_> > {
typedef BandShape Shape;
};
template <typename CoefficientsType_, int Rows_, int Cols_, int Supers_, int Subs_, int Options_>
struct evaluator_traits<BandMatrixWrapper<CoefficientsType_, Rows_, Cols_, Supers_, Subs_, Options_> >
: public evaluator_traits_base<BandMatrixWrapper<CoefficientsType_, Rows_, Cols_, Supers_, Subs_, Options_> > {
typedef BandShape Shape;
};
template <>
struct AssignmentKind<DenseShape, BandShape> {
typedef EigenBase2EigenBase Kind;
};
} // end namespace internal
} // end namespace Eigen
#endif // EIGEN_BANDMATRIX_H


@@ -11,442 +11,417 @@
#ifndef EIGEN_BLOCK_H
#define EIGEN_BLOCK_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
namespace internal {
template <typename XprType_, int BlockRows, int BlockCols, bool InnerPanel_>
struct traits<Block<XprType_, BlockRows, BlockCols, InnerPanel_>> : traits<XprType_> {
  typedef typename traits<XprType_>::Scalar Scalar;
  typedef typename traits<XprType_>::StorageKind StorageKind;
  typedef typename traits<XprType_>::XprKind XprKind;
  typedef typename ref_selector<XprType_>::type XprTypeNested;
  typedef std::remove_reference_t<XprTypeNested> XprTypeNested_;
  enum {
    MatrixRows = traits<XprType_>::RowsAtCompileTime,
    MatrixCols = traits<XprType_>::ColsAtCompileTime,
    RowsAtCompileTime = MatrixRows == 0 ? 0 : BlockRows,
    ColsAtCompileTime = MatrixCols == 0 ? 0 : BlockCols,
    MaxRowsAtCompileTime = BlockRows == 0                 ? 0
                           : RowsAtCompileTime != Dynamic ? int(RowsAtCompileTime)
                                                          : int(traits<XprType_>::MaxRowsAtCompileTime),
    MaxColsAtCompileTime = BlockCols == 0                 ? 0
                           : ColsAtCompileTime != Dynamic ? int(ColsAtCompileTime)
                                                          : int(traits<XprType_>::MaxColsAtCompileTime),
    XprTypeIsRowMajor = (int(traits<XprType_>::Flags) & RowMajorBit) != 0,
    IsRowMajor = (MaxRowsAtCompileTime == 1 && MaxColsAtCompileTime != 1)   ? 1
                 : (MaxColsAtCompileTime == 1 && MaxRowsAtCompileTime != 1) ? 0
                                                                            : XprTypeIsRowMajor,
    HasSameStorageOrderAsXprType = (IsRowMajor == XprTypeIsRowMajor),
    InnerSize = IsRowMajor ? int(ColsAtCompileTime) : int(RowsAtCompileTime),
    InnerStrideAtCompileTime = HasSameStorageOrderAsXprType ? int(inner_stride_at_compile_time<XprType_>::ret)
                                                            : int(outer_stride_at_compile_time<XprType_>::ret),
    OuterStrideAtCompileTime = HasSameStorageOrderAsXprType ? int(outer_stride_at_compile_time<XprType_>::ret)
                                                            : int(inner_stride_at_compile_time<XprType_>::ret),

    // FIXME, this traits is rather specialized for dense object and it needs to be cleaned further
    FlagsLvalueBit = is_lvalue<XprType_>::value ? LvalueBit : 0,
    FlagsRowMajorBit = IsRowMajor ? RowMajorBit : 0,
    Flags = (traits<XprType_>::Flags & (DirectAccessBit | (InnerPanel_ ? CompressedAccessBit : 0))) | FlagsLvalueBit |
            FlagsRowMajorBit,
    // FIXME DirectAccessBit should not be handled by expressions
    //
    // Alignment is needed by MapBase's assertions
    // We can safely set it to false here. Internal alignment errors will be detected by an eigen_internal_assert in
    // the respective evaluator
    Alignment = 0,
    InnerPanel = InnerPanel_ ? 1 : 0
  };
};
template <typename XprType, int BlockRows = Dynamic, int BlockCols = Dynamic, bool InnerPanel = false,
          bool HasDirectAccess = internal::has_direct_access<XprType>::ret>
class BlockImpl_dense;
} // end namespace internal
template <typename XprType, int BlockRows, int BlockCols, bool InnerPanel, typename StorageKind>
class BlockImpl;
/** \class Block
 * \ingroup Core_Module
 *
 * \brief Expression of a fixed-size or dynamic-size block
 *
 * \tparam XprType the type of the expression in which we are taking a block
 * \tparam BlockRows the number of rows of the block we are taking at compile time (optional)
 * \tparam BlockCols the number of columns of the block we are taking at compile time (optional)
 * \tparam InnerPanel is true, if the block maps to a set of rows of a row major matrix or
 *         to set of columns of a column major matrix (optional). The parameter allows to determine
 *         at compile time whether aligned access is possible on the block expression.
 *
 * This class represents an expression of either a fixed-size or dynamic-size block. It is the return
 * type of DenseBase::block(Index,Index,Index,Index) and DenseBase::block<int,int>(Index,Index) and
 * most of the time this is the only way it is used.
 *
 * However, if you want to directly manipulate block expressions,
 * for instance if you want to write a function returning such an expression, you
 * will need to use this class.
 *
 * Here is an example illustrating the dynamic case:
 * \include class_Block.cpp
 * Output: \verbinclude class_Block.out
 *
 * \note Even though this expression has dynamic size, in the case where \a XprType
 * has fixed size, this expression inherits a fixed maximal size which means that evaluating
 * it does not cause a dynamic memory allocation.
 *
 * Here is an example illustrating the fixed-size case:
 * \include class_FixedBlock.cpp
 * Output: \verbinclude class_FixedBlock.out
 *
 * \sa DenseBase::block(Index,Index,Index,Index), DenseBase::block(Index,Index), class VectorBlock
 */
template <typename XprType, int BlockRows, int BlockCols, bool InnerPanel>
class Block
: public BlockImpl<XprType, BlockRows, BlockCols, InnerPanel, typename internal::traits<XprType>::StorageKind> {
typedef BlockImpl<XprType, BlockRows, BlockCols, InnerPanel, typename internal::traits<XprType>::StorageKind> Impl;
using BlockHelper = internal::block_xpr_helper<Block>;
public:
// typedef typename Impl::Base Base;
typedef Impl Base;
EIGEN_GENERIC_PUBLIC_INTERFACE(Block)
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Block)
typedef internal::remove_all_t<XprType> NestedExpression;
/** Column or Row constructor
*/
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE Block(XprType& xpr, Index i) : Impl(xpr, i) {
eigen_assert((i >= 0) && (((BlockRows == 1) && (BlockCols == XprType::ColsAtCompileTime) && i < xpr.rows()) ||
((BlockRows == XprType::RowsAtCompileTime) && (BlockCols == 1) && i < xpr.cols())));
}
/** Fixed-size constructor
*/
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE Block(XprType& xpr, Index startRow, Index startCol)
: Impl(xpr, startRow, startCol) {
EIGEN_STATIC_ASSERT(RowsAtCompileTime != Dynamic && ColsAtCompileTime != Dynamic,
THIS_METHOD_IS_ONLY_FOR_FIXED_SIZE)
eigen_assert(startRow >= 0 && BlockRows >= 0 && startRow + BlockRows <= xpr.rows() && startCol >= 0 &&
BlockCols >= 0 && startCol + BlockCols <= xpr.cols());
}
/** Dynamic-size constructor
*/
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE Block(XprType& xpr, Index startRow, Index startCol, Index blockRows,
Index blockCols)
: Impl(xpr, startRow, startCol, blockRows, blockCols) {
eigen_assert((RowsAtCompileTime == Dynamic || RowsAtCompileTime == blockRows) &&
(ColsAtCompileTime == Dynamic || ColsAtCompileTime == blockCols));
eigen_assert(startRow >= 0 && blockRows >= 0 && startRow <= xpr.rows() - blockRows && startCol >= 0 &&
blockCols >= 0 && startCol <= xpr.cols() - blockCols);
}
// convert nested blocks (e.g. Block<Block<MatrixType>>) to a simple block expression (Block<MatrixType>)
using ConstUnwindReturnType = Block<const typename BlockHelper::BaseType, BlockRows, BlockCols, InnerPanel>;
using UnwindReturnType = Block<typename BlockHelper::BaseType, BlockRows, BlockCols, InnerPanel>;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ConstUnwindReturnType unwind() const {
return ConstUnwindReturnType(BlockHelper::base(*this), BlockHelper::row(*this, 0), BlockHelper::col(*this, 0),
this->rows(), this->cols());
}
template <typename T = Block, typename EnableIf = std::enable_if_t<!std::is_const<T>::value>>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE UnwindReturnType unwind() {
return UnwindReturnType(BlockHelper::base(*this), BlockHelper::row(*this, 0), BlockHelper::col(*this, 0),
this->rows(), this->cols());
}
};
// The generic default implementation for dense blocks simply forwards to the internal::BlockImpl_dense
// that must be specialized for direct and non-direct access...
template <typename XprType, int BlockRows, int BlockCols, bool InnerPanel>
class BlockImpl<XprType, BlockRows, BlockCols, InnerPanel, Dense>
: public internal::BlockImpl_dense<XprType, BlockRows, BlockCols, InnerPanel> {
typedef internal::BlockImpl_dense<XprType, BlockRows, BlockCols, InnerPanel> Impl;
typedef typename XprType::StorageIndex StorageIndex;
public:
typedef Impl Base;
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(BlockImpl)
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE BlockImpl(XprType& xpr, Index i) : Impl(xpr, i) {}
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE BlockImpl(XprType& xpr, Index startRow, Index startCol)
: Impl(xpr, startRow, startCol) {}
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE BlockImpl(XprType& xpr, Index startRow, Index startCol,
Index blockRows, Index blockCols)
: Impl(xpr, startRow, startCol, blockRows, blockCols) {}
};
namespace internal {
/** \internal Internal implementation of dense Blocks in the general case. */
template <typename XprType, int BlockRows, int BlockCols, bool InnerPanel, bool HasDirectAccess>
class BlockImpl_dense : public internal::dense_xpr_base<Block<XprType, BlockRows, BlockCols, InnerPanel>>::type {
typedef Block<XprType, BlockRows, BlockCols, InnerPanel> BlockType;
typedef typename internal::ref_selector<XprType>::non_const_type XprTypeNested;
public:
typedef typename internal::dense_xpr_base<BlockType>::type Base;
EIGEN_DENSE_PUBLIC_INTERFACE(BlockType)
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(BlockImpl_dense)
// class InnerIterator; // FIXME apparently never used
/** Column or Row constructor
*/
EIGEN_DEVICE_FUNC constexpr BlockImpl_dense(XprType& xpr, Index i)
: m_xpr(xpr),
// It is a row if and only if BlockRows==1 and BlockCols==XprType::ColsAtCompileTime,
// and it is a column if and only if BlockRows==XprType::RowsAtCompileTime and BlockCols==1,
// all other cases are invalid.
// The case of a 1x1 matrix seems ambiguous, but the result is the same anyway.
m_startRow((BlockRows == 1) && (BlockCols == XprType::ColsAtCompileTime) ? i : 0),
m_startCol((BlockRows == XprType::RowsAtCompileTime) && (BlockCols == 1) ? i : 0),
m_blockRows(BlockRows == 1 ? 1 : xpr.rows()),
m_blockCols(BlockCols == 1 ? 1 : xpr.cols()) {}
/** Fixed-size constructor
*/
EIGEN_DEVICE_FUNC constexpr BlockImpl_dense(XprType& xpr, Index startRow, Index startCol)
: m_xpr(xpr), m_startRow(startRow), m_startCol(startCol), m_blockRows(BlockRows), m_blockCols(BlockCols) {}
/** Dynamic-size constructor
*/
EIGEN_DEVICE_FUNC constexpr BlockImpl_dense(XprType& xpr, Index startRow, Index startCol, Index blockRows,
Index blockCols)
: m_xpr(xpr), m_startRow(startRow), m_startCol(startCol), m_blockRows(blockRows), m_blockCols(blockCols) {}
EIGEN_DEVICE_FUNC constexpr Index rows() const { return m_blockRows.value(); }
EIGEN_DEVICE_FUNC constexpr Index cols() const { return m_blockCols.value(); }
EIGEN_DEVICE_FUNC inline Scalar& coeffRef(Index rowId, Index colId) {
EIGEN_STATIC_ASSERT_LVALUE(XprType)
return m_xpr.coeffRef(rowId + m_startRow.value(), colId + m_startCol.value());
}
EIGEN_DEVICE_FUNC inline const Scalar& coeffRef(Index rowId, Index colId) const {
return m_xpr.derived().coeffRef(rowId + m_startRow.value(), colId + m_startCol.value());
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const CoeffReturnType coeff(Index rowId, Index colId) const {
return m_xpr.coeff(rowId + m_startRow.value(), colId + m_startCol.value());
}
EIGEN_DEVICE_FUNC inline Scalar& coeffRef(Index index) {
EIGEN_STATIC_ASSERT_LVALUE(XprType)
return m_xpr.coeffRef(m_startRow.value() + (RowsAtCompileTime == 1 ? 0 : index),
m_startCol.value() + (RowsAtCompileTime == 1 ? index : 0));
}
EIGEN_DEVICE_FUNC inline const Scalar& coeffRef(Index index) const {
return m_xpr.coeffRef(m_startRow.value() + (RowsAtCompileTime == 1 ? 0 : index),
m_startCol.value() + (RowsAtCompileTime == 1 ? index : 0));
}
EIGEN_DEVICE_FUNC inline const CoeffReturnType coeff(Index index) const {
return m_xpr.coeff(m_startRow.value() + (RowsAtCompileTime == 1 ? 0 : index),
m_startCol.value() + (RowsAtCompileTime == 1 ? index : 0));
}
template <int LoadMode>
EIGEN_DEVICE_FUNC inline PacketScalar packet(Index rowId, Index colId) const {
return m_xpr.template packet<Unaligned>(rowId + m_startRow.value(), colId + m_startCol.value());
}
template <int LoadMode>
EIGEN_DEVICE_FUNC inline void writePacket(Index rowId, Index colId, const PacketScalar& val) {
m_xpr.template writePacket<Unaligned>(rowId + m_startRow.value(), colId + m_startCol.value(), val);
}
template <int LoadMode>
EIGEN_DEVICE_FUNC inline PacketScalar packet(Index index) const {
return m_xpr.template packet<Unaligned>(m_startRow.value() + (RowsAtCompileTime == 1 ? 0 : index),
m_startCol.value() + (RowsAtCompileTime == 1 ? index : 0));
}
template <int LoadMode>
EIGEN_DEVICE_FUNC inline void writePacket(Index index, const PacketScalar& val) {
m_xpr.template writePacket<Unaligned>(m_startRow.value() + (RowsAtCompileTime == 1 ? 0 : index),
m_startCol.value() + (RowsAtCompileTime == 1 ? index : 0), val);
}
#ifdef EIGEN_PARSED_BY_DOXYGEN
/** \sa MapBase::data() */
EIGEN_DEVICE_FUNC constexpr const Scalar* data() const;
EIGEN_DEVICE_FUNC inline Index innerStride() const;
EIGEN_DEVICE_FUNC inline Index outerStride() const;
#endif
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const internal::remove_all_t<XprTypeNested>& nestedExpression() const {
return m_xpr;
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE XprType& nestedExpression() { return m_xpr; }
EIGEN_DEVICE_FUNC constexpr StorageIndex startRow() const noexcept { return m_startRow.value(); }
EIGEN_DEVICE_FUNC constexpr StorageIndex startCol() const noexcept { return m_startCol.value(); }
protected:
XprTypeNested m_xpr;
const internal::variable_if_dynamic<StorageIndex, (XprType::RowsAtCompileTime == 1 && BlockRows == 1) ? 0 : Dynamic>
m_startRow;
const internal::variable_if_dynamic<StorageIndex, (XprType::ColsAtCompileTime == 1 && BlockCols == 1) ? 0 : Dynamic>
m_startCol;
const internal::variable_if_dynamic<StorageIndex, RowsAtCompileTime> m_blockRows;
const internal::variable_if_dynamic<StorageIndex, ColsAtCompileTime> m_blockCols;
};
/** \internal Internal implementation of dense Blocks in the direct access case.*/
template <typename XprType, int BlockRows, int BlockCols, bool InnerPanel>
class BlockImpl_dense<XprType, BlockRows, BlockCols, InnerPanel, true>
: public MapBase<Block<XprType, BlockRows, BlockCols, InnerPanel>> {
typedef Block<XprType, BlockRows, BlockCols, InnerPanel> BlockType;
typedef typename internal::ref_selector<XprType>::non_const_type XprTypeNested;
enum { XprTypeIsRowMajor = (int(traits<XprType>::Flags) & RowMajorBit) != 0 };
/** \internal Returns base+offset (unless base is null, in which case returns null).
* Adding an offset to nullptr is undefined behavior, so we must avoid it.
*/
template <typename Scalar>
EIGEN_DEVICE_FUNC constexpr EIGEN_ALWAYS_INLINE static Scalar* add_to_nullable_pointer(Scalar* base, Index offset) {
return base != nullptr ? base + offset : nullptr;
}
public:
typedef MapBase<BlockType> Base;
EIGEN_DENSE_PUBLIC_INTERFACE(BlockType)
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(BlockImpl_dense)
/** Column or Row constructor
*/
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE BlockImpl_dense(XprType& xpr, Index i)
: Base((BlockRows == 0 || BlockCols == 0)
? nullptr
: add_to_nullable_pointer(
xpr.data(),
i * (((BlockRows == 1) && (BlockCols == XprType::ColsAtCompileTime) && (!XprTypeIsRowMajor)) ||
((BlockRows == XprType::RowsAtCompileTime) && (BlockCols == 1) &&
(XprTypeIsRowMajor))
? xpr.innerStride()
: xpr.outerStride())),
BlockRows == 1 ? 1 : xpr.rows(), BlockCols == 1 ? 1 : xpr.cols()),
m_xpr(xpr),
m_startRow((BlockRows == 1) && (BlockCols == XprType::ColsAtCompileTime) ? i : 0),
m_startCol((BlockRows == XprType::RowsAtCompileTime) && (BlockCols == 1) ? i : 0) {
init();
}
/** Fixed-size constructor
*/
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE BlockImpl_dense(XprType& xpr, Index startRow, Index startCol)
: Base((BlockRows == 0 || BlockCols == 0)
? nullptr
: add_to_nullable_pointer(xpr.data(),
xpr.innerStride() * (XprTypeIsRowMajor ? startCol : startRow) +
xpr.outerStride() * (XprTypeIsRowMajor ? startRow : startCol))),
m_xpr(xpr),
m_startRow(startRow),
m_startCol(startCol) {
init();
}
/** Dynamic-size constructor
*/
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE BlockImpl_dense(XprType& xpr, Index startRow, Index startCol, Index blockRows,
Index blockCols)
: Base((blockRows == 0 || blockCols == 0)
? nullptr
: add_to_nullable_pointer(xpr.data(),
xpr.innerStride() * (XprTypeIsRowMajor ? startCol : startRow) +
xpr.outerStride() * (XprTypeIsRowMajor ? startRow : startCol)),
blockRows, blockCols),
m_xpr(xpr),
m_startRow(startRow),
m_startCol(startCol) {
init();
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const internal::remove_all_t<XprTypeNested>& nestedExpression() const noexcept {
return m_xpr;
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE XprType& nestedExpression() { return m_xpr; }
/** \sa MapBase::innerStride() */
EIGEN_DEVICE_FUNC constexpr Index innerStride() const noexcept {
return internal::traits<BlockType>::HasSameStorageOrderAsXprType ? m_xpr.innerStride() : m_xpr.outerStride();
}
/** \sa MapBase::outerStride() */
EIGEN_DEVICE_FUNC constexpr Index outerStride() const noexcept {
return internal::traits<BlockType>::HasSameStorageOrderAsXprType ? m_xpr.outerStride() : m_xpr.innerStride();
}
EIGEN_DEVICE_FUNC constexpr StorageIndex startRow() const noexcept { return m_startRow.value(); }
EIGEN_DEVICE_FUNC constexpr StorageIndex startCol() const noexcept { return m_startCol.value(); }
#ifndef __SUNPRO_CC
// FIXME sunstudio is not friendly with the above friend...
// META-FIXME there is no 'friend' keyword around here. Is this obsolete?
protected:
#endif
#ifndef EIGEN_PARSED_BY_DOXYGEN
/** \internal used by allowAligned() */
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE BlockImpl_dense(XprType& xpr, const Scalar* data, Index blockRows,
Index blockCols)
: Base(data, blockRows, blockCols), m_xpr(xpr) {
init();
}
#endif
protected:
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void init() {
m_outerStride =
internal::traits<BlockType>::HasSameStorageOrderAsXprType ? m_xpr.outerStride() : m_xpr.innerStride();
}
XprTypeNested m_xpr;
const internal::variable_if_dynamic<StorageIndex, (XprType::RowsAtCompileTime == 1 && BlockRows == 1) ? 0 : Dynamic>
m_startRow;
const internal::variable_if_dynamic<StorageIndex, (XprType::ColsAtCompileTime == 1 && BlockCols == 1) ? 0 : Dynamic>
m_startCol;
Index m_outerStride;
};
} // end namespace internal
} // end namespace Eigen
#endif // EIGEN_BLOCK_H
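The direct-access constructors above guard against forming `nullptr + offset`, which is undefined behavior even when the result is never dereferenced. A minimal standalone sketch of that guard (a hypothetical free function mirroring the `add_to_nullable_pointer` helper, not part of Eigen's public API):

```cpp
#include <cstddef>

// Return base + offset, unless base is null, in which case return nullptr.
// Adding a nonzero offset to a null pointer is undefined behavior, so the
// null case must be short-circuited before the pointer arithmetic happens.
template <typename Scalar>
constexpr Scalar* add_to_nullable_pointer(Scalar* base, std::ptrdiff_t offset) {
  return base != nullptr ? base + offset : nullptr;
}
```

This is why the constructors pass `nullptr` directly to `MapBase` whenever the block is empty, instead of computing a start address from `xpr.data()`.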


@@ -1,164 +0,0 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_ALLANDANY_H
#define EIGEN_ALLANDANY_H
namespace Eigen {
namespace internal {
template<typename Derived, int UnrollCount>
struct all_unroller
{
typedef typename Derived::ExpressionTraits Traits;
enum {
col = (UnrollCount-1) / Traits::RowsAtCompileTime,
row = (UnrollCount-1) % Traits::RowsAtCompileTime
};
static inline bool run(const Derived &mat)
{
return all_unroller<Derived, UnrollCount-1>::run(mat) && mat.coeff(row, col);
}
};
template<typename Derived>
struct all_unroller<Derived, 0>
{
static inline bool run(const Derived &/*mat*/) { return true; }
};
template<typename Derived>
struct all_unroller<Derived, Dynamic>
{
static inline bool run(const Derived &) { return false; }
};
template<typename Derived, int UnrollCount>
struct any_unroller
{
typedef typename Derived::ExpressionTraits Traits;
enum {
col = (UnrollCount-1) / Traits::RowsAtCompileTime,
row = (UnrollCount-1) % Traits::RowsAtCompileTime
};
static inline bool run(const Derived &mat)
{
return any_unroller<Derived, UnrollCount-1>::run(mat) || mat.coeff(row, col);
}
};
template<typename Derived>
struct any_unroller<Derived, 0>
{
static inline bool run(const Derived & /*mat*/) { return false; }
};
template<typename Derived>
struct any_unroller<Derived, Dynamic>
{
static inline bool run(const Derived &) { return false; }
};
} // end namespace internal
/** \returns true if all coefficients are true
*
* Example: \include MatrixBase_all.cpp
* Output: \verbinclude MatrixBase_all.out
*
* \sa any(), Cwise::operator<()
*/
template<typename Derived>
inline bool DenseBase<Derived>::all() const
{
typedef internal::evaluator<Derived> Evaluator;
enum {
unroll = SizeAtCompileTime != Dynamic
&& SizeAtCompileTime * (Evaluator::CoeffReadCost + NumTraits<Scalar>::AddCost) <= EIGEN_UNROLLING_LIMIT
};
Evaluator evaluator(derived());
if(unroll)
return internal::all_unroller<Evaluator, unroll ? int(SizeAtCompileTime) : Dynamic>::run(evaluator);
else
{
for(Index j = 0; j < cols(); ++j)
for(Index i = 0; i < rows(); ++i)
if (!evaluator.coeff(i, j)) return false;
return true;
}
}
/** \returns true if at least one coefficient is true
*
* \sa all()
*/
template<typename Derived>
inline bool DenseBase<Derived>::any() const
{
typedef internal::evaluator<Derived> Evaluator;
enum {
unroll = SizeAtCompileTime != Dynamic
&& SizeAtCompileTime * (Evaluator::CoeffReadCost + NumTraits<Scalar>::AddCost) <= EIGEN_UNROLLING_LIMIT
};
Evaluator evaluator(derived());
if(unroll)
return internal::any_unroller<Evaluator, unroll ? int(SizeAtCompileTime) : Dynamic>::run(evaluator);
else
{
for(Index j = 0; j < cols(); ++j)
for(Index i = 0; i < rows(); ++i)
if (evaluator.coeff(i, j)) return true;
return false;
}
}
/** \returns the number of coefficients which evaluate to true
*
* \sa all(), any()
*/
template<typename Derived>
inline Eigen::Index DenseBase<Derived>::count() const
{
return derived().template cast<bool>().template cast<Index>().sum();
}
/** \returns true if \c *this contains at least one Not A Number (NaN).
*
* \sa allFinite()
*/
template<typename Derived>
inline bool DenseBase<Derived>::hasNaN() const
{
#if EIGEN_COMP_MSVC || (defined __FAST_MATH__)
return derived().array().isNaN().any();
#else
return !((derived().array()==derived().array()).all());
#endif
}
/** \returns true if \c *this contains only finite numbers, i.e., no NaN and no +/-INF values.
*
* \sa hasNaN()
*/
template<typename Derived>
inline bool DenseBase<Derived>::allFinite() const
{
#if EIGEN_COMP_MSVC || (defined __FAST_MATH__)
return derived().array().isFinite().all();
#else
return !((derived()-derived()).hasNaN());
#endif
}
} // end namespace Eigen
#endif // EIGEN_ALLANDANY_H
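The removed `all_unroller` above evaluates a fixed-size reduction at compile time by recursing on the coefficient index. A minimal standalone analogue of the same unrolling pattern over a fixed-size boolean array (illustrative only; not Eigen code, and without the evaluator/cost machinery):

```cpp
#include <array>
#include <cstddef>

// Recursively AND together the first UnrollCount entries of a fixed-size
// array, visiting index UnrollCount-1 last, exactly like all_unroller walks
// coefficients in a fixed-size matrix.
template <std::size_t N, std::size_t UnrollCount>
struct all_unroller {
  static constexpr bool run(const std::array<bool, N>& v) {
    return all_unroller<N, UnrollCount - 1>::run(v) && v[UnrollCount - 1];
  }
};

// Base case: the empty conjunction is true.
template <std::size_t N>
struct all_unroller<N, 0> {
  static constexpr bool run(const std::array<bool, N>&) { return true; }
};

template <std::size_t N>
constexpr bool all(const std::array<bool, N>& v) {
  return all_unroller<N, N>::run(v);
}
```

The recursion bottoms out at `UnrollCount == 0`, so the compiler can flatten the whole chain of `&&` into straight-line code for small fixed sizes.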


@@ -11,45 +11,45 @@
#ifndef EIGEN_COMMAINITIALIZER_H
#define EIGEN_COMMAINITIALIZER_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
/** \class CommaInitializer
* \ingroup Core_Module
*
* \brief Helper class used by the comma initializer operator
*
* This class is internally used to implement the comma initializer feature. It is
* the return type of MatrixBase::operator<<, and most of the time this is the only
* way it is used.
*
* \sa \blank \ref MatrixBaseCommaInitRef "MatrixBase::operator<<", CommaInitializer::finished()
*/
template <typename XprType>
struct CommaInitializer {
typedef typename XprType::Scalar Scalar;
EIGEN_DEVICE_FUNC constexpr CommaInitializer(XprType& xpr, const Scalar& s)
: m_xpr(xpr), m_row(0), m_col(1), m_currentBlockRows(1) {
eigen_assert(m_xpr.rows() > 0 && m_xpr.cols() > 0 && "Cannot comma-initialize a 0x0 matrix (operator<<)");
m_xpr.coeffRef(0, 0) = s;
}
template <typename OtherDerived>
EIGEN_DEVICE_FUNC inline CommaInitializer(XprType& xpr, const DenseBase<OtherDerived>& other)
: m_xpr(xpr), m_row(0), m_col(other.cols()), m_currentBlockRows(other.rows()) {
eigen_assert(m_xpr.rows() >= other.rows() && m_xpr.cols() >= other.cols() &&
"Cannot comma-initialize a 0x0 matrix (operator<<)");
m_xpr.template block<OtherDerived::RowsAtCompileTime, OtherDerived::ColsAtCompileTime>(0, 0, other.rows(),
other.cols()) = other;
}
/* Copy/Move constructor which transfers ownership. This is crucial in
* absence of return value optimization to avoid assertions during destruction. */
EIGEN_DEVICE_FUNC inline CommaInitializer(const CommaInitializer& o)
: m_xpr(o.m_xpr), m_row(o.m_row), m_col(o.m_col), m_currentBlockRows(o.m_currentBlockRows) {
// Mark original object as finished. In absence of R-value references we need to const_cast:
const_cast<CommaInitializer&>(o).m_row = m_xpr.rows();
const_cast<CommaInitializer&>(o).m_col = m_xpr.cols();
@@ -57,104 +57,92 @@ struct CommaInitializer
}
/* inserts a scalar value in the target matrix */
EIGEN_DEVICE_FUNC CommaInitializer &operator,(const Scalar& s) {
if (m_col == m_xpr.cols()) {
m_row += m_currentBlockRows;
m_col = 0;
m_currentBlockRows = 1;
eigen_assert(m_row < m_xpr.rows() && "Too many rows passed to comma initializer (operator<<)");
}
eigen_assert(m_col < m_xpr.cols() && "Too many coefficients passed to comma initializer (operator<<)");
eigen_assert(m_currentBlockRows == 1);
m_xpr.coeffRef(m_row, m_col++) = s;
return *this;
}
/* inserts a matrix expression in the target matrix */
template <typename OtherDerived>
EIGEN_DEVICE_FUNC CommaInitializer &operator,(const DenseBase<OtherDerived>& other) {
if (m_col == m_xpr.cols() && (other.cols() != 0 || other.rows() != m_currentBlockRows)) {
m_row += m_currentBlockRows;
m_col = 0;
m_currentBlockRows = other.rows();
eigen_assert(m_row + m_currentBlockRows <= m_xpr.rows() &&
"Too many rows passed to comma initializer (operator<<)");
}
eigen_assert((m_col + other.cols() <= m_xpr.cols()) &&
"Too many coefficients passed to comma initializer (operator<<)");
eigen_assert(m_currentBlockRows == other.rows());
m_xpr.template block<OtherDerived::RowsAtCompileTime, OtherDerived::ColsAtCompileTime>(m_row, m_col, other.rows(),
other.cols()) = other;
m_col += other.cols();
return *this;
}
EIGEN_DEVICE_FUNC inline ~CommaInitializer()
#if defined VERIFY_RAISES_ASSERT && (!defined EIGEN_NO_ASSERTION_CHECKING) && defined EIGEN_EXCEPTIONS
noexcept(false) // Eigen::eigen_assert_exception
#endif
{
finished();
}
/** \returns the built matrix once all its coefficients have been set.
* Calling finished is 100% optional. Its purpose is to write expressions
* like this:
* \code
* quaternion.fromRotationMatrix((Matrix3f() << axis0, axis1, axis2).finished());
* \endcode
*/
EIGEN_DEVICE_FUNC inline XprType& finished() {
eigen_assert(((m_row + m_currentBlockRows) == m_xpr.rows() || m_xpr.cols() == 0) && m_col == m_xpr.cols() &&
"Too few coefficients passed to comma initializer (operator<<)");
return m_xpr;
}
XprType& m_xpr; // target expression
Index m_row; // current row id
Index m_col; // current col id
Index m_currentBlockRows; // current block height
};
/** \anchor MatrixBaseCommaInitRef
* Convenient operator to set the coefficients of a matrix.
*
* The coefficients must be provided in a row major order and exactly match
* the size of the matrix. Otherwise an assertion is raised.
*
* Example: \include MatrixBase_set.cpp
* Output: \verbinclude MatrixBase_set.out
*
* \note According to the C++ standard, the argument expressions of this comma initializer are evaluated in arbitrary
* order.
*
* \sa CommaInitializer::finished(), class CommaInitializer
*/
template <typename Derived>
EIGEN_DEVICE_FUNC inline CommaInitializer<Derived> DenseBase<Derived>::operator<<(const Scalar& s) {
return CommaInitializer<Derived>(*static_cast<Derived*>(this), s);
}
/** \sa operator<<(const Scalar&) */
template <typename Derived>
template <typename OtherDerived>
EIGEN_DEVICE_FUNC inline CommaInitializer<Derived> DenseBase<Derived>::operator<<(
const DenseBase<OtherDerived>& other) {
return CommaInitializer<Derived>(*static_cast<Derived*>(this), other);
}
} // end namespace Eigen
#endif // EIGEN_COMMAINITIALIZER_H
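The comma-initializer machinery above follows a simple pattern: `operator<<` starts the fill and returns a proxy, each overloaded `operator,` appends one value with bounds assertions, and `finished()` checks that exactly the right number of coefficients arrived. A minimal standalone sketch of the pattern, using hypothetical demo types (`Vec`, `VecCommaInitializer`) that are not part of Eigen:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-length vector to be filled via the comma pattern.
struct Vec {
  std::vector<double> data;
  explicit Vec(std::size_t n) : data(n, 0.0) {}
};

// Proxy returned by operator<<; overloaded operator, appends one value.
struct VecCommaInitializer {
  Vec& target;
  std::size_t pos;
  VecCommaInitializer& operator,(double s) {
    assert(pos < target.data.size() && "Too many coefficients passed (operator<<)");
    target.data[pos++] = s;
    return *this;
  }
  Vec& finished() {
    assert(pos == target.data.size() && "Too few coefficients passed (operator<<)");
    return target;
  }
};

inline VecCommaInitializer operator<<(Vec& v, double s) {
  VecCommaInitializer init{v, 0};  // start the fill at position 0
  init, s;                         // insert the first coefficient
  return init;
}
```

Because `<<` binds tighter than `,`, an expression like `(v << 1.0, 2.0, 3.0)` parses as `((v << 1.0), 2.0), 3.0`, which is exactly why Eigen only needs `operator<<` for the first coefficient and `operator,` for the rest.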


@@ -1,7 +1,7 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2016 Rasmus Munk Larsen (rmlarsen@google.com)
// Copyright (C) 2016 Rasmus Munk Larsen (rmlarsen@gmail.com)
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
@@ -10,6 +10,9 @@
#ifndef EIGEN_CONDITIONESTIMATOR_H
#define EIGEN_CONDITIONESTIMATOR_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
namespace internal {
@@ -19,7 +22,7 @@ struct rcond_compute_sign {
static inline Vector run(const Vector& v) {
const RealVector v_abs = v.cwiseAbs();
return (v_abs.array() == static_cast<typename Vector::RealScalar>(0))
.select(Vector::Ones(v.size()), v.cwiseQuotient(v_abs));
.select(Vector::Ones(v.size()), v.cwiseQuotient(v_abs));
}
};
@@ -28,33 +31,31 @@ template <typename Vector>
struct rcond_compute_sign<Vector, Vector, false> {
static inline Vector run(const Vector& v) {
return (v.array() < static_cast<typename Vector::RealScalar>(0))
.select(-Vector::Ones(v.size()), Vector::Ones(v.size()));
.select(-Vector::Ones(v.size()), Vector::Ones(v.size()));
}
};
/**
* \returns an estimate of ||inv(matrix)||_1 given a decomposition of
* \a matrix that implements .solve() and .adjoint().solve() methods.
*
* This function implements Algorithms 4.1 and 5.1 from
* http://www.maths.manchester.ac.uk/~higham/narep/narep135.pdf
* which also forms the basis for the condition number estimators in
* LAPACK. Since at most 10 calls to the solve method of dec are
* performed, the total cost is O(dims^2), as opposed to O(dims^3)
* needed to compute the inverse matrix explicitly.
*
* The most common usage is in estimating the condition number
* ||matrix||_1 * ||inv(matrix)||_1. The first term ||matrix||_1 can be
* computed directly in O(n^2) operations.
*
* Supports the following decompositions: FullPivLU, PartialPivLU, LDLT, and
* LLT.
*
* \sa FullPivLU, PartialPivLU, LDLT, LLT.
*/
* \returns an estimate of ||inv(matrix)||_1 given a decomposition of
* \a matrix that implements .solve() and .adjoint().solve() methods.
*
* This function implements Algorithms 4.1 and 5.1 from
* Higham, "Experience with a Matrix Norm Estimator",
* SIAM J. Sci. Stat. Comput., 11(4):804-809, 1990.
* with Higham's alternating-sign safety-net estimate from
* Higham and Tisseur, "A Block Algorithm for Matrix 1-Norm Estimation,
* with an Application to 1-Norm Pseudospectra", SIAM J. Matrix Anal. Appl.,
* 21(4):1185-1201, 2000.
*
* The Hager/Higham gradient ascent uses at most 5 iterations of 2 solves
* each, giving a total cost of O(n^2).
*
* Supports the following decompositions: FullPivLU, PartialPivLU, LDLT, LLT.
*
* \sa FullPivLU, PartialPivLU, LDLT, LLT.
*/
template <typename Decomposition>
typename Decomposition::RealScalar rcond_invmatrix_L1_norm_estimate(const Decomposition& dec)
{
typename Decomposition::RealScalar rcond_invmatrix_L1_norm_estimate(const Decomposition& dec) {
typedef typename Decomposition::MatrixType MatrixType;
typedef typename Decomposition::Scalar Scalar;
typedef typename Decomposition::RealScalar RealScalar;
@@ -64,54 +65,49 @@ typename Decomposition::RealScalar rcond_invmatrix_L1_norm_estimate(const Decomp
eigen_assert(dec.rows() == dec.cols());
const Index n = dec.rows();
if (n == 0)
return 0;
if (n == 0) return RealScalar(0);
// Disable Index to float conversion warning
// Disable Index to float conversion warning
#ifdef __INTEL_COMPILER
#pragma warning push
#pragma warning ( disable : 2259 )
#pragma warning push
#pragma warning(disable : 2259)
#endif
Vector v = dec.solve(Vector::Ones(n) / Scalar(n));
#ifdef __INTEL_COMPILER
#pragma warning pop
#pragma warning pop
#endif
// lower_bound is a lower bound on
// ||inv(matrix)||_1 = sup_v ||inv(matrix) v||_1 / ||v||_1
// and is the objective maximized by the ("super-") gradient ascent
// algorithm below.
// and is the objective maximized by the supergradient ascent algorithm below.
RealScalar lower_bound = v.template lpNorm<1>();
if (n == 1)
return lower_bound;
if (n == 1) return lower_bound;
// Gradient ascent algorithm follows: We know that the optimum is achieved at
// one of the simplices v = e_i, so in each iteration we follow a
// super-gradient to move towards the optimal one.
// Gradient ascent: the optimum is achieved at a unit vector e_j. Each
// iteration follows the supergradient to find which unit vector to probe next.
RealScalar old_lower_bound = lower_bound;
Vector sign_vector(n);
Vector old_sign_vector;
Index v_max_abs_index = -1;
Index old_v_max_abs_index = v_max_abs_index;
for (int k = 0; k < 4; ++k)
{
for (int k = 0; k < 4; ++k) {
sign_vector = internal::rcond_compute_sign<Vector, RealVector, is_complex>::run(v);
if (k > 0 && !is_complex && sign_vector == old_sign_vector) {
// Break if the solution stagnated.
// Break if the sign vector stagnated.
break;
}
// v_max_abs_index = argmax |real( inv(matrix)^T * sign_vector )|
// Supergradient: z = A^{-T} * sign(v), pick argmax |z_i|.
v = dec.adjoint().solve(sign_vector);
v.real().cwiseAbs().maxCoeff(&v_max_abs_index);
if (v_max_abs_index == old_v_max_abs_index) {
// Break if the solution stagnated.
// Optimality: supergradient points to the same unit vector.
break;
}
// Move to the new simplex e_j, where j = v_max_abs_index.
v = dec.solve(Vector::Unit(n, v_max_abs_index)); // v = inv(matrix) * e_j.
// Probe the best unit vector: v = A^{-1} * e_j.
v = dec.solve(Vector::Unit(n, v_max_abs_index));
lower_bound = v.template lpNorm<1>();
if (lower_bound <= old_lower_bound) {
// Break if the gradient step did not increase the lower_bound.
// No improvement from the gradient step.
break;
}
if (!is_complex) {
@@ -120,52 +116,45 @@ typename Decomposition::RealScalar rcond_invmatrix_L1_norm_estimate(const Decomp
old_v_max_abs_index = v_max_abs_index;
old_lower_bound = lower_bound;
}
// The following calculates an independent estimate of ||matrix||_1 by
// multiplying matrix by a vector with entries of slowly increasing
// magnitude and alternating sign:
// v_i = (-1)^{i} (1 + (i / (dim-1))), i = 0,...,dim-1.
// This improvement to Hager's algorithm above is due to Higham. It was
// added to make the algorithm more robust in certain corner cases where
// large elements in the matrix might otherwise escape detection due to
// exact cancellation (especially when op and op_adjoint correspond to a
// sequence of backsubstitutions and permutations), which could cause
// Hager's algorithm to vastly underestimate ||matrix||_1.
// Higham's alternating-sign estimate: an independent safety-net that catches
// cases where the gradient ascent converges to a local maximum due to exact
// cancellation patterns (especially with permutations and backsubstitutions).
// v_i = (-1)^i * (1 + i/(n-1)), then estimate = 2*||A^{-1}*v||_1 / (3*n).
Scalar alternating_sign(RealScalar(1));
for (Index i = 0; i < n; ++i) {
// The static_cast is needed when Scalar is a complex and RealScalar implements expression templates
// The static_cast is needed when Scalar is complex and RealScalar uses expression templates.
v[i] = alternating_sign * static_cast<RealScalar>(RealScalar(1) + (RealScalar(i) / (RealScalar(n - 1))));
alternating_sign = -alternating_sign;
}
v = dec.solve(v);
const RealScalar alternate_lower_bound = (2 * v.template lpNorm<1>()) / (3 * RealScalar(n));
return numext::maxi(lower_bound, alternate_lower_bound);
const RealScalar alt_est = (RealScalar(2) * v.template lpNorm<1>()) / (RealScalar(3) * RealScalar(n));
return numext::maxi(lower_bound, alt_est);
}
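The alternating-sign safety net described in the comments above can be sketched in isolation. The example below is a hedged illustration, not Eigen code: it uses a diagonal matrix so that `solve()` reduces to element-wise division, and the helper name `alt_l1_estimate` is invented for this sketch. Since ||v||_1 = 3n/2 for this probe vector, the returned value is always a valid lower bound on ||inv(A)||_1.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of Higham's alternating-sign safety-net estimate from the code
// above: v_i = (-1)^i * (1 + i/(n-1)), estimate = 2*||inv(A)*v||_1 / (3n).
// A diagonal matrix is used so that solve() is a simple division; for
// diagonal A the true norm ||inv(A)||_1 equals 1/min_i |d_i|.
double alt_l1_estimate(const std::vector<double>& diag) {
  const std::size_t n = diag.size();
  std::vector<double> v(n);
  double sign = 1.0;
  for (std::size_t i = 0; i < n; ++i) {
    v[i] = sign * (1.0 + double(i) / double(n - 1));
    sign = -sign;
  }
  // v <- inv(A) * v, then take 2*||v||_1 / (3*n).
  double l1 = 0.0;
  for (std::size_t i = 0; i < n; ++i) l1 += std::fabs(v[i] / diag[i]);
  return 2.0 * l1 / (3.0 * double(n));
}
```

For `diag = {1, 2, 4}` the probe is `{1, -1.5, 2}`, the solved vector `{1, -0.75, 0.5}` has 1-norm 2.25, and the estimate is 0.5 — below the true norm 1, as expected of a lower bound.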
/** \brief Reciprocal condition number estimator.
*
* Computing a decomposition of a dense matrix takes O(n^3) operations, while
* this method estimates the condition number quickly and reliably in O(n^2)
* operations.
*
* \returns an estimate of the reciprocal condition number
* (1 / (||matrix||_1 * ||inv(matrix)||_1)) of matrix, given ||matrix||_1 and
* its decomposition. Supports the following decompositions: FullPivLU,
* PartialPivLU, LDLT, and LLT.
*
* \sa FullPivLU, PartialPivLU, LDLT, LLT.
*/
*
* Computing a decomposition of a dense matrix takes O(n^3) operations, while
* this method estimates the condition number quickly and reliably in O(n^2)
* operations.
*
* \returns an estimate of the reciprocal condition number
* (1 / (||matrix||_1 * ||inv(matrix)||_1)) of matrix, given ||matrix||_1 and
* its decomposition. Supports the following decompositions: FullPivLU,
* PartialPivLU, LDLT, and LLT.
*
* \sa FullPivLU, PartialPivLU, LDLT, LLT.
*/
template <typename Decomposition>
typename Decomposition::RealScalar
rcond_estimate_helper(typename Decomposition::RealScalar matrix_norm, const Decomposition& dec)
{
typename Decomposition::RealScalar rcond_estimate_helper(typename Decomposition::RealScalar matrix_norm,
const Decomposition& dec) {
typedef typename Decomposition::RealScalar RealScalar;
eigen_assert(dec.rows() == dec.cols());
if (dec.rows() == 0) return RealScalar(1);
if (matrix_norm == RealScalar(0)) return RealScalar(0);
if (dec.rows() == 1) return RealScalar(1);
if (dec.rows() == 0) return NumTraits<RealScalar>::infinity();
if (numext::is_exactly_zero(matrix_norm)) return RealScalar(0);
if (dec.rows() == 1) return RealScalar(1);
const RealScalar inverse_matrix_norm = rcond_invmatrix_L1_norm_estimate(dec);
return (inverse_matrix_norm == RealScalar(0) ? RealScalar(0)
: (RealScalar(1) / inverse_matrix_norm) / matrix_norm);
return (numext::is_exactly_zero(inverse_matrix_norm) ? RealScalar(0)
: (RealScalar(1) / inverse_matrix_norm) / matrix_norm);
}
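For intuition, the quantity `rcond_estimate_helper` approximates can be computed exactly for a diagonal matrix, where ||A||_1 = max_i |d_i| and ||inv(A)||_1 = 1/min_i |d_i|. The helper `diag_rcond` below is hypothetical, written only to illustrate the formula rcond = 1/(||A||_1 * ||inv(A)||_1) and the edge cases handled above (empty and singular matrices):

```cpp
#include <cassert>
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Exact reciprocal condition number of a diagonal matrix with entries d:
// rcond = 1 / (||A||_1 * ||inv(A)||_1) = min|d_i| / max|d_i|.
double diag_rcond(const std::vector<double>& d) {
  // Empty matrix: infinity, matching the 0-size branch in the code above.
  if (d.empty()) return std::numeric_limits<double>::infinity();
  double dmax = 0.0;
  double dmin = std::numeric_limits<double>::infinity();
  for (double x : d) {
    dmax = std::max(dmax, std::fabs(x));
    dmin = std::min(dmin, std::fabs(x));
  }
  if (dmax == 0.0) return 0.0;  // singular: the matrix_norm == 0 branch
  return dmin / dmax;           // 1 / (dmax * (1/dmin))
}
```

A well-conditioned matrix has rcond near 1; rcond near machine epsilon signals that solve results may be unreliable.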
} // namespace internal

File diff suppressed because it is too large

@@ -10,95 +10,111 @@
#ifndef EIGEN_COREITERATORS_H
#define EIGEN_COREITERATORS_H
namespace Eigen {
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
/* This file contains the InnerIterator definitions for the expressions defined in Eigen/Core
*/
namespace internal {
template<typename XprType, typename EvaluatorKind>
template <typename XprType, typename EvaluatorKind>
class inner_iterator_selector;
}
/** \class InnerIterator
* \brief An InnerIterator allows looping over the elements of any matrix expression.
*
* \warning To be used with care because an evaluator is constructed every time an InnerIterator is constructed.
*
* TODO: add a usage example
*/
template<typename XprType>
class InnerIterator
{
protected:
* \brief An InnerIterator allows looping over the elements of any matrix expression.
*
* \warning To be used with care because an evaluator is constructed every time an InnerIterator is
* constructed.
*
* TODO: add a usage example
*/
template <typename XprType>
class InnerIterator {
protected:
typedef internal::inner_iterator_selector<XprType, typename internal::evaluator_traits<XprType>::Kind> IteratorType;
typedef internal::evaluator<XprType> EvaluatorType;
typedef typename internal::traits<XprType>::Scalar Scalar;
public:
public:
/** Construct an iterator over the \a outerId -th row or column of \a xpr */
InnerIterator(const XprType &xpr, const Index &outerId)
: m_eval(xpr), m_iter(m_eval, outerId, xpr.innerSize())
{}
InnerIterator(const XprType &xpr, const Index &outerId) : m_eval(xpr), m_iter(m_eval, outerId, xpr.innerSize()) {}
/// \returns the value of the current coefficient.
EIGEN_STRONG_INLINE Scalar value() const { return m_iter.value(); }
EIGEN_STRONG_INLINE Scalar value() const { return m_iter.value(); }
/** Increment the iterator \c *this to the next non-zero coefficient.
* Explicit zeros are not skipped over. To skip explicit zeros, see class SparseView
*/
EIGEN_STRONG_INLINE InnerIterator& operator++() { m_iter.operator++(); return *this; }
* Explicit zeros are not skipped over. To skip explicit zeros, see class SparseView
*/
EIGEN_STRONG_INLINE InnerIterator &operator++() {
m_iter.operator++();
return *this;
}
EIGEN_STRONG_INLINE InnerIterator &operator+=(Index i) {
m_iter.operator+=(i);
return *this;
}
EIGEN_STRONG_INLINE InnerIterator operator+(Index i) const {
InnerIterator result(*this);
result += i;
return result;
}
/// \returns the column or row index of the current coefficient.
EIGEN_STRONG_INLINE Index index() const { return m_iter.index(); }
EIGEN_STRONG_INLINE Index index() const { return m_iter.index(); }
/// \returns the row index of the current coefficient.
EIGEN_STRONG_INLINE Index row() const { return m_iter.row(); }
EIGEN_STRONG_INLINE Index row() const { return m_iter.row(); }
/// \returns the column index of the current coefficient.
EIGEN_STRONG_INLINE Index col() const { return m_iter.col(); }
EIGEN_STRONG_INLINE Index col() const { return m_iter.col(); }
/// \returns \c true if the iterator \c *this still references a valid coefficient.
EIGEN_STRONG_INLINE operator bool() const { return m_iter; }
protected:
EIGEN_STRONG_INLINE operator bool() const { return m_iter; }
protected:
EvaluatorType m_eval;
IteratorType m_iter;
private:
private:
// If you get here, then you're not using the right InnerIterator type, e.g.:
// SparseMatrix<double,RowMajor> A;
// SparseMatrix<double>::InnerIterator it(A,0);
template<typename T> InnerIterator(const EigenBase<T>&,Index outer);
template <typename T>
InnerIterator(const EigenBase<T> &, Index outer);
};
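The doc comment above still carries a `TODO: add a usage example`. Below is a self-contained sketch of the dense iteration pattern that `inner_iterator_selector<XprType, IndexBased>` implements — using a plain column-major buffer instead of an Eigen expression, with invented names (`ColMajorView`, `DenseInnerIterator`, `column_sum`):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major 2D view over a flat buffer (stand-in for an Eigen evaluator).
struct ColMajorView {
  const std::vector<double>& data;
  std::size_t rows, cols;
  double coeff(std::size_t r, std::size_t c) const { return data[c * rows + r]; }
};

// Walk one outer slice (a column, for column-major data) and visit every
// inner coefficient in order, mirroring the IndexBased iterator above.
struct DenseInnerIterator {
  const ColMajorView& m;
  std::size_t outer;  // fixed column index
  std::size_t inner;  // advancing row index
  DenseInnerIterator(const ColMajorView& mat, std::size_t outerId)
      : m(mat), outer(outerId), inner(0) {}
  double value() const { return m.coeff(inner, outer); }
  std::size_t index() const { return inner; }
  DenseInnerIterator& operator++() { ++inner; return *this; }
  explicit operator bool() const { return inner < m.rows; }
};

double column_sum(const ColMajorView& m, std::size_t col) {
  double s = 0.0;
  for (DenseInnerIterator it(m, col); it; ++it) s += it.value();
  return s;
}
```

The `for (InnerIterator it(xpr, outer); it; ++it)` loop shape is the same one used with Eigen's real `InnerIterator`.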
namespace internal {
// Generic inner iterator implementation for dense objects
template<typename XprType>
class inner_iterator_selector<XprType, IndexBased>
{
protected:
template <typename XprType>
class inner_iterator_selector<XprType, IndexBased> {
protected:
typedef evaluator<XprType> EvaluatorType;
typedef typename traits<XprType>::Scalar Scalar;
enum { IsRowMajor = (XprType::Flags&RowMajorBit)==RowMajorBit };
public:
EIGEN_STRONG_INLINE inner_iterator_selector(const EvaluatorType &eval, const Index &outerId, const Index &innerSize)
: m_eval(eval), m_inner(0), m_outer(outerId), m_end(innerSize)
{}
enum { IsRowMajor = (XprType::Flags & RowMajorBit) == RowMajorBit };
EIGEN_STRONG_INLINE Scalar value() const
{
return (IsRowMajor) ? m_eval.coeff(m_outer, m_inner)
: m_eval.coeff(m_inner, m_outer);
public:
EIGEN_STRONG_INLINE inner_iterator_selector(const EvaluatorType &eval, const Index &outerId, const Index &innerSize)
: m_eval(eval), m_inner(0), m_outer(outerId), m_end(innerSize) {}
EIGEN_STRONG_INLINE Scalar value() const {
return (IsRowMajor) ? m_eval.coeff(m_outer, m_inner) : m_eval.coeff(m_inner, m_outer);
}
EIGEN_STRONG_INLINE inner_iterator_selector& operator++() { m_inner++; return *this; }
EIGEN_STRONG_INLINE inner_iterator_selector &operator++() {
m_inner++;
return *this;
}
EIGEN_STRONG_INLINE Index index() const { return m_inner; }
inline Index row() const { return IsRowMajor ? m_outer : index(); }
inline Index col() const { return IsRowMajor ? index() : m_outer; }
EIGEN_STRONG_INLINE operator bool() const { return m_inner < m_end && m_inner>=0; }
EIGEN_STRONG_INLINE operator bool() const { return m_inner < m_end && m_inner >= 0; }
protected:
const EvaluatorType& m_eval;
protected:
const EvaluatorType &m_eval;
Index m_inner;
const Index m_outer;
const Index m_end;
@@ -106,22 +122,20 @@ protected:
// For iterator-based evaluator, inner-iterator is already implemented as
// evaluator<>::InnerIterator
template<typename XprType>
class inner_iterator_selector<XprType, IteratorBased>
: public evaluator<XprType>::InnerIterator
{
protected:
template <typename XprType>
class inner_iterator_selector<XprType, IteratorBased> : public evaluator<XprType>::InnerIterator {
protected:
typedef typename evaluator<XprType>::InnerIterator Base;
typedef evaluator<XprType> EvaluatorType;
public:
EIGEN_STRONG_INLINE inner_iterator_selector(const EvaluatorType &eval, const Index &outerId, const Index &/*innerSize*/)
: Base(eval, outerId)
{}
public:
EIGEN_STRONG_INLINE inner_iterator_selector(const EvaluatorType &eval, const Index &outerId,
const Index & /*innerSize*/)
: Base(eval, outerId) {}
};
} // end namespace internal
} // end namespace internal
} // end namespace Eigen
} // end namespace Eigen
#endif // EIGEN_COREITERATORS_H
#endif // EIGEN_COREITERATORS_H


@@ -11,15 +11,17 @@
#ifndef EIGEN_CWISE_BINARY_OP_H
#define EIGEN_CWISE_BINARY_OP_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
namespace internal {
template<typename BinaryOp, typename Lhs, typename Rhs>
struct traits<CwiseBinaryOp<BinaryOp, Lhs, Rhs> >
{
template <typename BinaryOp, typename Lhs, typename Rhs>
struct traits<CwiseBinaryOp<BinaryOp, Lhs, Rhs>> {
// we must not inherit from traits<Lhs> since it has
// the potential to cause problems with MSVC
typedef typename remove_all<Lhs>::type Ancestor;
typedef remove_all_t<Lhs> Ancestor;
typedef typename traits<Ancestor>::XprKind XprKind;
enum {
RowsAtCompileTime = traits<Ancestor>::RowsAtCompileTime,
@@ -30,154 +32,135 @@ struct traits<CwiseBinaryOp<BinaryOp, Lhs, Rhs> >
// even though we require Lhs and Rhs to have the same scalar type (see CwiseBinaryOp constructor),
// we still want to handle the case when the result type is different.
typedef typename result_of<
BinaryOp(
const typename Lhs::Scalar&,
const typename Rhs::Scalar&
)
>::type Scalar;
typedef typename cwise_promote_storage_type<typename traits<Lhs>::StorageKind,
typename traits<Rhs>::StorageKind,
typedef typename result_of<BinaryOp(const typename Lhs::Scalar&, const typename Rhs::Scalar&)>::type Scalar;
typedef typename cwise_promote_storage_type<typename traits<Lhs>::StorageKind, typename traits<Rhs>::StorageKind,
BinaryOp>::ret StorageKind;
typedef typename promote_index_type<typename traits<Lhs>::StorageIndex,
typename traits<Rhs>::StorageIndex>::type StorageIndex;
typedef typename promote_index_type<typename traits<Lhs>::StorageIndex, typename traits<Rhs>::StorageIndex>::type
StorageIndex;
typedef typename Lhs::Nested LhsNested;
typedef typename Rhs::Nested RhsNested;
typedef typename remove_reference<LhsNested>::type _LhsNested;
typedef typename remove_reference<RhsNested>::type _RhsNested;
typedef std::remove_reference_t<LhsNested> LhsNested_;
typedef std::remove_reference_t<RhsNested> RhsNested_;
enum {
Flags = _LhsNested::Flags & RowMajorBit
Flags = cwise_promote_storage_order<typename traits<Lhs>::StorageKind, typename traits<Rhs>::StorageKind,
LhsNested_::Flags & RowMajorBit, RhsNested_::Flags & RowMajorBit>::value
};
};
} // end namespace internal
} // end namespace internal
template<typename BinaryOp, typename Lhs, typename Rhs, typename StorageKind>
template <typename BinaryOp, typename Lhs, typename Rhs, typename StorageKind>
class CwiseBinaryOpImpl;
/** \class CwiseBinaryOp
* \ingroup Core_Module
*
* \brief Generic expression where a coefficient-wise binary operator is applied to two expressions
*
* \tparam BinaryOp template functor implementing the operator
* \tparam LhsType the type of the left-hand side
* \tparam RhsType the type of the right-hand side
*
* This class represents an expression where a coefficient-wise binary operator is applied to two expressions.
* It is the return type of binary operators, by which we mean only those binary operators where
* both the left-hand side and the right-hand side are Eigen expressions.
* For example, the return type of matrix1+matrix2 is a CwiseBinaryOp.
*
* Most of the time, this is the only way that it is used, so you typically don't have to name
* CwiseBinaryOp types explicitly.
*
* \sa MatrixBase::binaryExpr(const MatrixBase<OtherDerived> &,const CustomBinaryOp &) const, class CwiseUnaryOp, class CwiseNullaryOp
*/
template<typename BinaryOp, typename LhsType, typename RhsType>
class CwiseBinaryOp :
public CwiseBinaryOpImpl<
BinaryOp, LhsType, RhsType,
typename internal::cwise_promote_storage_type<typename internal::traits<LhsType>::StorageKind,
typename internal::traits<RhsType>::StorageKind,
BinaryOp>::ret>,
internal::no_assignment_operator
{
public:
typedef typename internal::remove_all<LhsType>::type Lhs;
typedef typename internal::remove_all<RhsType>::type Rhs;
* \ingroup Core_Module
*
* \brief Generic expression where a coefficient-wise binary operator is applied to two expressions
*
* \tparam BinaryOp template functor implementing the operator
* \tparam LhsType the type of the left-hand side
* \tparam RhsType the type of the right-hand side
*
* This class represents an expression where a coefficient-wise binary operator is applied to two expressions.
* It is the return type of binary operators, by which we mean only those binary operators where
* both the left-hand side and the right-hand side are Eigen expressions.
* For example, the return type of matrix1+matrix2 is a CwiseBinaryOp.
*
* Most of the time, this is the only way that it is used, so you typically don't have to name
* CwiseBinaryOp types explicitly.
*
* \sa MatrixBase::binaryExpr(const MatrixBase<OtherDerived> &,const CustomBinaryOp &) const, class CwiseUnaryOp, class
* CwiseNullaryOp
*/
template <typename BinaryOp, typename LhsType, typename RhsType>
class CwiseBinaryOp : public CwiseBinaryOpImpl<BinaryOp, LhsType, RhsType,
typename internal::cwise_promote_storage_type<
typename internal::traits<LhsType>::StorageKind,
typename internal::traits<RhsType>::StorageKind, BinaryOp>::ret>,
internal::no_assignment_operator {
public:
typedef internal::remove_all_t<BinaryOp> Functor;
typedef internal::remove_all_t<LhsType> Lhs;
typedef internal::remove_all_t<RhsType> Rhs;
typedef typename CwiseBinaryOpImpl<
BinaryOp, LhsType, RhsType,
typename internal::cwise_promote_storage_type<typename internal::traits<LhsType>::StorageKind,
typename internal::traits<Rhs>::StorageKind,
BinaryOp>::ret>::Base Base;
EIGEN_GENERIC_PUBLIC_INTERFACE(CwiseBinaryOp)
typedef typename CwiseBinaryOpImpl<
BinaryOp, LhsType, RhsType,
typename internal::cwise_promote_storage_type<typename internal::traits<LhsType>::StorageKind,
typename internal::traits<Rhs>::StorageKind, BinaryOp>::ret>::Base
Base;
EIGEN_GENERIC_PUBLIC_INTERFACE(CwiseBinaryOp)
typedef typename internal::ref_selector<LhsType>::type LhsNested;
typedef typename internal::ref_selector<RhsType>::type RhsNested;
typedef typename internal::remove_reference<LhsNested>::type _LhsNested;
typedef typename internal::remove_reference<RhsNested>::type _RhsNested;
EIGEN_CHECK_BINARY_COMPATIBILIY(BinaryOp, typename Lhs::Scalar, typename Rhs::Scalar)
EIGEN_STATIC_ASSERT_SAME_MATRIX_SIZE(Lhs, Rhs)
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE CwiseBinaryOp(const Lhs& aLhs, const Rhs& aRhs, const BinaryOp& func = BinaryOp())
: m_lhs(aLhs), m_rhs(aRhs), m_functor(func)
{
EIGEN_CHECK_BINARY_COMPATIBILIY(BinaryOp,typename Lhs::Scalar,typename Rhs::Scalar);
// require the sizes to match
EIGEN_STATIC_ASSERT_SAME_MATRIX_SIZE(Lhs, Rhs)
eigen_assert(aLhs.rows() == aRhs.rows() && aLhs.cols() == aRhs.cols());
}
typedef typename internal::ref_selector<LhsType>::type LhsNested;
typedef typename internal::ref_selector<RhsType>::type RhsNested;
typedef std::remove_reference_t<LhsNested> LhsNested_;
typedef std::remove_reference_t<RhsNested> RhsNested_;
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Index rows() const {
// return the fixed size type if available to enable compile time optimizations
if (internal::traits<typename internal::remove_all<LhsNested>::type>::RowsAtCompileTime==Dynamic)
return m_rhs.rows();
else
return m_lhs.rows();
}
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE Index cols() const {
// return the fixed size type if available to enable compile time optimizations
if (internal::traits<typename internal::remove_all<LhsNested>::type>::ColsAtCompileTime==Dynamic)
return m_rhs.cols();
else
return m_lhs.cols();
}
#if EIGEN_COMP_MSVC
// Required for Visual Studio, which may fail to inline the copy constructor otherwise.
EIGEN_STRONG_INLINE CwiseBinaryOp(const CwiseBinaryOp<BinaryOp, LhsType, RhsType>&) = default;
#endif
/** \returns the left hand side nested expression */
EIGEN_DEVICE_FUNC
const _LhsNested& lhs() const { return m_lhs; }
/** \returns the right hand side nested expression */
EIGEN_DEVICE_FUNC
const _RhsNested& rhs() const { return m_rhs; }
/** \returns the functor representing the binary operation */
EIGEN_DEVICE_FUNC
const BinaryOp& functor() const { return m_functor; }
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE CwiseBinaryOp(const Lhs& aLhs, const Rhs& aRhs,
const BinaryOp& func = BinaryOp())
: m_lhs(aLhs), m_rhs(aRhs), m_functor(func) {
eigen_assert(aLhs.rows() == aRhs.rows() && aLhs.cols() == aRhs.cols());
}
protected:
LhsNested m_lhs;
RhsNested m_rhs;
const BinaryOp m_functor;
EIGEN_DEVICE_FUNC constexpr Index rows() const noexcept {
// return the fixed size type if available to enable compile time optimizations
return internal::traits<internal::remove_all_t<LhsNested>>::RowsAtCompileTime == Dynamic ? m_rhs.rows()
: m_lhs.rows();
}
EIGEN_DEVICE_FUNC constexpr Index cols() const noexcept {
// return the fixed size type if available to enable compile time optimizations
return internal::traits<internal::remove_all_t<LhsNested>>::ColsAtCompileTime == Dynamic ? m_rhs.cols()
: m_lhs.cols();
}
/** \returns the left hand side nested expression */
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE const LhsNested_& lhs() const { return m_lhs; }
/** \returns the right hand side nested expression */
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE const RhsNested_& rhs() const { return m_rhs; }
/** \returns the functor representing the binary operation */
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE const BinaryOp& functor() const { return m_functor; }
protected:
LhsNested m_lhs;
RhsNested m_rhs;
const BinaryOp m_functor;
};
// Generic API dispatcher
template<typename BinaryOp, typename Lhs, typename Rhs, typename StorageKind>
class CwiseBinaryOpImpl
: public internal::generic_xpr_base<CwiseBinaryOp<BinaryOp, Lhs, Rhs> >::type
{
public:
typedef typename internal::generic_xpr_base<CwiseBinaryOp<BinaryOp, Lhs, Rhs> >::type Base;
template <typename BinaryOp, typename Lhs, typename Rhs, typename StorageKind>
class CwiseBinaryOpImpl : public internal::generic_xpr_base<CwiseBinaryOp<BinaryOp, Lhs, Rhs>>::type {
public:
typedef typename internal::generic_xpr_base<CwiseBinaryOp<BinaryOp, Lhs, Rhs>>::type Base;
};
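CwiseBinaryOp is an expression template: `matrix1 + matrix2` allocates no result, it returns a lightweight node that computes coefficients lazily when the expression is assigned or evaluated. A minimal sketch of that idea with invented types (`Vec`, `BinaryExpr`, `eval` — not Eigen's machinery):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Leaf type with coefficient access, standing in for a dense matrix.
struct Vec {
  std::vector<double> data;
  double coeff(std::size_t i) const { return data[i]; }
  std::size_t size() const { return data.size(); }
};

// Expression node: stores references to operands plus a functor, like
// CwiseBinaryOp's m_lhs / m_rhs / m_functor above. Nothing is computed yet.
template <typename Op, typename L, typename R>
struct BinaryExpr {
  const L& lhs;
  const R& rhs;
  Op op;
  double coeff(std::size_t i) const { return op(lhs.coeff(i), rhs.coeff(i)); }
  std::size_t size() const { return lhs.size(); }
};

template <typename L, typename R>
BinaryExpr<std::plus<double>, L, R> operator+(const L& a, const R& b) {
  assert(a.size() == b.size());  // sizes must match, as in CwiseBinaryOp
  return {a, b, std::plus<double>()};
}

// Evaluation happens only here, one coefficient at a time.
template <typename Expr>
Vec eval(const Expr& e) {
  Vec out{std::vector<double>(e.size())};
  for (std::size_t i = 0; i < e.size(); ++i) out.data[i] = e.coeff(i);
  return out;
}
```

Because `BinaryExpr` nests (`a + b + a` builds a node over a node), long chains fuse into a single loop at evaluation time — the core benefit of the CwiseBinaryOp design.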
/** replaces \c *this by \c *this - \a other.
*
* \returns a reference to \c *this
*/
template<typename Derived>
template<typename OtherDerived>
EIGEN_STRONG_INLINE Derived &
MatrixBase<Derived>::operator-=(const MatrixBase<OtherDerived> &other)
{
call_assignment(derived(), other.derived(), internal::sub_assign_op<Scalar,typename OtherDerived::Scalar>());
*
* \returns a reference to \c *this
*/
template <typename Derived>
template <typename OtherDerived>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Derived& MatrixBase<Derived>::operator-=(const MatrixBase<OtherDerived>& other) {
call_assignment(derived(), other.derived(), internal::sub_assign_op<Scalar, typename OtherDerived::Scalar>());
return derived();
}
/** replaces \c *this by \c *this + \a other.
*
* \returns a reference to \c *this
*/
template<typename Derived>
template<typename OtherDerived>
EIGEN_STRONG_INLINE Derived &
MatrixBase<Derived>::operator+=(const MatrixBase<OtherDerived>& other)
{
call_assignment(derived(), other.derived(), internal::add_assign_op<Scalar,typename OtherDerived::Scalar>());
*
* \returns a reference to \c *this
*/
template <typename Derived>
template <typename OtherDerived>
EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Derived& MatrixBase<Derived>::operator+=(const MatrixBase<OtherDerived>& other) {
call_assignment(derived(), other.derived(), internal::add_assign_op<Scalar, typename OtherDerived::Scalar>());
return derived();
}
} // end namespace Eigen
#endif // EIGEN_CWISE_BINARY_OP_H
} // end namespace Eigen
#endif // EIGEN_CWISE_BINARY_OP_H

File diff suppressed because it is too large

@@ -12,14 +12,17 @@
#ifndef EIGEN_CWISE_TERNARY_OP_H
#define EIGEN_CWISE_TERNARY_OP_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
namespace internal {
template <typename TernaryOp, typename Arg1, typename Arg2, typename Arg3>
struct traits<CwiseTernaryOp<TernaryOp, Arg1, Arg2, Arg3> > {
struct traits<CwiseTernaryOp<TernaryOp, Arg1, Arg2, Arg3>> {
// we must not inherit from traits<Arg1> since it has
// the potential to cause problems with MSVC
typedef typename remove_all<Arg1>::type Ancestor;
typedef remove_all_t<Arg1> Ancestor;
typedef typename traits<Ancestor>::XprKind XprKind;
enum {
RowsAtCompileTime = traits<Ancestor>::RowsAtCompileTime,
@@ -31,9 +34,8 @@ struct traits<CwiseTernaryOp<TernaryOp, Arg1, Arg2, Arg3> > {
// even though we require Arg1, Arg2, and Arg3 to have the same scalar type
// (see CwiseTernaryOp constructor),
// we still want to handle the case when the result type is different.
typedef typename result_of<TernaryOp(
const typename Arg1::Scalar&, const typename Arg2::Scalar&,
const typename Arg3::Scalar&)>::type Scalar;
typedef typename result_of<TernaryOp(const typename Arg1::Scalar&, const typename Arg2::Scalar&,
const typename Arg3::Scalar&)>::type Scalar;
typedef typename internal::traits<Arg1>::StorageKind StorageKind;
typedef typename internal::traits<Arg1>::StorageIndex StorageIndex;
@@ -41,138 +43,114 @@ struct traits<CwiseTernaryOp<TernaryOp, Arg1, Arg2, Arg3> > {
typedef typename Arg1::Nested Arg1Nested;
typedef typename Arg2::Nested Arg2Nested;
typedef typename Arg3::Nested Arg3Nested;
typedef typename remove_reference<Arg1Nested>::type _Arg1Nested;
typedef typename remove_reference<Arg2Nested>::type _Arg2Nested;
typedef typename remove_reference<Arg3Nested>::type _Arg3Nested;
enum { Flags = _Arg1Nested::Flags & RowMajorBit };
typedef std::remove_reference_t<Arg1Nested> Arg1Nested_;
typedef std::remove_reference_t<Arg2Nested> Arg2Nested_;
typedef std::remove_reference_t<Arg3Nested> Arg3Nested_;
enum { Flags = Arg1Nested_::Flags & RowMajorBit };
};
} // end namespace internal
template <typename TernaryOp, typename Arg1, typename Arg2, typename Arg3,
typename StorageKind>
template <typename TernaryOp, typename Arg1, typename Arg2, typename Arg3, typename StorageKind>
class CwiseTernaryOpImpl;
/** \class CwiseTernaryOp
* \ingroup Core_Module
*
* \brief Generic expression where a coefficient-wise ternary operator is
* \ingroup Core_Module
*
* \brief Generic expression where a coefficient-wise ternary operator is
* applied to three expressions
*
* \tparam TernaryOp template functor implementing the operator
* \tparam Arg1Type the type of the first argument
* \tparam Arg2Type the type of the second argument
* \tparam Arg3Type the type of the third argument
*
* This class represents an expression where a coefficient-wise ternary
*
* \tparam TernaryOp template functor implementing the operator
* \tparam Arg1Type the type of the first argument
* \tparam Arg2Type the type of the second argument
* \tparam Arg3Type the type of the third argument
*
* This class represents an expression where a coefficient-wise ternary
* operator is applied to three expressions.
* It is the return type of ternary operators, by which we mean only those
* It is the return type of ternary operators, by which we mean only those
* ternary operators where
* all three arguments are Eigen expressions.
* For example, the return type of betainc(matrix1, matrix2, matrix3) is a
* all three arguments are Eigen expressions.
* For example, the return type of betainc(matrix1, matrix2, matrix3) is a
* CwiseTernaryOp.
*
* Most of the time, this is the only way that it is used, so you typically
*
* Most of the time, this is the only way that it is used, so you typically
* don't have to name
* CwiseTernaryOp types explicitly.
*
* \sa MatrixBase::ternaryExpr(const MatrixBase<Argument2> &, const
* CwiseTernaryOp types explicitly.
*
* \sa MatrixBase::ternaryExpr(const MatrixBase<Argument2> &, const
* MatrixBase<Argument3> &, const CustomTernaryOp &) const, class CwiseBinaryOp,
* class CwiseUnaryOp, class CwiseNullaryOp
*/
template <typename TernaryOp, typename Arg1Type, typename Arg2Type, typename Arg3Type>
class CwiseTernaryOp : public CwiseTernaryOpImpl<TernaryOp, Arg1Type, Arg2Type, Arg3Type,
typename internal::traits<Arg1Type>::StorageKind>,
internal::no_assignment_operator {
public:
typedef internal::remove_all_t<Arg1Type> Arg1;
typedef internal::remove_all_t<Arg2Type> Arg2;
typedef internal::remove_all_t<Arg3Type> Arg3;
// require the sizes to match
EIGEN_STATIC_ASSERT_SAME_MATRIX_SIZE(Arg1, Arg2)
EIGEN_STATIC_ASSERT_SAME_MATRIX_SIZE(Arg1, Arg3)
// The index types should match
EIGEN_STATIC_ASSERT((internal::is_same<typename internal::traits<Arg1Type>::StorageKind,
typename internal::traits<Arg2Type>::StorageKind>::value),
STORAGE_KIND_MUST_MATCH)
EIGEN_STATIC_ASSERT((internal::is_same<typename internal::traits<Arg1Type>::StorageKind,
typename internal::traits<Arg3Type>::StorageKind>::value),
STORAGE_KIND_MUST_MATCH)
typedef typename CwiseTernaryOpImpl<TernaryOp, Arg1Type, Arg2Type, Arg3Type,
typename internal::traits<Arg1Type>::StorageKind>::Base Base;
EIGEN_GENERIC_PUBLIC_INTERFACE(CwiseTernaryOp)
typedef typename internal::ref_selector<Arg1Type>::type Arg1Nested;
typedef typename internal::ref_selector<Arg2Type>::type Arg2Nested;
typedef typename internal::ref_selector<Arg3Type>::type Arg3Nested;
typedef std::remove_reference_t<Arg1Nested> Arg1Nested_;
typedef std::remove_reference_t<Arg2Nested> Arg2Nested_;
typedef std::remove_reference_t<Arg3Nested> Arg3Nested_;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CwiseTernaryOp(const Arg1& a1, const Arg2& a2, const Arg3& a3,
const TernaryOp& func = TernaryOp())
: m_arg1(a1), m_arg2(a2), m_arg3(a3), m_functor(func) {
eigen_assert(a1.rows() == a2.rows() && a1.cols() == a2.cols() && a1.rows() == a3.rows() && a1.cols() == a3.cols());
}
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE Index rows() const {
// return the fixed size type if available to enable compile time
// optimizations
if (internal::traits<internal::remove_all_t<Arg1Nested>>::RowsAtCompileTime == Dynamic &&
internal::traits<internal::remove_all_t<Arg2Nested>>::RowsAtCompileTime == Dynamic)
return m_arg3.rows();
else if (internal::traits<internal::remove_all_t<Arg1Nested>>::RowsAtCompileTime == Dynamic &&
internal::traits<internal::remove_all_t<Arg3Nested>>::RowsAtCompileTime == Dynamic)
return m_arg2.rows();
else
return m_arg1.rows();
}
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE Index cols() const {
// return the fixed size type if available to enable compile time
// optimizations
if (internal::traits<internal::remove_all_t<Arg1Nested>>::ColsAtCompileTime == Dynamic &&
internal::traits<internal::remove_all_t<Arg2Nested>>::ColsAtCompileTime == Dynamic)
return m_arg3.cols();
else if (internal::traits<internal::remove_all_t<Arg1Nested>>::ColsAtCompileTime == Dynamic &&
internal::traits<internal::remove_all_t<Arg3Nested>>::ColsAtCompileTime == Dynamic)
return m_arg2.cols();
else
return m_arg1.cols();
}
/** \returns the first argument nested expression */
EIGEN_DEVICE_FUNC constexpr const Arg1Nested_& arg1() const { return m_arg1; }
/** \returns the second argument nested expression */
EIGEN_DEVICE_FUNC constexpr const Arg2Nested_& arg2() const { return m_arg2; }
/** \returns the third argument nested expression */
EIGEN_DEVICE_FUNC constexpr const Arg3Nested_& arg3() const { return m_arg3; }
/** \returns the functor representing the ternary operation */
EIGEN_DEVICE_FUNC constexpr const TernaryOp& functor() const { return m_functor; }
protected:
Arg1Nested m_arg1;
@@ -182,14 +160,10 @@ class CwiseTernaryOp : public CwiseTernaryOpImpl<
};
// Generic API dispatcher
template <typename TernaryOp, typename Arg1, typename Arg2, typename Arg3, typename StorageKind>
class CwiseTernaryOpImpl : public internal::generic_xpr_base<CwiseTernaryOp<TernaryOp, Arg1, Arg2, Arg3>>::type {
public:
typedef typename internal::generic_xpr_base<CwiseTernaryOp<TernaryOp, Arg1, Arg2, Arg3>>::type Base;
};
} // end namespace Eigen


@@ -11,93 +11,85 @@
#ifndef EIGEN_CWISE_UNARY_OP_H
#define EIGEN_CWISE_UNARY_OP_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
namespace internal {
template <typename UnaryOp, typename XprType>
struct traits<CwiseUnaryOp<UnaryOp, XprType> > : traits<XprType> {
typedef typename result_of<UnaryOp(const typename XprType::Scalar&)>::type Scalar;
typedef typename XprType::Nested XprTypeNested;
typedef std::remove_reference_t<XprTypeNested> XprTypeNested_;
enum { Flags = XprTypeNested_::Flags & RowMajorBit };
};
}  // namespace internal
template <typename UnaryOp, typename XprType, typename StorageKind>
class CwiseUnaryOpImpl;
/** \class CwiseUnaryOp
* \ingroup Core_Module
*
* \brief Generic expression where a coefficient-wise unary operator is applied to an expression
*
* \tparam UnaryOp template functor implementing the operator
* \tparam XprType the type of the expression to which we are applying the unary operator
*
* This class represents an expression where a unary operator is applied to an expression.
* It is the return type of all operations taking exactly 1 input expression, regardless of the
* presence of other inputs such as scalars. For example, the operator* in the expression 3*matrix
* is considered unary, because only the right-hand side is an expression, and its
* return type is a specialization of CwiseUnaryOp.
*
* Most of the time, this is the only way that it is used, so you typically don't have to name
* CwiseUnaryOp types explicitly.
*
* \sa MatrixBase::unaryExpr(const CustomUnaryOp &) const, class CwiseBinaryOp, class CwiseNullaryOp
*/
template <typename UnaryOp, typename XprType>
class CwiseUnaryOp : public CwiseUnaryOpImpl<UnaryOp, XprType, typename internal::traits<XprType>::StorageKind>,
internal::no_assignment_operator {
public:
typedef typename CwiseUnaryOpImpl<UnaryOp, XprType, typename internal::traits<XprType>::StorageKind>::Base Base;
EIGEN_GENERIC_PUBLIC_INTERFACE(CwiseUnaryOp)
typedef typename internal::ref_selector<XprType>::type XprTypeNested;
typedef internal::remove_all_t<XprType> NestedExpression;
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE explicit CwiseUnaryOp(const XprType& xpr,
const UnaryOp& func = UnaryOp())
: m_xpr(xpr), m_functor(func) {}
EIGEN_DEVICE_FUNC constexpr Index rows() const noexcept { return m_xpr.rows(); }
EIGEN_DEVICE_FUNC constexpr Index cols() const noexcept { return m_xpr.cols(); }
/** \returns the functor representing the unary operation */
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE const UnaryOp& functor() const { return m_functor; }
/** \returns the nested expression */
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE const internal::remove_all_t<XprTypeNested>& nestedExpression()
const {
return m_xpr;
}
/** \returns the nested expression */
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE internal::remove_all_t<XprTypeNested>& nestedExpression() {
return m_xpr;
}
protected:
XprTypeNested m_xpr;
const UnaryOp m_functor;
};
// Generic API dispatcher
template <typename UnaryOp, typename XprType, typename StorageKind>
class CwiseUnaryOpImpl : public internal::generic_xpr_base<CwiseUnaryOp<UnaryOp, XprType> >::type {
public:
typedef typename internal::generic_xpr_base<CwiseUnaryOp<UnaryOp, XprType> >::type Base;
};
} // end namespace Eigen
#endif // EIGEN_CWISE_UNARY_OP_H


@@ -10,119 +10,160 @@
#ifndef EIGEN_CWISE_UNARY_VIEW_H
#define EIGEN_CWISE_UNARY_VIEW_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
namespace internal {
template <typename ViewOp, typename MatrixType, typename StrideType>
struct traits<CwiseUnaryView<ViewOp, MatrixType, StrideType> > : traits<MatrixType> {
typedef typename result_of<ViewOp(typename traits<MatrixType>::Scalar&)>::type1 ScalarRef;
static_assert(std::is_reference<ScalarRef>::value, "Views must return a reference type.");
typedef remove_all_t<ScalarRef> Scalar;
typedef typename MatrixType::Nested MatrixTypeNested;
typedef remove_all_t<MatrixTypeNested> MatrixTypeNested_;
enum {
FlagsLvalueBit = is_lvalue<MatrixType>::value ? LvalueBit : 0,
Flags =
traits<MatrixTypeNested_>::Flags &
(RowMajorBit | FlagsLvalueBit | DirectAccessBit), // FIXME DirectAccessBit should not be handled by expressions
MatrixTypeInnerStride = inner_stride_at_compile_time<MatrixType>::ret,
// need to cast the sizeof's from size_t to int explicitly, otherwise:
// "error: no integral type can represent all of the enumerator values
InnerStrideAtCompileTime =
StrideType::InnerStrideAtCompileTime == 0
? (MatrixTypeInnerStride == Dynamic
? int(Dynamic)
: int(MatrixTypeInnerStride) * int(sizeof(typename traits<MatrixType>::Scalar) / sizeof(Scalar)))
: int(StrideType::InnerStrideAtCompileTime),
OuterStrideAtCompileTime = StrideType::OuterStrideAtCompileTime == 0
? (outer_stride_at_compile_time<MatrixType>::ret == Dynamic
? int(Dynamic)
: outer_stride_at_compile_time<MatrixType>::ret *
int(sizeof(typename traits<MatrixType>::Scalar) / sizeof(Scalar)))
: int(StrideType::OuterStrideAtCompileTime)
};
};
// Generic API dispatcher
template <typename ViewOp, typename XprType, typename StrideType, typename StorageKind,
bool Mutable = !std::is_const<XprType>::value>
class CwiseUnaryViewImpl : public generic_xpr_base<CwiseUnaryView<ViewOp, XprType, StrideType> >::type {
public:
typedef typename generic_xpr_base<CwiseUnaryView<ViewOp, XprType, StrideType> >::type Base;
};
template <typename ViewOp, typename MatrixType, typename StrideType>
class CwiseUnaryViewImpl<ViewOp, MatrixType, StrideType, Dense, false>
: public dense_xpr_base<CwiseUnaryView<ViewOp, MatrixType, StrideType> >::type {
public:
typedef CwiseUnaryView<ViewOp, MatrixType, StrideType> Derived;
typedef typename dense_xpr_base<CwiseUnaryView<ViewOp, MatrixType, StrideType> >::type Base;
EIGEN_DENSE_PUBLIC_INTERFACE(Derived)
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(CwiseUnaryViewImpl)
EIGEN_DEVICE_FUNC inline const Scalar* data() const { return &(this->coeff(0)); }
EIGEN_DEVICE_FUNC constexpr Index innerStride() const {
return StrideType::InnerStrideAtCompileTime != 0 ? int(StrideType::InnerStrideAtCompileTime)
: derived().nestedExpression().innerStride() *
sizeof(typename traits<MatrixType>::Scalar) / sizeof(Scalar);
}
EIGEN_DEVICE_FUNC constexpr Index outerStride() const {
return StrideType::OuterStrideAtCompileTime != 0 ? int(StrideType::OuterStrideAtCompileTime)
: derived().nestedExpression().outerStride() *
sizeof(typename traits<MatrixType>::Scalar) / sizeof(Scalar);
}
protected:
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(CwiseUnaryViewImpl)
// Allow const access to coeffRef for the case of direct access being enabled.
EIGEN_DEVICE_FUNC inline const Scalar& coeffRef(Index index) const {
return internal::evaluator<Derived>(derived()).coeffRef(index);
}
EIGEN_DEVICE_FUNC inline const Scalar& coeffRef(Index row, Index col) const {
return internal::evaluator<Derived>(derived()).coeffRef(row, col);
}
};
template <typename ViewOp, typename MatrixType, typename StrideType>
class CwiseUnaryViewImpl<ViewOp, MatrixType, StrideType, Dense, true>
: public CwiseUnaryViewImpl<ViewOp, MatrixType, StrideType, Dense, false> {
public:
typedef CwiseUnaryViewImpl<ViewOp, MatrixType, StrideType, Dense, false> Base;
typedef CwiseUnaryView<ViewOp, MatrixType, StrideType> Derived;
EIGEN_DENSE_PUBLIC_INTERFACE(Derived)
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(CwiseUnaryViewImpl)
using Base::data;
EIGEN_DEVICE_FUNC inline Scalar* data() { return &(this->coeffRef(0)); }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index row, Index col) {
return internal::evaluator<Derived>(derived()).coeffRef(row, col);
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index) {
return internal::evaluator<Derived>(derived()).coeffRef(index);
}
protected:
EIGEN_DEFAULT_EMPTY_CONSTRUCTOR_AND_DESTRUCTOR(CwiseUnaryViewImpl)
};
} // namespace internal
/** \class CwiseUnaryView
* \ingroup Core_Module
*
* \brief Generic lvalue expression of a coefficient-wise unary operator of a matrix or a vector
*
* \tparam ViewOp template functor implementing the view
* \tparam MatrixType the type of the matrix we are applying the unary operator
*
* This class represents a lvalue expression of a generic unary view operator of a matrix or a vector.
* It is the return type of real() and imag(), and most of the time this is the only way it is used.
*
* \sa MatrixBase::unaryViewExpr(const CustomUnaryOp &) const, class CwiseUnaryOp
*/
template <typename ViewOp, typename MatrixType, typename StrideType>
class CwiseUnaryView : public internal::CwiseUnaryViewImpl<ViewOp, MatrixType, StrideType,
typename internal::traits<MatrixType>::StorageKind> {
public:
typedef typename internal::CwiseUnaryViewImpl<ViewOp, MatrixType, StrideType,
typename internal::traits<MatrixType>::StorageKind>::Base Base;
EIGEN_GENERIC_PUBLIC_INTERFACE(CwiseUnaryView)
typedef typename internal::ref_selector<MatrixType>::non_const_type MatrixTypeNested;
typedef internal::remove_all_t<MatrixType> NestedExpression;
explicit EIGEN_DEVICE_FUNC constexpr inline CwiseUnaryView(MatrixType& mat, const ViewOp& func = ViewOp())
: m_matrix(mat), m_functor(func) {}
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(CwiseUnaryView)
EIGEN_DEVICE_FUNC constexpr Index rows() const noexcept { return m_matrix.rows(); }
EIGEN_DEVICE_FUNC constexpr Index cols() const noexcept { return m_matrix.cols(); }
/** \returns the functor representing unary operation */
EIGEN_DEVICE_FUNC constexpr const ViewOp& functor() const { return m_functor; }
/** \returns the nested expression */
EIGEN_DEVICE_FUNC constexpr const internal::remove_all_t<MatrixTypeNested>& nestedExpression() const {
return m_matrix;
}
/** \returns the nested expression */
EIGEN_DEVICE_FUNC constexpr std::remove_reference_t<MatrixTypeNested>& nestedExpression() { return m_matrix; }
protected:
MatrixTypeNested m_matrix;
ViewOp m_functor;
};
} // namespace Eigen
#endif // EIGEN_CWISE_UNARY_VIEW_H

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -0,0 +1,153 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2023 Charlie Schlosser <cs.schlosser@gmail.com>
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_DEVICEWRAPPER_H
#define EIGEN_DEVICEWRAPPER_H
namespace Eigen {
template <typename Derived, typename Device>
struct DeviceWrapper {
using Base = EigenBase<internal::remove_all_t<Derived>>;
using Scalar = typename Derived::Scalar;
EIGEN_DEVICE_FUNC DeviceWrapper(Base& xpr, Device& device) : m_xpr(xpr.derived()), m_device(device) {}
EIGEN_DEVICE_FUNC DeviceWrapper(const Base& xpr, Device& device) : m_xpr(xpr.derived()), m_device(device) {}
template <typename OtherDerived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator=(const EigenBase<OtherDerived>& other) {
using AssignOp = internal::assign_op<Scalar, typename OtherDerived::Scalar>;
internal::call_assignment(*this, other.derived(), AssignOp());
return m_xpr;
}
template <typename OtherDerived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator+=(const EigenBase<OtherDerived>& other) {
using AddAssignOp = internal::add_assign_op<Scalar, typename OtherDerived::Scalar>;
internal::call_assignment(*this, other.derived(), AddAssignOp());
return m_xpr;
}
template <typename OtherDerived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator-=(const EigenBase<OtherDerived>& other) {
using SubAssignOp = internal::sub_assign_op<Scalar, typename OtherDerived::Scalar>;
internal::call_assignment(*this, other.derived(), SubAssignOp());
return m_xpr;
}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& derived() { return m_xpr; }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Device& device() { return m_device; }
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE NoAlias<DeviceWrapper, EigenBase> noalias() {
return NoAlias<DeviceWrapper, EigenBase>(*this);
}
Derived& m_xpr;
Device& m_device;
};
namespace internal {
// this is where we differentiate between lazy assignment and specialized kernels (e.g. matrix products)
template <typename DstXprType, typename SrcXprType, typename Functor, typename Device,
typename Kind = typename AssignmentKind<typename evaluator_traits<DstXprType>::Shape,
typename evaluator_traits<SrcXprType>::Shape>::Kind,
typename EnableIf = void>
struct AssignmentWithDevice;
// unless otherwise specified, use the default product implementation
template <typename DstXprType, typename Lhs, typename Rhs, int Options, typename Functor, typename Device,
typename Weak>
struct AssignmentWithDevice<DstXprType, Product<Lhs, Rhs, Options>, Functor, Device, Dense2Dense, Weak> {
using SrcXprType = Product<Lhs, Rhs, Options>;
using Base = Assignment<DstXprType, SrcXprType, Functor>;
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(DstXprType& dst, const SrcXprType& src, const Functor& func,
Device&) {
Base::run(dst, src, func);
}
};
// specialization for coefficient-wise assignment
template <typename DstXprType, typename SrcXprType, typename Functor, typename Device, typename Weak>
struct AssignmentWithDevice<DstXprType, SrcXprType, Functor, Device, Dense2Dense, Weak> {
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(DstXprType& dst, const SrcXprType& src, const Functor& func,
Device& device) {
#ifndef EIGEN_NO_DEBUG
internal::check_for_aliasing(dst, src);
#endif
call_dense_assignment_loop(dst, src, func, device);
}
};
// this allows us to use the default evaluation scheme if it is not specialized for the device
template <typename Kernel, typename Device, int Traversal = Kernel::AssignmentTraits::Traversal,
int Unrolling = Kernel::AssignmentTraits::Unrolling>
struct dense_assignment_loop_with_device {
using Base = dense_assignment_loop<Kernel, Traversal, Unrolling>;
static EIGEN_DEVICE_FUNC constexpr void run(Kernel& kernel, Device&) { Base::run(kernel); }
};
// entry point for a generic expression with device
template <typename Dst, typename Src, typename Func, typename Device>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE constexpr void call_assignment_no_alias(DeviceWrapper<Dst, Device> dst,
const Src& src, const Func& func) {
enum {
NeedToTranspose = ((int(Dst::RowsAtCompileTime) == 1 && int(Src::ColsAtCompileTime) == 1) ||
(int(Dst::ColsAtCompileTime) == 1 && int(Src::RowsAtCompileTime) == 1)) &&
int(Dst::SizeAtCompileTime) != 1
};
using ActualDstTypeCleaned = std::conditional_t<NeedToTranspose, Transpose<Dst>, Dst>;
using ActualDstType = std::conditional_t<NeedToTranspose, Transpose<Dst>, Dst&>;
ActualDstType actualDst(dst.derived());
// TODO: check whether this is the right place to perform these checks:
EIGEN_STATIC_ASSERT_LVALUE(Dst)
EIGEN_STATIC_ASSERT_SAME_MATRIX_SIZE(ActualDstTypeCleaned, Src)
EIGEN_CHECK_BINARY_COMPATIBILIY(Func, typename ActualDstTypeCleaned::Scalar, typename Src::Scalar);
// this provides a mechanism for specializing simple assignments, matrix products, etc
AssignmentWithDevice<ActualDstTypeCleaned, Src, Func, Device>::run(actualDst, src, func, dst.device());
}
// copy and pasted from AssignEvaluator except forward device to kernel
template <typename DstXprType, typename SrcXprType, typename Functor, typename Device>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE constexpr void call_dense_assignment_loop(DstXprType& dst, const SrcXprType& src,
const Functor& func, Device& device) {
using DstEvaluatorType = evaluator<DstXprType>;
using SrcEvaluatorType = evaluator<SrcXprType>;
SrcEvaluatorType srcEvaluator(src);
// NOTE To properly handle A = (A*A.transpose())/s with A rectangular,
// we need to resize the destination after the source evaluator has been created.
resize_if_allowed(dst, src, func);
DstEvaluatorType dstEvaluator(dst);
using Kernel = generic_dense_assignment_kernel<DstEvaluatorType, SrcEvaluatorType, Functor>;
Kernel kernel(dstEvaluator, srcEvaluator, func, dst.const_cast_derived());
dense_assignment_loop_with_device<Kernel, Device>::run(kernel, device);
}
} // namespace internal
template <typename Derived>
template <typename Device>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE DeviceWrapper<Derived, Device> EigenBase<Derived>::device(Device& device) {
return DeviceWrapper<Derived, Device>(derived(), device);
}
template <typename Derived>
template <typename Device>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE DeviceWrapper<const Derived, Device> EigenBase<Derived>::device(
Device& device) const {
return DeviceWrapper<const Derived, Device>(derived(), device);
}
} // namespace Eigen
#endif

View File

@@ -11,247 +11,211 @@
#ifndef EIGEN_DIAGONAL_H
#define EIGEN_DIAGONAL_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
/** \class Diagonal
* \ingroup Core_Module
*
* \brief Expression of a diagonal/subdiagonal/superdiagonal in a matrix
*
* \tparam MatrixType the type of the object in which we are taking a sub/main/super diagonal
* \tparam DiagIndex the index of the sub/super diagonal. The default is 0 and it means the main diagonal.
* A positive value means a superdiagonal, a negative value means a subdiagonal.
* You can also use DynamicIndex so the index can be set at runtime.
*
* The matrix is not required to be square.
*
* This class represents an expression of the main diagonal, or any sub/super diagonal
* of a square matrix. It is the return type of MatrixBase::diagonal() and MatrixBase::diagonal(Index) and most of the
* time this is the only way it is used.
*
* \sa MatrixBase::diagonal(), MatrixBase::diagonal(Index)
*/
namespace internal {
template <typename MatrixType, int DiagIndex>
struct traits<Diagonal<MatrixType, DiagIndex> > : traits<MatrixType> {
typedef typename ref_selector<MatrixType>::type MatrixTypeNested;
typedef std::remove_reference_t<MatrixTypeNested> MatrixTypeNested_;
typedef typename MatrixType::StorageKind StorageKind;
enum {
RowsAtCompileTime = (int(DiagIndex) == DynamicIndex || int(MatrixType::SizeAtCompileTime) == Dynamic)
? Dynamic
: (plain_enum_min(MatrixType::RowsAtCompileTime - plain_enum_max(-DiagIndex, 0),
MatrixType::ColsAtCompileTime - plain_enum_max(DiagIndex, 0))),
ColsAtCompileTime = 1,
MaxRowsAtCompileTime =
int(MatrixType::MaxSizeAtCompileTime) == Dynamic ? Dynamic
: DiagIndex == DynamicIndex
? min_size_prefer_fixed(MatrixType::MaxRowsAtCompileTime, MatrixType::MaxColsAtCompileTime)
: (plain_enum_min(MatrixType::MaxRowsAtCompileTime - plain_enum_max(-DiagIndex, 0),
MatrixType::MaxColsAtCompileTime - plain_enum_max(DiagIndex, 0))),
MaxColsAtCompileTime = 1,
MaskLvalueBit = is_lvalue<MatrixType>::value ? LvalueBit : 0,
Flags = (unsigned int)MatrixTypeNested_::Flags & (RowMajorBit | MaskLvalueBit | DirectAccessBit) &
~RowMajorBit, // FIXME DirectAccessBit should not be handled by expressions
MatrixTypeOuterStride = outer_stride_at_compile_time<MatrixType>::ret,
InnerStrideAtCompileTime = MatrixTypeOuterStride == Dynamic ? Dynamic : MatrixTypeOuterStride + 1,
OuterStrideAtCompileTime = 0
};
};
} // namespace internal
template <typename MatrixType, int DiagIndex_>
class Diagonal : public internal::dense_xpr_base<Diagonal<MatrixType, DiagIndex_> >::type {
public:
enum { DiagIndex = DiagIndex_ };
typedef typename internal::dense_xpr_base<Diagonal>::type Base;
EIGEN_DENSE_PUBLIC_INTERFACE(Diagonal)
EIGEN_DEVICE_FUNC constexpr explicit inline Diagonal(MatrixType& matrix, Index a_index = DiagIndex)
: m_matrix(matrix), m_index(a_index) {
eigen_assert(a_index <= m_matrix.cols() && -a_index <= m_matrix.rows());
}
EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Diagonal)
EIGEN_DEVICE_FUNC constexpr inline Index rows() const {
return m_index.value() < 0 ? numext::mini<Index>(m_matrix.cols(), m_matrix.rows() + m_index.value())
: numext::mini<Index>(m_matrix.rows(), m_matrix.cols() - m_index.value());
}
EIGEN_DEVICE_FUNC constexpr Index cols() const noexcept { return 1; }
EIGEN_DEVICE_FUNC constexpr Index innerStride() const noexcept { return m_matrix.outerStride() + 1; }
EIGEN_DEVICE_FUNC constexpr Index outerStride() const noexcept { return 0; }
typedef std::conditional_t<internal::is_lvalue<MatrixType>::value, Scalar, const Scalar> ScalarWithConstIfNotLvalue;
EIGEN_DEVICE_FUNC inline ScalarWithConstIfNotLvalue* data() {
return rows() > 0 ? &(m_matrix.coeffRef(rowOffset(), colOffset())) : nullptr;
}
EIGEN_DEVICE_FUNC inline const Scalar* data() const {
return rows() > 0 ? &(m_matrix.coeffRef(rowOffset(), colOffset())) : nullptr;
}
EIGEN_DEVICE_FUNC inline Scalar& coeffRef(Index row, Index) {
EIGEN_STATIC_ASSERT_LVALUE(MatrixType)
return m_matrix.coeffRef(row + rowOffset(), row + colOffset());
}
EIGEN_DEVICE_FUNC inline const Scalar& coeffRef(Index row, Index) const {
return m_matrix.coeffRef(row + rowOffset(), row + colOffset());
}
EIGEN_DEVICE_FUNC inline CoeffReturnType coeff(Index row, Index) const {
return m_matrix.coeff(row + rowOffset(), row + colOffset());
}
EIGEN_DEVICE_FUNC inline Scalar& coeffRef(Index idx) {
EIGEN_STATIC_ASSERT_LVALUE(MatrixType)
return m_matrix.coeffRef(idx + rowOffset(), idx + colOffset());
}
EIGEN_DEVICE_FUNC inline const Scalar& coeffRef(Index idx) const {
return m_matrix.coeffRef(idx + rowOffset(), idx + colOffset());
}
EIGEN_DEVICE_FUNC inline CoeffReturnType coeff(Index idx) const {
return m_matrix.coeff(idx + rowOffset(), idx + colOffset());
}
EIGEN_DEVICE_FUNC constexpr inline const internal::remove_all_t<typename MatrixType::Nested>& nestedExpression()
const {
return m_matrix;
}
EIGEN_DEVICE_FUNC constexpr inline Index index() const { return m_index.value(); }
protected:
typename internal::ref_selector<MatrixType>::non_const_type m_matrix;
const internal::variable_if_dynamicindex<Index, DiagIndex> m_index;
private:
// some compilers may fail to optimize std::max etc in case of compile-time constants...
EIGEN_DEVICE_FUNC constexpr Index absDiagIndex() const noexcept {
return m_index.value() > 0 ? m_index.value() : -m_index.value();
}
EIGEN_DEVICE_FUNC constexpr Index rowOffset() const noexcept { return m_index.value() > 0 ? 0 : -m_index.value(); }
EIGEN_DEVICE_FUNC constexpr Index colOffset() const noexcept { return m_index.value() > 0 ? m_index.value() : 0; }
// trigger a compile-time error if someone tries to call packet
template <int LoadMode>
typename MatrixType::PacketReturnType packet(Index) const;
template <int LoadMode>
typename MatrixType::PacketReturnType packet(Index, Index) const;
};
/** \returns an expression of the main diagonal of the matrix \c *this
*
* \c *this is not required to be square.
*
* Example: \include MatrixBase_diagonal.cpp
* Output: \verbinclude MatrixBase_diagonal.out
*
* \sa class Diagonal */
template <typename Derived>
EIGEN_DEVICE_FUNC constexpr typename MatrixBase<Derived>::DiagonalReturnType MatrixBase<Derived>::diagonal() {
return DiagonalReturnType(derived());
}
/** This is the const version of diagonal(). */
template <typename Derived>
EIGEN_DEVICE_FUNC constexpr const typename MatrixBase<Derived>::ConstDiagonalReturnType MatrixBase<Derived>::diagonal()
const {
return ConstDiagonalReturnType(derived());
}
/** \returns an expression of the \a DiagIndex-th sub or super diagonal of the matrix \c *this
*
* \c *this is not required to be square.
*
* The template parameter \a DiagIndex represent a super diagonal if \a DiagIndex > 0
* and a sub diagonal otherwise. \a DiagIndex == 0 is equivalent to the main diagonal.
*
* Example: \include MatrixBase_diagonal_int.cpp
* Output: \verbinclude MatrixBase_diagonal_int.out
*
* \sa MatrixBase::diagonal(), class Diagonal */
template <typename Derived>
EIGEN_DEVICE_FUNC constexpr Diagonal<Derived, DynamicIndex> MatrixBase<Derived>::diagonal(Index index) {
return Diagonal<Derived, DynamicIndex>(derived(), index);
}
/** This is the const version of diagonal(Index). */
template <typename Derived>
EIGEN_DEVICE_FUNC constexpr const Diagonal<const Derived, DynamicIndex> MatrixBase<Derived>::diagonal(
Index index) const {
return Diagonal<const Derived, DynamicIndex>(derived(), index);
}
/** \returns an expression of the \a DiagIndex-th sub or super diagonal of the matrix \c *this
*
* \c *this is not required to be square.
*
* The template parameter \a DiagIndex represent a super diagonal if \a DiagIndex > 0
* and a sub diagonal otherwise. \a DiagIndex == 0 is equivalent to the main diagonal.
*
* Example: \include MatrixBase_diagonal_template_int.cpp
* Output: \verbinclude MatrixBase_diagonal_template_int.out
*
* \sa MatrixBase::diagonal(), class Diagonal */
template <typename Derived>
template <int Index_>
EIGEN_DEVICE_FUNC constexpr Diagonal<Derived, Index_> MatrixBase<Derived>::diagonal() {
return Diagonal<Derived, Index_>(derived());
}
/** This is the const version of diagonal<int>(). */
template <typename Derived>
template <int Index_>
EIGEN_DEVICE_FUNC constexpr const Diagonal<const Derived, Index_> MatrixBase<Derived>::diagonal() const {
return Diagonal<const Derived, Index_>(derived());
}
} // end namespace Eigen
#endif // EIGEN_DIAGONAL_H


@@ -11,222 +11,300 @@
#ifndef EIGEN_DIAGONALMATRIX_H
#define EIGEN_DIAGONALMATRIX_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
/** \class DiagonalBase
* \ingroup Core_Module
*
* \brief Base class for diagonal matrices and expressions
*
* This is the base class that is inherited by diagonal matrix and related expression
* types, which internally use a vector for storing the diagonal entries. Diagonal
* types always represent square matrices.
*
* \tparam Derived is the derived type, a DiagonalMatrix or DiagonalWrapper.
*
* \sa class DiagonalMatrix, class DiagonalWrapper
*/
template <typename Derived>
class DiagonalBase : public EigenBase<Derived> {
public:
typedef typename internal::traits<Derived>::DiagonalVectorType DiagonalVectorType;
typedef typename DiagonalVectorType::Scalar Scalar;
typedef typename DiagonalVectorType::RealScalar RealScalar;
typedef typename internal::traits<Derived>::StorageKind StorageKind;
typedef typename internal::traits<Derived>::StorageIndex StorageIndex;
enum {
RowsAtCompileTime = DiagonalVectorType::SizeAtCompileTime,
ColsAtCompileTime = DiagonalVectorType::SizeAtCompileTime,
MaxRowsAtCompileTime = DiagonalVectorType::MaxSizeAtCompileTime,
MaxColsAtCompileTime = DiagonalVectorType::MaxSizeAtCompileTime,
IsVectorAtCompileTime = 0,
Flags = NoPreferredStorageOrderBit
};
typedef Matrix<Scalar, RowsAtCompileTime, ColsAtCompileTime, 0, MaxRowsAtCompileTime, MaxColsAtCompileTime>
DenseMatrixType;
typedef DenseMatrixType DenseType;
typedef DiagonalMatrix<Scalar, DiagonalVectorType::SizeAtCompileTime, DiagonalVectorType::MaxSizeAtCompileTime>
PlainObject;
/** \returns a const reference to the derived object. */
EIGEN_DEVICE_FUNC inline const Derived& derived() const { return *static_cast<const Derived*>(this); }
/** \returns a reference to the derived object. */
EIGEN_DEVICE_FUNC inline Derived& derived() { return *static_cast<Derived*>(this); }
/**
* Constructs a dense matrix from \c *this. Note that this directly returns a dense matrix type,
* not an expression.
* \returns A dense matrix, with its diagonal entries set from the derived object. */
EIGEN_DEVICE_FUNC DenseMatrixType toDenseMatrix() const { return derived(); }
/** \returns a const reference to the derived object's vector of diagonal coefficients. */
EIGEN_DEVICE_FUNC inline const DiagonalVectorType& diagonal() const { return derived().diagonal(); }
/** \returns a reference to the derived object's vector of diagonal coefficients. */
EIGEN_DEVICE_FUNC inline DiagonalVectorType& diagonal() { return derived().diagonal(); }
/** \returns the value of the coefficient as if \c *this was a dense matrix. */
EIGEN_DEVICE_FUNC inline Scalar coeff(Index row, Index col) const {
eigen_assert(row >= 0 && col >= 0 && row < rows() && col < cols());
return row == col ? diagonal().coeff(row) : Scalar(0);
}
/** \returns the number of rows. */
EIGEN_DEVICE_FUNC constexpr Index rows() const { return diagonal().size(); }
/** \returns the number of columns. */
EIGEN_DEVICE_FUNC constexpr Index cols() const { return diagonal().size(); }
/** \returns the diagonal matrix product of \c *this by the dense matrix, \a matrix */
template <typename MatrixDerived>
EIGEN_DEVICE_FUNC const Product<Derived, MatrixDerived, LazyProduct> operator*(
const MatrixBase<MatrixDerived>& matrix) const {
return Product<Derived, MatrixDerived, LazyProduct>(derived(), matrix.derived());
}
template <typename OtherDerived>
using DiagonalProductReturnType = DiagonalWrapper<const EIGEN_CWISE_BINARY_RETURN_TYPE(
DiagonalVectorType, typename OtherDerived::DiagonalVectorType, product)>;
/** \returns the diagonal matrix product of \c *this by the diagonal matrix \a other */
template <typename OtherDerived>
EIGEN_DEVICE_FUNC const DiagonalProductReturnType<OtherDerived> operator*(
const DiagonalBase<OtherDerived>& other) const {
return diagonal().cwiseProduct(other.diagonal()).asDiagonal();
}
using DiagonalInverseReturnType =
DiagonalWrapper<const CwiseUnaryOp<internal::scalar_inverse_op<Scalar>, const DiagonalVectorType>>;
/** \returns the inverse of \c *this, computed as the coefficient-wise inverse of the diagonal. */
EIGEN_DEVICE_FUNC inline const DiagonalInverseReturnType inverse() const {
return diagonal().cwiseInverse().asDiagonal();
}
using DiagonalScaleReturnType =
DiagonalWrapper<const EIGEN_EXPR_BINARYOP_SCALAR_RETURN_TYPE(DiagonalVectorType, Scalar, product)>;
/** \returns the product of \c *this by the scalar \a scalar */
EIGEN_DEVICE_FUNC inline const DiagonalScaleReturnType operator*(const Scalar& scalar) const {
return (diagonal() * scalar).asDiagonal();
}
using ScaleDiagonalReturnType =
DiagonalWrapper<const EIGEN_SCALAR_BINARYOP_EXPR_RETURN_TYPE(Scalar, DiagonalVectorType, product)>;
/** \returns the product of a scalar and the diagonal matrix \a other */
EIGEN_DEVICE_FUNC friend inline const ScaleDiagonalReturnType operator*(const Scalar& scalar,
const DiagonalBase& other) {
return (scalar * other.diagonal()).asDiagonal();
}
template <typename OtherDerived>
using DiagonalSumReturnType = DiagonalWrapper<const EIGEN_CWISE_BINARY_RETURN_TYPE(
DiagonalVectorType, typename OtherDerived::DiagonalVectorType, sum)>;
/** \returns the sum of \c *this and the diagonal matrix \a other */
template <typename OtherDerived>
EIGEN_DEVICE_FUNC inline const DiagonalSumReturnType<OtherDerived> operator+(
const DiagonalBase<OtherDerived>& other) const {
return (diagonal() + other.diagonal()).asDiagonal();
}
template <typename OtherDerived>
using DiagonalDifferenceReturnType = DiagonalWrapper<const EIGEN_CWISE_BINARY_RETURN_TYPE(
DiagonalVectorType, typename OtherDerived::DiagonalVectorType, difference)>;
/** \returns the difference of \c *this and the diagonal matrix \a other */
template <typename OtherDerived>
EIGEN_DEVICE_FUNC inline const DiagonalDifferenceReturnType<OtherDerived> operator-(
const DiagonalBase<OtherDerived>& other) const {
return (diagonal() - other.diagonal()).asDiagonal();
}
};
/** \class DiagonalMatrix
* \ingroup Core_Module
*
* \brief Represents a diagonal matrix with its storage
*
* \tparam Scalar_ the type of coefficients
* \tparam SizeAtCompileTime the dimension of the matrix, or Dynamic
* \tparam MaxSizeAtCompileTime the dimension of the matrix, or Dynamic. This parameter is optional and defaults
* to SizeAtCompileTime. Most of the time, you do not need to specify it.
*
* \sa class DiagonalBase, class DiagonalWrapper
*/
namespace internal {
template <typename Scalar_, int SizeAtCompileTime, int MaxSizeAtCompileTime>
struct traits<DiagonalMatrix<Scalar_, SizeAtCompileTime, MaxSizeAtCompileTime>>
: traits<Matrix<Scalar_, SizeAtCompileTime, SizeAtCompileTime, 0, MaxSizeAtCompileTime, MaxSizeAtCompileTime>> {
typedef Matrix<Scalar_, SizeAtCompileTime, 1, 0, MaxSizeAtCompileTime, 1> DiagonalVectorType;
typedef DiagonalShape StorageKind;
enum { Flags = LvalueBit | NoPreferredStorageOrderBit | NestByRefBit };
};
} // namespace internal
template <typename Scalar_, int SizeAtCompileTime, int MaxSizeAtCompileTime>
class DiagonalMatrix : public DiagonalBase<DiagonalMatrix<Scalar_, SizeAtCompileTime, MaxSizeAtCompileTime>> {
public:
#ifndef EIGEN_PARSED_BY_DOXYGEN
typedef typename internal::traits<DiagonalMatrix>::DiagonalVectorType DiagonalVectorType;
typedef const DiagonalMatrix& Nested;
typedef Scalar_ Scalar;
typedef typename internal::traits<DiagonalMatrix>::StorageKind StorageKind;
typedef typename internal::traits<DiagonalMatrix>::StorageIndex StorageIndex;
#endif
protected:
DiagonalVectorType m_diagonal;
public:
/** const version of diagonal(). */
EIGEN_DEVICE_FUNC constexpr inline const DiagonalVectorType& diagonal() const { return m_diagonal; }
/** \returns a reference to the stored vector of diagonal coefficients. */
EIGEN_DEVICE_FUNC constexpr inline DiagonalVectorType& diagonal() { return m_diagonal; }
/** Default constructor without initialization */
EIGEN_DEVICE_FUNC constexpr inline DiagonalMatrix() {}
/** Constructs a diagonal matrix with given dimension */
EIGEN_DEVICE_FUNC constexpr explicit inline DiagonalMatrix(Index dim) : m_diagonal(dim) {}
/** 2D constructor. */
EIGEN_DEVICE_FUNC constexpr inline DiagonalMatrix(const Scalar& x, const Scalar& y) : m_diagonal(x, y) {}
/** 3D constructor. */
EIGEN_DEVICE_FUNC constexpr inline DiagonalMatrix(const Scalar& x, const Scalar& y, const Scalar& z)
: m_diagonal(x, y, z) {}
/** \brief Construct a diagonal matrix with fixed size from an arbitrary number of coefficients.
*
* \warning To construct a diagonal matrix of fixed size, the number of values passed to this
* constructor must match the fixed dimension of \c *this.
*
* \sa DiagonalMatrix(const Scalar&, const Scalar&)
* \sa DiagonalMatrix(const Scalar&, const Scalar&, const Scalar&)
*/
template <typename... ArgTypes>
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE DiagonalMatrix(const Scalar& a0, const Scalar& a1, const Scalar& a2,
const ArgTypes&... args)
: m_diagonal(a0, a1, a2, args...) {}
/** \brief Constructs a DiagonalMatrix and initializes it by elements given by an initializer list of initializer
* lists \cpp11
*/
EIGEN_DEVICE_FUNC explicit EIGEN_STRONG_INLINE DiagonalMatrix(
const std::initializer_list<std::initializer_list<Scalar>>& list)
: m_diagonal(list) {}
/** \brief Constructs a DiagonalMatrix from an r-value diagonal vector type */
EIGEN_DEVICE_FUNC constexpr explicit inline DiagonalMatrix(DiagonalVectorType&& diag) : m_diagonal(std::move(diag)) {}
/** Copy constructor. */
template <typename OtherDerived>
EIGEN_DEVICE_FUNC constexpr inline DiagonalMatrix(const DiagonalBase<OtherDerived>& other)
: m_diagonal(other.diagonal()) {}
#ifndef EIGEN_PARSED_BY_DOXYGEN
/** copy constructor. prevent a default copy constructor from hiding the other templated constructor */
inline DiagonalMatrix(const DiagonalMatrix& other) : m_diagonal(other.diagonal()) {}
#endif
/** generic constructor from expression of the diagonal coefficients */
template <typename OtherDerived>
EIGEN_DEVICE_FUNC constexpr explicit inline DiagonalMatrix(const MatrixBase<OtherDerived>& other)
: m_diagonal(other) {}
/** Copy operator. */
template <typename OtherDerived>
EIGEN_DEVICE_FUNC DiagonalMatrix& operator=(const DiagonalBase<OtherDerived>& other) {
m_diagonal = other.diagonal();
return *this;
}
#ifndef EIGEN_PARSED_BY_DOXYGEN
/** This is a special case of the templated operator=. Its purpose is to
* prevent a default operator= from hiding the templated operator=.
*/
EIGEN_DEVICE_FUNC DiagonalMatrix& operator=(const DiagonalMatrix& other) {
m_diagonal = other.diagonal();
return *this;
}
#endif
typedef DiagonalWrapper<const CwiseNullaryOp<internal::scalar_constant_op<Scalar>, DiagonalVectorType>>
InitializeReturnType;
typedef DiagonalWrapper<const CwiseNullaryOp<internal::scalar_zero_op<Scalar>, DiagonalVectorType>>
ZeroInitializeReturnType;
/** Initializes a diagonal matrix of size SizeAtCompileTime with coefficients set to zero */
EIGEN_DEVICE_FUNC static const ZeroInitializeReturnType Zero() { return DiagonalVectorType::Zero().asDiagonal(); }
/** Initializes a diagonal matrix of the given size with coefficients set to zero */
EIGEN_DEVICE_FUNC static const ZeroInitializeReturnType Zero(Index size) {
return DiagonalVectorType::Zero(size).asDiagonal();
}
/** Initializes an identity matrix of size SizeAtCompileTime */
EIGEN_DEVICE_FUNC static const InitializeReturnType Identity() { return DiagonalVectorType::Ones().asDiagonal(); }
/** Initializes an identity matrix of the given size */
EIGEN_DEVICE_FUNC static const InitializeReturnType Identity(Index size) {
return DiagonalVectorType::Ones(size).asDiagonal();
}
/** Resizes to given size. */
EIGEN_DEVICE_FUNC inline void resize(Index size) { m_diagonal.resize(size); }
/** Sets all coefficients to zero. */
EIGEN_DEVICE_FUNC inline void setZero() { m_diagonal.setZero(); }
/** Resizes and sets all coefficients to zero. */
EIGEN_DEVICE_FUNC inline void setZero(Index size) { m_diagonal.setZero(size); }
/** Sets this matrix to be the identity matrix of the current size. */
EIGEN_DEVICE_FUNC inline void setIdentity() { m_diagonal.setOnes(); }
/** Sets this matrix to be the identity matrix of the given size. */
EIGEN_DEVICE_FUNC inline void setIdentity(Index size) { m_diagonal.setOnes(size); }
};
/** \class DiagonalWrapper
* \ingroup Core_Module
*
* \brief Expression of a diagonal matrix
*
* \tparam DiagonalVectorType_ the type of the vector of diagonal coefficients
*
* This class is an expression of a diagonal matrix, but not storing its own vector of diagonal coefficients,
* instead wrapping an existing vector expression. It is the return type of MatrixBase::asDiagonal()
* and most of the time this is the only way that it is used.
*
* \sa class DiagonalMatrix, class DiagonalBase, MatrixBase::asDiagonal()
*/
namespace internal {
template <typename DiagonalVectorType_>
struct traits<DiagonalWrapper<DiagonalVectorType_>> {
typedef DiagonalVectorType_ DiagonalVectorType;
typedef typename DiagonalVectorType::Scalar Scalar;
typedef typename DiagonalVectorType::StorageIndex StorageIndex;
typedef DiagonalShape StorageKind;
enum {
RowsAtCompileTime = DiagonalVectorType::SizeAtCompileTime,
ColsAtCompileTime = DiagonalVectorType::SizeAtCompileTime,
MaxRowsAtCompileTime = DiagonalVectorType::MaxSizeAtCompileTime,
MaxColsAtCompileTime = DiagonalVectorType::MaxSizeAtCompileTime,
Flags = (traits<DiagonalVectorType>::Flags & LvalueBit) | NoPreferredStorageOrderBit
};
};
} // namespace internal
template <typename DiagonalVectorType_>
class DiagonalWrapper : public DiagonalBase<DiagonalWrapper<DiagonalVectorType_>>, internal::no_assignment_operator {
public:
#ifndef EIGEN_PARSED_BY_DOXYGEN
typedef DiagonalVectorType_ DiagonalVectorType;
typedef DiagonalWrapper Nested;
#endif
/** Constructor from expression of diagonal coefficients to wrap. */
EIGEN_DEVICE_FUNC constexpr explicit inline DiagonalWrapper(DiagonalVectorType& a_diagonal)
: m_diagonal(a_diagonal) {}
/** \returns a const reference to the wrapped expression of diagonal coefficients. */
EIGEN_DEVICE_FUNC constexpr const DiagonalVectorType& diagonal() const { return m_diagonal; }
protected:
typename DiagonalVectorType::Nested m_diagonal;
};
/** \returns a pseudo-expression of a diagonal matrix with *this as vector of diagonal coefficients
*
* \only_for_vectors
*
* Example: \include MatrixBase_asDiagonal.cpp
* Output: \verbinclude MatrixBase_asDiagonal.out
*
* \sa class DiagonalWrapper, class DiagonalMatrix, diagonal(), isDiagonal()
**/
template <typename Derived>
EIGEN_DEVICE_FUNC constexpr const DiagonalWrapper<const Derived> MatrixBase<Derived>::asDiagonal() const {
return DiagonalWrapper<const Derived>(derived());
}
/** \returns true if *this is approximately equal to a diagonal matrix,
* within the precision given by \a prec.
*
* Example: \include MatrixBase_isDiagonal.cpp
* Output: \verbinclude MatrixBase_isDiagonal.out
*
* \sa asDiagonal()
*/
template <typename Derived>
bool MatrixBase<Derived>::isDiagonal(const RealScalar& prec) const {
if (cols() != rows()) return false;
RealScalar maxAbsOnDiagonal = static_cast<RealScalar>(-1);
for (Index j = 0; j < cols(); ++j) {
RealScalar absOnDiagonal = numext::abs(coeff(j, j));
if (absOnDiagonal > maxAbsOnDiagonal) maxAbsOnDiagonal = absOnDiagonal;
}
for (Index j = 0; j < cols(); ++j)
for (Index i = 0; i < j; ++i) {
if (!internal::isMuchSmallerThan(coeff(i, j), maxAbsOnDiagonal, prec)) return false;
if (!internal::isMuchSmallerThan(coeff(j, i), maxAbsOnDiagonal, prec)) return false;
}
return true;
}
/** \returns DiagonalWrapper.
*
* Example: \include MatrixBase_diagonalView.cpp
* Output: \verbinclude MatrixBase_diagonalView.out
*
* \sa diagonalView()
*/
/** This is the non-const version of diagonalView() with DiagIndex_ . */
template <typename Derived>
template <int DiagIndex_>
EIGEN_DEVICE_FUNC constexpr DiagonalWrapper<Diagonal<Derived, DiagIndex_>> MatrixBase<Derived>::diagonalView() {
typedef Diagonal<Derived, DiagIndex_> DiagType;
typedef DiagonalWrapper<DiagType> ReturnType;
DiagType diag(this->derived());
return ReturnType(diag);
}
/** This is the const version of diagonalView() with DiagIndex_ . */
template <typename Derived>
template <int DiagIndex_>
EIGEN_DEVICE_FUNC constexpr DiagonalWrapper<Diagonal<const Derived, DiagIndex_>> MatrixBase<Derived>::diagonalView()
const {
typedef Diagonal<const Derived, DiagIndex_> DiagType;
typedef DiagonalWrapper<DiagType> ReturnType;
DiagType diag(this->derived());
return ReturnType(diag);
}
/** This is the non-const version of diagonalView() with dynamic index. */
template <typename Derived>
EIGEN_DEVICE_FUNC constexpr DiagonalWrapper<Diagonal<Derived, DynamicIndex>> MatrixBase<Derived>::diagonalView(
Index index) {
typedef Diagonal<Derived, DynamicIndex> DiagType;
typedef DiagonalWrapper<DiagType> ReturnType;
DiagType diag(this->derived(), index);
return ReturnType(diag);
}
/** This is the const version of diagonalView() with dynamic index. */
template <typename Derived>
EIGEN_DEVICE_FUNC constexpr DiagonalWrapper<Diagonal<const Derived, DynamicIndex>> MatrixBase<Derived>::diagonalView(
Index index) const {
typedef Diagonal<const Derived, DynamicIndex> DiagType;
typedef DiagonalWrapper<DiagType> ReturnType;
DiagType diag(this->derived(), index);
return ReturnType(diag);
}
namespace internal {
template <>
struct storage_kind_to_shape<DiagonalShape> {
typedef DiagonalShape Shape;
};
struct Diagonal2Dense {};
template <>
struct AssignmentKind<DenseShape, DiagonalShape> {
typedef Diagonal2Dense Kind;
};
// Diagonal matrix to Dense assignment
template <typename DstXprType, typename SrcXprType, typename Functor>
struct Assignment<DstXprType, SrcXprType, Functor, Diagonal2Dense> {
static EIGEN_DEVICE_FUNC void run(
DstXprType& dst, const SrcXprType& src,
const internal::assign_op<typename DstXprType::Scalar, typename SrcXprType::Scalar>& /*func*/) {
Index dstRows = src.rows();
Index dstCols = src.cols();
if ((dst.rows() != dstRows) || (dst.cols() != dstCols)) dst.resize(dstRows, dstCols);
dst.setZero();
dst.diagonal() = src.diagonal();
}
static EIGEN_DEVICE_FUNC void run(
DstXprType& dst, const SrcXprType& src,
const internal::add_assign_op<typename DstXprType::Scalar, typename SrcXprType::Scalar>& /*func*/) {
dst.diagonal() += src.diagonal();
}
static EIGEN_DEVICE_FUNC void run(
DstXprType& dst, const SrcXprType& src,
const internal::sub_assign_op<typename DstXprType::Scalar, typename SrcXprType::Scalar>& /*func*/) {
dst.diagonal() -= src.diagonal();
}
};
} // namespace internal
} // end namespace Eigen
#endif // EIGEN_DIAGONALMATRIX_H

#ifndef EIGEN_DIAGONALPRODUCT_H
#define EIGEN_DIAGONALPRODUCT_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
/** \returns the diagonal matrix product of \c *this by the diagonal matrix \a diagonal.
*/
template <typename Derived>
template <typename DiagonalDerived>
EIGEN_DEVICE_FUNC inline const Product<Derived, DiagonalDerived, LazyProduct> MatrixBase<Derived>::operator*(
const DiagonalBase<DiagonalDerived> &a_diagonal) const {
return Product<Derived, DiagonalDerived, LazyProduct>(derived(), a_diagonal.derived());
}
} // end namespace Eigen
#endif // EIGEN_DIAGONALPRODUCT_H

#ifndef EIGEN_DOT_H
#define EIGEN_DOT_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
namespace internal {
template <typename Derived, typename Scalar = typename traits<Derived>::Scalar>
struct squared_norm_impl {
using Real = typename NumTraits<Scalar>::Real;
static EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE Real run(const Derived& a) {
return a.realView().cwiseAbs2().sum();
}
};
template <typename Derived>
struct squared_norm_impl<Derived, bool> {
static EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE bool run(const Derived& a) { return a.any(); }
};
} // end namespace internal
/** \fn MatrixBase::dot
* \returns the dot product of *this with other.
*
* \only_for_vectors
*
* \note If the scalar type is complex numbers, then this function returns the hermitian
* (sesquilinear) dot product, conjugate-linear in the first variable and linear in the
* second variable.
*
* \sa squaredNorm(), norm()
*/
template <typename Derived>
template <typename OtherDerived>
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE
typename ScalarBinaryOpTraits<typename internal::traits<Derived>::Scalar,
typename internal::traits<OtherDerived>::Scalar>::ReturnType
MatrixBase<Derived>::dot(const MatrixBase<OtherDerived>& other) const {
return internal::dot_impl<Derived, OtherDerived>::run(derived(), other.derived());
}
//---------- implementation of L2 norm and related functions ----------
/** \returns, for vectors, the squared \em l2 norm of \c *this, and for matrices the squared Frobenius norm.
* In both cases, it consists in the sum of the square of all the matrix entries.
* For vectors, this is also equal to the dot product of \c *this with itself.
*
* \sa dot(), norm(), lpNorm()
*/
template <typename Derived>
EIGEN_DEVICE_FUNC constexpr EIGEN_STRONG_INLINE typename NumTraits<typename internal::traits<Derived>::Scalar>::Real
MatrixBase<Derived>::squaredNorm() const {
return internal::squared_norm_impl<Derived>::run(derived());
}
/** \returns, for vectors, the \em l2 norm of \c *this, and for matrices the Frobenius norm.
* In both cases, it consists in the square root of the sum of the square of all the matrix entries.
* For vectors, this is also equal to the square root of the dot product of \c *this with itself.
*
* \sa lpNorm(), dot(), squaredNorm()
*/
template <typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename NumTraits<typename internal::traits<Derived>::Scalar>::Real
MatrixBase<Derived>::norm() const {
return numext::sqrt(squaredNorm());
}
/** \returns an expression of the quotient of \c *this by its own norm.
*
* \warning If the input vector is too small (i.e., this->norm()==0),
* then this function returns a copy of the input.
*
* \only_for_vectors
*
* \sa norm(), normalize()
*/
template <typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename MatrixBase<Derived>::PlainObject MatrixBase<Derived>::normalized()
const {
typedef typename internal::nested_eval<Derived, 2>::type Nested_;
Nested_ n(derived());
RealScalar z = n.squaredNorm();
// NOTE: after extensive benchmarking, this conditional does not impact performance, at least on recent x86 CPU
if (z > RealScalar(0))
return n / numext::sqrt(z);
else
return n;
}
/** Normalizes the vector, i.e. divides it by its own norm.
*
* \only_for_vectors
*
* \warning If the input vector is too small (i.e., this->norm()==0), then \c *this is left unchanged.
*
* \sa norm(), normalized()
*/
template <typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void MatrixBase<Derived>::normalize() {
RealScalar z = squaredNorm();
// NOTE: after extensive benchmarking, this conditional does not impact performance, at least on recent x86 CPU
if (z > RealScalar(0)) derived() /= numext::sqrt(z);
}
/** \returns an expression of the quotient of \c *this by its own norm while avoiding underflow and overflow.
*
* \only_for_vectors
*
* This method is analogue to the normalized() method, but it reduces the risk of
* underflow and overflow when computing the norm.
*
* \warning If the input vector is too small (i.e., this->norm()==0),
* then this function returns a copy of the input.
*
* \sa stableNorm(), stableNormalize(), normalized()
*/
template <typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename MatrixBase<Derived>::PlainObject
MatrixBase<Derived>::stableNormalized() const {
typedef typename internal::nested_eval<Derived, 3>::type Nested_;
Nested_ n(derived());
RealScalar w = n.cwiseAbs().maxCoeff();
RealScalar z = (n / w).squaredNorm();
if (z > RealScalar(0))
return n / (numext::sqrt(z) * w);
else
return n;
}
/** Normalizes the vector while avoiding underflow and overflow
*
* \only_for_vectors
*
* This method is analogue to the normalize() method, but it reduces the risk of
* underflow and overflow when computing the norm.
*
* \warning If the input vector is too small (i.e., this->norm()==0), then \c *this is left unchanged.
*
* \sa stableNorm(), stableNormalized(), normalize()
*/
template <typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void MatrixBase<Derived>::stableNormalize() {
RealScalar w = cwiseAbs().maxCoeff();
RealScalar z = (derived() / w).squaredNorm();
if (z > RealScalar(0)) derived() /= numext::sqrt(z) * w;
}
//---------- implementation of other norms ----------
namespace internal {
template <typename Derived, int p>
struct lpNorm_selector {
typedef typename NumTraits<typename traits<Derived>::Scalar>::Real RealScalar;
EIGEN_DEVICE_FUNC static inline RealScalar run(const MatrixBase<Derived>& m) {
EIGEN_USING_STD(pow)
return pow(m.cwiseAbs().array().pow(p).sum(), RealScalar(1) / p);
}
};
template <typename Derived>
struct lpNorm_selector<Derived, 1> {
EIGEN_DEVICE_FUNC static inline typename NumTraits<typename traits<Derived>::Scalar>::Real run(
const MatrixBase<Derived>& m) {
return m.cwiseAbs().sum();
}
};
template <typename Derived>
struct lpNorm_selector<Derived, 2> {
EIGEN_DEVICE_FUNC static inline typename NumTraits<typename traits<Derived>::Scalar>::Real run(
const MatrixBase<Derived>& m) {
return m.norm();
}
};
template <typename Derived>
struct lpNorm_selector<Derived, Infinity> {
typedef typename NumTraits<typename traits<Derived>::Scalar>::Real RealScalar;
EIGEN_DEVICE_FUNC static inline RealScalar run(const MatrixBase<Derived>& m) {
if (Derived::SizeAtCompileTime == 0 || (Derived::SizeAtCompileTime == Dynamic && m.size() == 0))
return RealScalar(0);
return m.cwiseAbs().maxCoeff();
}
};
} // end namespace internal
/** \returns the \b coefficient-wise \f$ \ell^p \f$ norm of \c *this, that is, returns the p-th root of the sum of the
* p-th powers of the absolute values of the coefficients of \c *this. If \a p is the special value \a Eigen::Infinity,
* this function returns the \f$ \ell^\infty \f$ norm, that is the maximum of the absolute values of the coefficients of
* \c *this.
*
* In all cases, if \c *this is empty, then the value 0 is returned.
*
* \note For matrices, this function does not compute the <a
* href="https://en.wikipedia.org/wiki/Operator_norm">operator-norm</a>. That is, if \c *this is a matrix, then its
* coefficients are interpreted as a 1D vector. Nonetheless, you can easily compute the 1-norm and \f$\infty\f$-norm
* matrix operator norms using \link TutorialReductionsVisitorsBroadcastingReductionsNorm partial reductions \endlink.
*
* \sa norm()
*/
template <typename Derived>
template <int p>
#ifndef EIGEN_PARSED_BY_DOXYGEN
EIGEN_DEVICE_FUNC inline typename NumTraits<typename internal::traits<Derived>::Scalar>::Real
#else
EIGEN_DEVICE_FUNC MatrixBase<Derived>::RealScalar
#endif
MatrixBase<Derived>::lpNorm() const {
return internal::lpNorm_selector<Derived, p>::run(*this);
}
//---------- implementation of isOrthogonal / isUnitary ----------
/** \returns true if *this is approximately orthogonal to \a other,
* within the precision given by \a prec.
*
* Example: \include MatrixBase_isOrthogonal.cpp
* Output: \verbinclude MatrixBase_isOrthogonal.out
*/
template <typename Derived>
template <typename OtherDerived>
bool MatrixBase<Derived>::isOrthogonal(const MatrixBase<OtherDerived>& other, const RealScalar& prec) const {
typename internal::nested_eval<Derived, 2>::type nested(derived());
typename internal::nested_eval<OtherDerived, 2>::type otherNested(other.derived());
return numext::abs2(nested.dot(otherNested)) <= prec * prec * nested.squaredNorm() * otherNested.squaredNorm();
}
/** \returns true if *this is approximately a unitary matrix,
* within the precision given by \a prec. In the case where the \a Scalar
* type is real numbers, a unitary matrix is an orthogonal matrix, whence the name.
*
* \note This can be used to check whether a family of vectors forms an orthonormal basis.
* Indeed, \c m.isUnitary() returns true if and only if the columns (equivalently, the rows) of m form an
* orthonormal basis.
*
* Example: \include MatrixBase_isUnitary.cpp
* Output: \verbinclude MatrixBase_isUnitary.out
*/
template <typename Derived>
bool MatrixBase<Derived>::isUnitary(const RealScalar& prec) const {
typename internal::nested_eval<Derived, 1>::type self(derived());
for (Index i = 0; i < cols(); ++i) {
if (!internal::isApprox(self.col(i).squaredNorm(), static_cast<RealScalar>(1), prec)) return false;
for (Index j = 0; j < i; ++j)
if (!internal::isMuchSmallerThan(self.col(i).dot(self.col(j)), static_cast<Scalar>(1), prec)) return false;
}
return true;
}
} // end namespace Eigen
#endif // EIGEN_DOT_H

#ifndef EIGEN_EIGENBASE_H
#define EIGEN_EIGENBASE_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
/** \class EigenBase
* \ingroup Core_Module
*
* Common base class for all classes T such that MatrixBase has an operator=(T) and a constructor MatrixBase(T).
*
* In other words, an EigenBase object is an object that can be copied into a MatrixBase.
*
* Besides MatrixBase-derived classes, this also includes special matrix classes such as diagonal matrices, etc.
*
* Notice that this class is trivial, it is only used to disambiguate overloaded functions.
*
* \sa \blank \ref TopicClassHierarchy
*/
template <typename Derived>
struct EigenBase {
// typedef typename internal::plain_matrix_type<Derived>::type PlainObject;
/** \brief The interface type of indices
* \details To change this, \c \#define the preprocessor symbol \c EIGEN_DEFAULT_DENSE_INDEX_TYPE.
* \sa StorageIndex, \ref TopicPreprocessorDirectives.
* DEPRECATED: Since Eigen 3.3, its usage is deprecated. Use Eigen::Index instead.
* Deprecation is not marked with a doxygen comment because there are too many existing usages to add the deprecation
* attribute.
*/
typedef Eigen::Index Index;
// FIXME is it needed?
typedef typename internal::traits<Derived>::StorageKind StorageKind;
/** \returns a reference to the derived object */
EIGEN_DEVICE_FUNC constexpr Derived& derived() { return *static_cast<Derived*>(this); }
/** \returns a const reference to the derived object */
EIGEN_DEVICE_FUNC constexpr const Derived& derived() const { return *static_cast<const Derived*>(this); }
EIGEN_DEVICE_FUNC inline constexpr Derived& const_cast_derived() const {
  return *static_cast<Derived*>(const_cast<EigenBase*>(this));
}
EIGEN_DEVICE_FUNC constexpr inline const Derived& const_derived() const { return *static_cast<const Derived*>(this); }
/** \returns the number of rows. \sa cols(), RowsAtCompileTime */
EIGEN_DEVICE_FUNC constexpr Index rows() const noexcept { return derived().rows(); }
/** \returns the number of columns. \sa rows(), ColsAtCompileTime */
EIGEN_DEVICE_FUNC constexpr Index cols() const noexcept { return derived().cols(); }
/** \returns the number of coefficients, which is rows()*cols().
 * \sa rows(), cols(), SizeAtCompileTime. */
EIGEN_DEVICE_FUNC constexpr Index size() const noexcept { return rows() * cols(); }
/** \internal Don't use it, but do the equivalent: \code dst = *this; \endcode */
template <typename Dest>
EIGEN_DEVICE_FUNC constexpr inline void evalTo(Dest& dst) const {
  derived().evalTo(dst);
}
/** \internal Don't use it, but do the equivalent: \code dst += *this; \endcode */
template <typename Dest>
EIGEN_DEVICE_FUNC constexpr inline void addTo(Dest& dst) const {
// This is the default implementation,
// derived class can reimplement it in a more optimized way.
typename Dest::PlainObject res(rows(), cols());
evalTo(res);
dst += res;
}
/** \internal Don't use it, but do the equivalent: \code dst -= *this; \endcode */
template <typename Dest>
EIGEN_DEVICE_FUNC constexpr inline void subTo(Dest& dst) const {
// This is the default implementation,
// derived class can reimplement it in a more optimized way.
typename Dest::PlainObject res(rows(), cols());
evalTo(res);
dst -= res;
}
/** \internal Don't use it, but do the equivalent: \code dst.applyOnTheRight(*this); \endcode */
template <typename Dest>
EIGEN_DEVICE_FUNC constexpr inline void applyThisOnTheRight(Dest& dst) const {
// This is the default implementation,
// derived class can reimplement it in a more optimized way.
dst = dst * this->derived();
}
/** \internal Don't use it, but do the equivalent: \code dst.applyOnTheLeft(*this); \endcode */
template <typename Dest>
EIGEN_DEVICE_FUNC constexpr inline void applyThisOnTheLeft(Dest& dst) const {
// This is the default implementation,
// derived class can reimplement it in a more optimized way.
dst = this->derived() * dst;
}
template <typename Device>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE DeviceWrapper<Derived, Device> device(Device& device);
template <typename Device>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE DeviceWrapper<const Derived, Device> device(Device& device) const;
};
/***************************************************************************
* Implementation of matrix base methods
***************************************************************************/
/** \brief Copies the generic expression \a other into *this.
 *
 * \details The expression must provide a (templated) evalTo(Derived& dst) const
 * function which does the actual job. In practice, this allows any user to write
 * their own special matrix class without having to modify MatrixBase.
 *
 * \returns a reference to *this.
 */
template <typename Derived>
template <typename OtherDerived>
EIGEN_DEVICE_FUNC constexpr Derived& DenseBase<Derived>::operator=(const EigenBase<OtherDerived>& other) {
call_assignment(derived(), other.derived());
return derived();
}
template <typename Derived>
template <typename OtherDerived>
EIGEN_DEVICE_FUNC constexpr Derived& DenseBase<Derived>::operator+=(const EigenBase<OtherDerived>& other) {
  call_assignment(derived(), other.derived(), internal::add_assign_op<Scalar, typename OtherDerived::Scalar>());
return derived();
}
template <typename Derived>
template <typename OtherDerived>
EIGEN_DEVICE_FUNC constexpr Derived& DenseBase<Derived>::operator-=(const EigenBase<OtherDerived>& other) {
  call_assignment(derived(), other.derived(), internal::sub_assign_op<Scalar, typename OtherDerived::Scalar>());
return derived();
}
}  // end namespace Eigen
#endif  // EIGEN_EIGENBASE_H

Eigen/src/Core/Fill.h Normal file

@@ -0,0 +1,143 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2024 Charles Schlosser <cs.schlosser@gmail.com>
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_FILL_H
#define EIGEN_FILL_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
namespace internal {
template <typename Xpr>
struct eigen_fill_helper : std::false_type {};
// Only enable std::fill_n for trivially copyable scalars. GCC's libstdc++
// fill_n pessimizes non-trivially-copyable types (extra moves per iteration),
// causing measurable regressions for types like AutoDiffScalar (issue #2956).
template <typename Scalar, int Rows, int Cols, int Options, int MaxRows, int MaxCols>
struct eigen_fill_helper<Matrix<Scalar, Rows, Cols, Options, MaxRows, MaxCols>> : std::is_trivially_copyable<Scalar> {};
template <typename Scalar, int Rows, int Cols, int Options, int MaxRows, int MaxCols>
struct eigen_fill_helper<Array<Scalar, Rows, Cols, Options, MaxRows, MaxCols>> : std::is_trivially_copyable<Scalar> {};
template <typename Xpr, int BlockRows, int BlockCols>
struct eigen_fill_helper<Block<Xpr, BlockRows, BlockCols, /*InnerPanel*/ true>> : eigen_fill_helper<Xpr> {};
template <typename Xpr, int BlockRows, int BlockCols>
struct eigen_fill_helper<Block<Xpr, BlockRows, BlockCols, /*InnerPanel*/ false>>
: std::integral_constant<bool, eigen_fill_helper<Xpr>::value &&
(Xpr::IsRowMajor ? (BlockRows == 1) : (BlockCols == 1))> {};
template <typename Xpr, int Options>
struct eigen_fill_helper<Map<Xpr, Options, Stride<0, 0>>> : eigen_fill_helper<Xpr> {};
template <typename Xpr, int Options, int OuterStride_>
struct eigen_fill_helper<Map<Xpr, Options, Stride<OuterStride_, 0>>>
: std::integral_constant<bool, eigen_fill_helper<Xpr>::value &&
enum_eq_not_dynamic(OuterStride_, Xpr::InnerSizeAtCompileTime)> {};
template <typename Xpr, int Options, int OuterStride_>
struct eigen_fill_helper<Map<Xpr, Options, Stride<OuterStride_, 1>>>
: eigen_fill_helper<Map<Xpr, Options, Stride<OuterStride_, 0>>> {};
template <typename Xpr, int Options, int InnerStride_>
struct eigen_fill_helper<Map<Xpr, Options, InnerStride<InnerStride_>>>
: eigen_fill_helper<Map<Xpr, Options, Stride<0, InnerStride_>>> {};
template <typename Xpr, int Options, int OuterStride_>
struct eigen_fill_helper<Map<Xpr, Options, OuterStride<OuterStride_>>>
: eigen_fill_helper<Map<Xpr, Options, Stride<OuterStride_, 0>>> {};
// Primary template; specialized below on whether a flat std::fill_n can be used.
template <typename Xpr, bool use_fill = eigen_fill_helper<Xpr>::value>
struct eigen_fill_impl;
template <typename Xpr>
struct eigen_fill_impl<Xpr, /*use_fill*/ false> {
using Scalar = typename Xpr::Scalar;
using Func = scalar_constant_op<Scalar>;
using PlainObject = typename Xpr::PlainObject;
using Constant = typename PlainObject::ConstantReturnType;
static EIGEN_DEVICE_FUNC constexpr void run(Xpr& dst, const Scalar& val) {
const Constant src(dst.rows(), dst.cols(), val);
run(dst, src);
}
template <typename SrcXpr>
static EIGEN_DEVICE_FUNC constexpr void run(Xpr& dst, const SrcXpr& src) {
call_dense_assignment_loop(dst, src, assign_op<Scalar, Scalar>());
}
};
#if EIGEN_COMP_MSVC || defined(EIGEN_GPU_COMPILE_PHASE)
template <typename Xpr>
struct eigen_fill_impl<Xpr, /*use_fill*/ true> : eigen_fill_impl<Xpr, /*use_fill*/ false> {};
#else
template <typename Xpr>
struct eigen_fill_impl<Xpr, /*use_fill*/ true> {
using Scalar = typename Xpr::Scalar;
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Xpr& dst, const Scalar& val) {
const Scalar val_copy = val;
using std::fill_n;
fill_n(dst.data(), dst.size(), val_copy);
}
template <typename SrcXpr>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Xpr& dst, const SrcXpr& src) {
resize_if_allowed(dst, src, assign_op<Scalar, Scalar>());
const Scalar& val = src.functor()();
run(dst, val);
}
};
#endif
template <typename Xpr>
struct eigen_memset_helper {
using Scalar = typename Xpr::Scalar;
static constexpr bool value = std::is_trivially_copyable<Scalar>::value &&
!static_cast<bool>(NumTraits<Scalar>::RequireInitialization) &&
eigen_fill_helper<Xpr>::value;
};
// Primary template; specialized below on whether a raw memset can be used.
template <typename Xpr, bool use_memset = eigen_memset_helper<Xpr>::value>
struct eigen_zero_impl;
template <typename Xpr>
struct eigen_zero_impl<Xpr, /*use_memset*/ false> {
using Scalar = typename Xpr::Scalar;
using PlainObject = typename Xpr::PlainObject;
using Zero = typename PlainObject::ZeroReturnType;
static EIGEN_DEVICE_FUNC constexpr void run(Xpr& dst) {
const Zero src(dst.rows(), dst.cols());
run(dst, src);
}
template <typename SrcXpr>
static EIGEN_DEVICE_FUNC constexpr void run(Xpr& dst, const SrcXpr& src) {
call_dense_assignment_loop(dst, src, assign_op<Scalar, Scalar>());
}
};
template <typename Xpr>
struct eigen_zero_impl<Xpr, /*use_memset*/ true> {
using Scalar = typename Xpr::Scalar;
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Xpr& dst) {
const std::ptrdiff_t num_bytes = dst.size() * static_cast<std::ptrdiff_t>(sizeof(Scalar));
if (num_bytes <= 0) return;
void* dst_ptr = static_cast<void*>(dst.data());
#ifndef EIGEN_NO_DEBUG
eigen_assert((dst_ptr != nullptr) && "null pointer dereference error!");
#endif
EIGEN_USING_STD(memset);
memset(dst_ptr, 0, static_cast<std::size_t>(num_bytes));
}
template <typename SrcXpr>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(Xpr& dst, const SrcXpr& src) {
resize_if_allowed(dst, src, assign_op<Scalar, Scalar>());
run(dst);
}
};
} // namespace internal
} // namespace Eigen
#endif // EIGEN_FILL_H

Eigen/src/Core/FindCoeff.h Normal file

@@ -0,0 +1,464 @@
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// Copyright (C) 2025 Charlie Schlosser <cs.schlosser@gmail.com>
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_FIND_COEFF_H
#define EIGEN_FIND_COEFF_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
namespace internal {
template <typename Scalar, int NaNPropagation, bool IsInteger = NumTraits<Scalar>::IsInteger>
struct max_coeff_functor {
EIGEN_DEVICE_FUNC inline bool compareCoeff(const Scalar& incumbent, const Scalar& candidate) const {
return candidate > incumbent;
}
template <typename Packet>
EIGEN_DEVICE_FUNC inline Packet comparePacket(const Packet& incumbent, const Packet& candidate) const {
return pcmp_lt(incumbent, candidate);
}
template <typename Packet>
EIGEN_DEVICE_FUNC inline Scalar predux(const Packet& a) const {
return predux_max(a);
}
};
template <typename Scalar>
struct max_coeff_functor<Scalar, PropagateNaN, false> {
EIGEN_DEVICE_FUNC inline bool compareCoeff(const Scalar& incumbent, const Scalar& candidate) const {
return (candidate > incumbent) || ((candidate != candidate) && (incumbent == incumbent));
}
template <typename Packet>
EIGEN_DEVICE_FUNC inline Packet comparePacket(const Packet& incumbent, const Packet& candidate) const {
return pandnot(pcmp_lt_or_nan(incumbent, candidate), pisnan(incumbent));
}
template <typename Packet>
EIGEN_DEVICE_FUNC inline Scalar predux(const Packet& a) const {
return predux_max<PropagateNaN>(a);
}
};
template <typename Scalar>
struct max_coeff_functor<Scalar, PropagateNumbers, false> {
EIGEN_DEVICE_FUNC inline bool compareCoeff(const Scalar& incumbent, const Scalar& candidate) const {
return (candidate > incumbent) || ((candidate == candidate) && (incumbent != incumbent));
}
template <typename Packet>
EIGEN_DEVICE_FUNC inline Packet comparePacket(const Packet& incumbent, const Packet& candidate) const {
return pandnot(pcmp_lt_or_nan(incumbent, candidate), pisnan(candidate));
}
template <typename Packet>
EIGEN_DEVICE_FUNC inline Scalar predux(const Packet& a) const {
return predux_max<PropagateNumbers>(a);
}
};
template <typename Scalar, int NaNPropagation, bool IsInteger = NumTraits<Scalar>::IsInteger>
struct min_coeff_functor {
EIGEN_DEVICE_FUNC inline bool compareCoeff(const Scalar& incumbent, const Scalar& candidate) const {
return candidate < incumbent;
}
template <typename Packet>
EIGEN_DEVICE_FUNC inline Packet comparePacket(const Packet& incumbent, const Packet& candidate) const {
return pcmp_lt(candidate, incumbent);
}
template <typename Packet>
EIGEN_DEVICE_FUNC inline Scalar predux(const Packet& a) const {
return predux_min(a);
}
};
template <typename Scalar>
struct min_coeff_functor<Scalar, PropagateNaN, false> {
EIGEN_DEVICE_FUNC inline bool compareCoeff(const Scalar& incumbent, const Scalar& candidate) const {
return (candidate < incumbent) || ((candidate != candidate) && (incumbent == incumbent));
}
template <typename Packet>
EIGEN_DEVICE_FUNC inline Packet comparePacket(const Packet& incumbent, const Packet& candidate) const {
return pandnot(pcmp_lt_or_nan(candidate, incumbent), pisnan(incumbent));
}
template <typename Packet>
EIGEN_DEVICE_FUNC inline Scalar predux(const Packet& a) const {
return predux_min<PropagateNaN>(a);
}
};
template <typename Scalar>
struct min_coeff_functor<Scalar, PropagateNumbers, false> {
EIGEN_DEVICE_FUNC inline bool compareCoeff(const Scalar& incumbent, const Scalar& candidate) const {
return (candidate < incumbent) || ((candidate == candidate) && (incumbent != incumbent));
}
template <typename Packet>
EIGEN_DEVICE_FUNC inline Packet comparePacket(const Packet& incumbent, const Packet& candidate) const {
return pandnot(pcmp_lt_or_nan(candidate, incumbent), pisnan(candidate));
}
template <typename Packet>
EIGEN_DEVICE_FUNC inline Scalar predux(const Packet& a) const {
return predux_min<PropagateNumbers>(a);
}
};
template <typename Scalar>
struct min_max_traits {
static constexpr bool PacketAccess = packet_traits<Scalar>::Vectorizable;
};
template <typename Scalar, int NaNPropagation>
struct functor_traits<max_coeff_functor<Scalar, NaNPropagation>> : min_max_traits<Scalar> {};
template <typename Scalar, int NaNPropagation>
struct functor_traits<min_coeff_functor<Scalar, NaNPropagation>> : min_max_traits<Scalar> {};
template <typename Evaluator, typename Func, bool Linear, bool Vectorize>
struct find_coeff_loop;
template <typename Evaluator, typename Func>
struct find_coeff_loop<Evaluator, Func, /*Linear*/ false, /*Vectorize*/ false> {
using Scalar = typename Evaluator::Scalar;
static EIGEN_DEVICE_FUNC inline void run(const Evaluator& eval, Func& func, Scalar& res, Index& outer, Index& inner) {
Index outerSize = eval.outerSize();
Index innerSize = eval.innerSize();
/* initialization performed in calling function */
/* result = eval.coeff(0, 0); */
/* outer = 0; */
/* inner = 0; */
for (Index j = 0; j < outerSize; j++) {
for (Index i = 0; i < innerSize; i++) {
Scalar xprCoeff = eval.coeffByOuterInner(j, i);
bool newRes = func.compareCoeff(res, xprCoeff);
if (newRes) {
outer = j;
inner = i;
res = xprCoeff;
}
}
}
}
};
template <typename Evaluator, typename Func>
struct find_coeff_loop<Evaluator, Func, /*Linear*/ true, /*Vectorize*/ false> {
using Scalar = typename Evaluator::Scalar;
static EIGEN_DEVICE_FUNC inline void run(const Evaluator& eval, Func& func, Scalar& res, Index& index) {
Index size = eval.size();
/* initialization performed in calling function */
/* result = eval.coeff(0); */
/* index = 0; */
for (Index k = 0; k < size; k++) {
Scalar xprCoeff = eval.coeff(k);
bool newRes = func.compareCoeff(res, xprCoeff);
if (newRes) {
index = k;
res = xprCoeff;
}
}
}
};
template <typename Evaluator, typename Func>
struct find_coeff_loop<Evaluator, Func, /*Linear*/ false, /*Vectorize*/ true> {
using ScalarImpl = find_coeff_loop<Evaluator, Func, false, false>;
using Scalar = typename Evaluator::Scalar;
using Packet = typename Evaluator::Packet;
static constexpr int PacketSize = unpacket_traits<Packet>::size;
static EIGEN_DEVICE_FUNC inline void run(const Evaluator& eval, Func& func, Scalar& result, Index& outer,
Index& inner) {
Index outerSize = eval.outerSize();
Index innerSize = eval.innerSize();
Index packetEnd = numext::round_down(innerSize, PacketSize);
/* initialization performed in calling function */
/* result = eval.coeff(0, 0); */
/* outer = 0; */
/* inner = 0; */
bool checkPacket = false;
for (Index j = 0; j < outerSize; j++) {
Packet resultPacket = pset1<Packet>(result);
for (Index i = 0; i < packetEnd; i += PacketSize) {
Packet xprPacket = eval.template packetByOuterInner<Unaligned, Packet>(j, i);
if (predux_any(func.comparePacket(resultPacket, xprPacket))) {
outer = j;
inner = i;
result = func.predux(xprPacket);
resultPacket = pset1<Packet>(result);
checkPacket = true;
}
}
for (Index i = packetEnd; i < innerSize; i++) {
Scalar xprCoeff = eval.coeffByOuterInner(j, i);
if (func.compareCoeff(result, xprCoeff)) {
outer = j;
inner = i;
result = xprCoeff;
checkPacket = false;
}
}
}
if (checkPacket) {
result = eval.coeffByOuterInner(outer, inner);
Index i_end = inner + PacketSize;
for (Index i = inner; i < i_end; i++) {
Scalar xprCoeff = eval.coeffByOuterInner(outer, i);
if (func.compareCoeff(result, xprCoeff)) {
inner = i;
result = xprCoeff;
}
}
}
}
};
template <typename Evaluator, typename Func>
struct find_coeff_loop<Evaluator, Func, /*Linear*/ true, /*Vectorize*/ true> {
using ScalarImpl = find_coeff_loop<Evaluator, Func, true, false>;
using Scalar = typename Evaluator::Scalar;
using Packet = typename Evaluator::Packet;
static constexpr int PacketSize = unpacket_traits<Packet>::size;
static constexpr int Alignment = Evaluator::Alignment;
static EIGEN_DEVICE_FUNC inline void run(const Evaluator& eval, Func& func, Scalar& result, Index& index) {
Index size = eval.size();
Index packetEnd = numext::round_down(size, PacketSize);
/* initialization performed in calling function */
/* result = eval.coeff(0); */
/* index = 0; */
Packet resultPacket = pset1<Packet>(result);
bool checkPacket = false;
for (Index k = 0; k < packetEnd; k += PacketSize) {
Packet xprPacket = eval.template packet<Alignment, Packet>(k);
if (predux_any(func.comparePacket(resultPacket, xprPacket))) {
index = k;
result = func.predux(xprPacket);
resultPacket = pset1<Packet>(result);
checkPacket = true;
}
}
for (Index k = packetEnd; k < size; k++) {
Scalar xprCoeff = eval.coeff(k);
if (func.compareCoeff(result, xprCoeff)) {
index = k;
result = xprCoeff;
checkPacket = false;
}
}
if (checkPacket) {
result = eval.coeff(index);
Index k_end = index + PacketSize;
for (Index k = index; k < k_end; k++) {
Scalar xprCoeff = eval.coeff(k);
if (func.compareCoeff(result, xprCoeff)) {
index = k;
result = xprCoeff;
}
}
}
}
};
template <typename Derived>
struct find_coeff_evaluator : public evaluator<Derived> {
using Base = evaluator<Derived>;
using Scalar = typename Derived::Scalar;
using Packet = typename packet_traits<Scalar>::type;
static constexpr int Flags = Base::Flags;
static constexpr bool IsRowMajor = bool(Flags & RowMajorBit);
EIGEN_DEVICE_FUNC inline find_coeff_evaluator(const Derived& xpr) : Base(xpr), m_xpr(xpr) {}
EIGEN_DEVICE_FUNC inline Scalar coeffByOuterInner(Index outer, Index inner) const {
Index row = IsRowMajor ? outer : inner;
Index col = IsRowMajor ? inner : outer;
return Base::coeff(row, col);
}
template <int LoadMode, typename PacketType>
EIGEN_DEVICE_FUNC inline PacketType packetByOuterInner(Index outer, Index inner) const {
Index row = IsRowMajor ? outer : inner;
Index col = IsRowMajor ? inner : outer;
return Base::template packet<LoadMode, PacketType>(row, col);
}
EIGEN_DEVICE_FUNC inline Index innerSize() const { return m_xpr.innerSize(); }
EIGEN_DEVICE_FUNC inline Index outerSize() const { return m_xpr.outerSize(); }
EIGEN_DEVICE_FUNC inline Index size() const { return m_xpr.size(); }
const Derived& m_xpr;
};
template <typename Derived, typename Func>
struct find_coeff_impl {
using Evaluator = find_coeff_evaluator<Derived>;
static constexpr int Flags = Evaluator::Flags;
static constexpr int Alignment = Evaluator::Alignment;
static constexpr bool IsRowMajor = Derived::IsRowMajor;
static constexpr int MaxInnerSizeAtCompileTime =
IsRowMajor ? Derived::MaxColsAtCompileTime : Derived::MaxRowsAtCompileTime;
static constexpr int MaxSizeAtCompileTime = Derived::MaxSizeAtCompileTime;
using Scalar = typename Derived::Scalar;
using Packet = typename Evaluator::Packet;
static constexpr int PacketSize = unpacket_traits<Packet>::size;
static constexpr bool Linearize = bool(Flags & LinearAccessBit);
static constexpr bool DontVectorize =
enum_lt_not_dynamic(Linearize ? MaxSizeAtCompileTime : MaxInnerSizeAtCompileTime, PacketSize);
static constexpr bool Vectorize =
!DontVectorize && bool(Flags & PacketAccessBit) && functor_traits<Func>::PacketAccess;
using Loop = find_coeff_loop<Evaluator, Func, Linearize, Vectorize>;
template <bool ForwardLinearAccess = Linearize, std::enable_if_t<!ForwardLinearAccess, bool> = true>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(const Derived& xpr, Func& func, Scalar& res, Index& outer,
Index& inner) {
Evaluator eval(xpr);
Loop::run(eval, func, res, outer, inner);
}
template <bool ForwardLinearAccess = Linearize, std::enable_if_t<ForwardLinearAccess, bool> = true>
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(const Derived& xpr, Func& func, Scalar& res, Index& outer,
Index& inner) {
// where possible, use the linear loop and back-calculate the outer and inner indices
Index index = 0;
run(xpr, func, res, index);
outer = index / xpr.innerSize();
inner = index % xpr.innerSize();
}
static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void run(const Derived& xpr, Func& func, Scalar& res, Index& index) {
Evaluator eval(xpr);
Loop::run(eval, func, res, index);
}
};
template <typename Derived, typename IndexType, typename Func>
EIGEN_DEVICE_FUNC typename internal::traits<Derived>::Scalar findCoeff(const DenseBase<Derived>& mat, Func& func,
IndexType* rowPtr, IndexType* colPtr) {
eigen_assert(mat.rows() > 0 && mat.cols() > 0 && "you are using an empty matrix");
using Scalar = typename DenseBase<Derived>::Scalar;
using FindCoeffImpl = internal::find_coeff_impl<Derived, Func>;
Index outer = 0;
Index inner = 0;
Scalar res = mat.coeff(0, 0);
FindCoeffImpl::run(mat.derived(), func, res, outer, inner);
*rowPtr = internal::convert_index<IndexType>(Derived::IsRowMajor ? outer : inner);
if (colPtr) *colPtr = internal::convert_index<IndexType>(Derived::IsRowMajor ? inner : outer);
return res;
}
template <typename Derived, typename IndexType, typename Func>
EIGEN_DEVICE_FUNC typename internal::traits<Derived>::Scalar findCoeff(const DenseBase<Derived>& mat, Func& func,
IndexType* indexPtr) {
eigen_assert(mat.size() > 0 && "you are using an empty matrix");
EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
using Scalar = typename DenseBase<Derived>::Scalar;
using FindCoeffImpl = internal::find_coeff_impl<Derived, Func>;
Index index = 0;
Scalar res = mat.coeff(0);
FindCoeffImpl::run(mat.derived(), func, res, index);
*indexPtr = internal::convert_index<IndexType>(index);
return res;
}
} // namespace internal
/** \fn DenseBase<Derived>::minCoeff(IndexType* rowId, IndexType* colId) const
* \returns the minimum of all coefficients of *this and puts in *row and *col its location.
*
* If there are multiple coefficients with the same extreme value, the location of the first instance is returned.
*
* In case \c *this contains NaN, NaNPropagation determines the behavior:
* NaNPropagation == PropagateFast : undefined
* NaNPropagation == PropagateNaN : result is NaN
* NaNPropagation == PropagateNumbers : result is minimum of elements that are not NaN
* \warning the matrix must not be empty, otherwise an assertion is triggered.
*
* \sa DenseBase::minCoeff(Index*), DenseBase::maxCoeff(Index*,Index*), DenseBase::visit(), DenseBase::minCoeff()
*/
template <typename Derived>
template <int NaNPropagation, typename IndexType>
EIGEN_DEVICE_FUNC typename internal::traits<Derived>::Scalar DenseBase<Derived>::minCoeff(IndexType* rowPtr,
IndexType* colPtr) const {
using Func = internal::min_coeff_functor<Scalar, NaNPropagation>;
Func func;
return internal::findCoeff(derived(), func, rowPtr, colPtr);
}
/** \returns the minimum of all coefficients of *this and puts in *index its location.
*
* If there are multiple coefficients with the same extreme value, the location of the first instance is returned.
*
* In case \c *this contains NaN, NaNPropagation determines the behavior:
* NaNPropagation == PropagateFast : undefined
* NaNPropagation == PropagateNaN : result is NaN
* NaNPropagation == PropagateNumbers : result is minimum of elements that are not NaN
* \warning the matrix must not be empty, otherwise an assertion is triggered.
*
* \sa DenseBase::minCoeff(IndexType*,IndexType*), DenseBase::maxCoeff(IndexType*,IndexType*), DenseBase::visit(),
* DenseBase::minCoeff()
*/
template <typename Derived>
template <int NaNPropagation, typename IndexType>
EIGEN_DEVICE_FUNC typename internal::traits<Derived>::Scalar DenseBase<Derived>::minCoeff(IndexType* indexPtr) const {
using Func = internal::min_coeff_functor<Scalar, NaNPropagation>;
Func func;
return internal::findCoeff(derived(), func, indexPtr);
}
/** \fn DenseBase<Derived>::maxCoeff(IndexType* rowId, IndexType* colId) const
* \returns the maximum of all coefficients of *this and puts in *row and *col its location.
*
* If there are multiple coefficients with the same extreme value, the location of the first instance is returned.
*
* In case \c *this contains NaN, NaNPropagation determines the behavior:
* NaNPropagation == PropagateFast : undefined
* NaNPropagation == PropagateNaN : result is NaN
* NaNPropagation == PropagateNumbers : result is maximum of elements that are not NaN
* \warning the matrix must not be empty, otherwise an assertion is triggered.
*
* \sa DenseBase::minCoeff(IndexType*,IndexType*), DenseBase::visit(), DenseBase::maxCoeff()
*/
template <typename Derived>
template <int NaNPropagation, typename IndexType>
EIGEN_DEVICE_FUNC typename internal::traits<Derived>::Scalar DenseBase<Derived>::maxCoeff(IndexType* rowPtr,
IndexType* colPtr) const {
using Func = internal::max_coeff_functor<Scalar, NaNPropagation>;
Func func;
return internal::findCoeff(derived(), func, rowPtr, colPtr);
}
/** \returns the maximum of all coefficients of *this and puts in *index its location.
*
* If there are multiple coefficients with the same extreme value, the location of the first instance is returned.
*
* In case \c *this contains NaN, NaNPropagation determines the behavior:
* NaNPropagation == PropagateFast : undefined
* NaNPropagation == PropagateNaN : result is NaN
* NaNPropagation == PropagateNumbers : result is maximum of elements that are not NaN
* \warning the matrix must not be empty, otherwise an assertion is triggered.
*
* \sa DenseBase::maxCoeff(IndexType*,IndexType*), DenseBase::minCoeff(IndexType*,IndexType*), DenseBase::visit(),
* DenseBase::maxCoeff()
*/
template <typename Derived>
template <int NaNPropagation, typename IndexType>
EIGEN_DEVICE_FUNC typename internal::traits<Derived>::Scalar DenseBase<Derived>::maxCoeff(IndexType* indexPtr) const {
using Func = internal::max_coeff_functor<Scalar, NaNPropagation>;
Func func;
return internal::findCoeff(derived(), func, indexPtr);
}
} // namespace Eigen
#endif // EIGEN_FIND_COEFF_H


@@ -10,137 +10,99 @@
#ifndef EIGEN_FORCEALIGNEDACCESS_H
#define EIGEN_FORCEALIGNEDACCESS_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
/** \class ForceAlignedAccess
* \ingroup Core_Module
*
* \brief Enforce aligned packet loads and stores regardless of what is requested
*
* \param ExpressionType the type of the object of which we are forcing aligned packet access
*
* This class is the return type of MatrixBase::forceAlignedAccess()
* and most of the time this is the only way it is used.
*
* \sa MatrixBase::forceAlignedAccess()
*/
namespace internal {
template <typename ExpressionType>
struct traits<ForceAlignedAccess<ExpressionType>> : public traits<ExpressionType> {};
}  // namespace internal
template <typename ExpressionType>
class ForceAlignedAccess : public internal::dense_xpr_base<ForceAlignedAccess<ExpressionType>>::type {
 public:
typedef typename internal::dense_xpr_base<ForceAlignedAccess>::type Base;
EIGEN_DENSE_PUBLIC_INTERFACE(ForceAlignedAccess)
EIGEN_DEVICE_FUNC explicit constexpr ForceAlignedAccess(const ExpressionType& matrix) : m_expression(matrix) {}
EIGEN_DEVICE_FUNC constexpr Index rows() const noexcept { return m_expression.rows(); }
EIGEN_DEVICE_FUNC constexpr Index cols() const noexcept { return m_expression.cols(); }
EIGEN_DEVICE_FUNC constexpr Index outerStride() const noexcept { return m_expression.outerStride(); }
EIGEN_DEVICE_FUNC constexpr Index innerStride() const noexcept { return m_expression.innerStride(); }
EIGEN_DEVICE_FUNC inline const CoeffReturnType coeff(Index row, Index col) const {
return m_expression.coeff(row, col);
}
EIGEN_DEVICE_FUNC inline Scalar& coeffRef(Index row, Index col) {
return m_expression.const_cast_derived().coeffRef(row, col);
}
EIGEN_DEVICE_FUNC inline const CoeffReturnType coeff(Index index) const { return m_expression.coeff(index); }
EIGEN_DEVICE_FUNC inline Scalar& coeffRef(Index index) { return m_expression.const_cast_derived().coeffRef(index); }
template <int LoadMode>
inline const PacketScalar packet(Index row, Index col) const {
return m_expression.template packet<Aligned>(row, col);
}
template <int LoadMode>
inline void writePacket(Index row, Index col, const PacketScalar& x) {
m_expression.const_cast_derived().template writePacket<Aligned>(row, col, x);
}
template <int LoadMode>
inline const PacketScalar packet(Index index) const {
return m_expression.template packet<Aligned>(index);
}
template <int LoadMode>
inline void writePacket(Index index, const PacketScalar& x) {
m_expression.const_cast_derived().template writePacket<Aligned>(index, x);
}
EIGEN_DEVICE_FUNC operator const ExpressionType&() const { return m_expression; }
protected:
const ExpressionType& m_expression;
private:
ForceAlignedAccess& operator=(const ForceAlignedAccess&);
};
/** \returns an expression of *this with forced aligned access
* \sa forceAlignedAccessIf(),class ForceAlignedAccess
*/
template <typename Derived>
inline const ForceAlignedAccess<Derived> MatrixBase<Derived>::forceAlignedAccess() const {
return ForceAlignedAccess<Derived>(derived());
}
/** \returns an expression of *this with forced aligned access
* \sa forceAlignedAccessIf(), class ForceAlignedAccess
*/
template <typename Derived>
inline ForceAlignedAccess<Derived> MatrixBase<Derived>::forceAlignedAccess() {
return ForceAlignedAccess<Derived>(derived());
}
/** \returns an expression of *this with forced aligned access if \a Enable is true.
* \sa forceAlignedAccess(), class ForceAlignedAccess
*/
template <typename Derived>
template <bool Enable>
inline typename internal::add_const_on_value_type<
    typename internal::conditional<Enable, ForceAlignedAccess<Derived>, Derived&>::type>::type
MatrixBase<Derived>::forceAlignedAccessIf() const {
return derived();  // FIXME This should not work but apparently is never used
}
/** \returns an expression of *this with forced aligned access if \a Enable is true.
* \sa forceAlignedAccess(), class ForceAlignedAccess
*/
template <typename Derived>
template <bool Enable>
inline typename internal::conditional<Enable, ForceAlignedAccess<Derived>, Derived&>::type
MatrixBase<Derived>::forceAlignedAccessIf() {
return derived();  // FIXME This should not work but apparently is never used
}
} // end namespace Eigen
#endif // EIGEN_FORCEALIGNEDACCESS_H


@@ -11,145 +11,122 @@
#ifndef EIGEN_FUZZY_H
#define EIGEN_FUZZY_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
namespace internal {
template <typename Derived, typename OtherDerived, bool is_integer = NumTraits<typename Derived::Scalar>::IsInteger>
struct isApprox_selector {
EIGEN_DEVICE_FUNC static bool run(const Derived& x, const OtherDerived& y, const typename Derived::RealScalar& prec) {
typename internal::nested_eval<Derived, 2>::type nested(x);
typename internal::nested_eval<OtherDerived, 2>::type otherNested(y);
return (nested.matrix() - otherNested.matrix()).cwiseAbs2().sum() <=
prec * prec * numext::mini(nested.cwiseAbs2().sum(), otherNested.cwiseAbs2().sum());
}
};
template <typename Derived, typename OtherDerived>
struct isApprox_selector<Derived, OtherDerived, true> {
EIGEN_DEVICE_FUNC static bool run(const Derived& x, const OtherDerived& y, const typename Derived::RealScalar&) {
return x.matrix() == y.matrix();
}
};
template <typename Derived, typename OtherDerived, bool is_integer = NumTraits<typename Derived::Scalar>::IsInteger>
struct isMuchSmallerThan_object_selector {
EIGEN_DEVICE_FUNC static bool run(const Derived& x, const OtherDerived& y, const typename Derived::RealScalar& prec) {
return x.cwiseAbs2().sum() <= numext::abs2(prec) * y.cwiseAbs2().sum();
}
};
template <typename Derived, typename OtherDerived>
struct isMuchSmallerThan_object_selector<Derived, OtherDerived, true> {
EIGEN_DEVICE_FUNC static bool run(const Derived& x, const OtherDerived&, const typename Derived::RealScalar&) {
return x.matrix() == Derived::Zero(x.rows(), x.cols()).matrix();
}
};
template <typename Derived, bool is_integer = NumTraits<typename Derived::Scalar>::IsInteger>
struct isMuchSmallerThan_scalar_selector {
EIGEN_DEVICE_FUNC static bool run(const Derived& x, const typename Derived::RealScalar& y,
const typename Derived::RealScalar& prec) {
return x.cwiseAbs2().sum() <= numext::abs2(prec * y);
}
};
template <typename Derived>
struct isMuchSmallerThan_scalar_selector<Derived, true> {
EIGEN_DEVICE_FUNC static bool run(const Derived& x, const typename Derived::RealScalar&,
const typename Derived::RealScalar&) {
return x.matrix() == Derived::Zero(x.rows(), x.cols()).matrix();
}
};
} // end namespace internal
/** \returns \c true if \c *this is approximately equal to \a other, within the precision
* determined by \a prec.
*
* \note The fuzzy compares are done multiplicatively. Two vectors \f$ v \f$ and \f$ w \f$
* are considered to be approximately equal within precision \f$ p \f$ if
* \f[ \Vert v - w \Vert \leqslant p\,\min(\Vert v\Vert, \Vert w\Vert). \f]
* For matrices, the comparison is done using the Hilbert-Schmidt norm (aka Frobenius norm,
* i.e. the L2 norm of the matrix viewed as a vector).
*
* \note Because of the multiplicativeness of this comparison, one can't use this function
* to check whether \c *this is approximately equal to the zero matrix or vector.
* Indeed, \c isApprox(zero) returns false unless \c *this itself is exactly the zero matrix
* or vector. If you want to test whether \c *this is zero, use internal::isMuchSmallerThan(const
* RealScalar&, RealScalar) instead.
*
* \sa internal::isMuchSmallerThan(const RealScalar&, RealScalar) const
*/
template <typename Derived>
template <typename OtherDerived>
EIGEN_DEVICE_FUNC constexpr bool DenseBase<Derived>::isApprox(const DenseBase<OtherDerived>& other,
const RealScalar& prec) const {
return internal::isApprox_selector<Derived, OtherDerived>::run(derived(), other.derived(), prec);
}
/** \returns \c true if the norm of \c *this is much smaller than \a other,
* within the precision determined by \a prec.
*
* \note The fuzzy compares are done multiplicatively. A vector \f$ v \f$ is
* considered to be much smaller than \f$ x \f$ within precision \f$ p \f$ if
* \f[ \Vert v \Vert \leqslant p\,\vert x\vert. \f]
*
* For matrices, the comparison is done using the Hilbert-Schmidt norm. For this reason,
* the value of the reference scalar \a other should come from the Hilbert-Schmidt norm
* of a reference matrix of same dimensions.
*
* \sa isApprox(), isMuchSmallerThan(const DenseBase<OtherDerived>&, RealScalar) const
*/
template <typename Derived>
EIGEN_DEVICE_FUNC constexpr bool DenseBase<Derived>::isMuchSmallerThan(const typename NumTraits<Scalar>::Real& other,
const RealScalar& prec) const {
return internal::isMuchSmallerThan_scalar_selector<Derived>::run(derived(), other, prec);
}
/** \returns \c true if the norm of \c *this is much smaller than the norm of \a other,
* within the precision determined by \a prec.
*
* \note The fuzzy compares are done multiplicatively. A vector \f$ v \f$ is
* considered to be much smaller than a vector \f$ w \f$ within precision \f$ p \f$ if
* \f[ \Vert v \Vert \leqslant p\,\Vert w\Vert. \f]
* For matrices, the comparison is done using the Hilbert-Schmidt norm.
*
* \sa isApprox(), isMuchSmallerThan(const RealScalar&, RealScalar) const
*/
template <typename Derived>
template <typename OtherDerived>
EIGEN_DEVICE_FUNC constexpr bool DenseBase<Derived>::isMuchSmallerThan(const DenseBase<OtherDerived>& other,
const RealScalar& prec) const {
return internal::isMuchSmallerThan_object_selector<Derived, OtherDerived>::run(derived(), other.derived(), prec);
}
} // end namespace Eigen
#endif // EIGEN_FUZZY_H


@@ -11,67 +11,77 @@
#ifndef EIGEN_GENERAL_PRODUCT_H
#define EIGEN_GENERAL_PRODUCT_H
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
enum { Large = 2, Small = 3 };
// Define the threshold value to fallback from the generic matrix-matrix product
// implementation (heavy) to the lightweight coeff-based product one.
// See generic_product_impl<Lhs,Rhs,DenseShape,DenseShape,GemmProduct>
// in products/GeneralMatrixMatrix.h for more details.
// TODO This threshold should also be used in the compile-time selector below.
#ifndef EIGEN_GEMM_TO_COEFFBASED_THRESHOLD
// This default value has been obtained on a Haswell architecture.
#define EIGEN_GEMM_TO_COEFFBASED_THRESHOLD 20
#endif
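Because the threshold is wrapped in `#ifndef`, it can be tuned per translation unit. A hypothetical override (the value 32 is an arbitrary illustration, not a recommendation) would look like:

```cpp
// Must come before any Eigen include so the #ifndef default above is skipped.
#define EIGEN_GEMM_TO_COEFFBASED_THRESHOLD 32
#include <Eigen/Dense>
```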
namespace internal {
template <int Rows, int Cols, int Depth>
struct product_type_selector;
template <int Size, int MaxSize>
struct product_size_category {
enum {
#ifndef EIGEN_GPU_COMPILE_PHASE
is_large = MaxSize == Dynamic || Size >= EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD ||
(Size == Dynamic && MaxSize >= EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD),
#else
is_large = 0,
#endif
value = is_large ? Large
: Size == 1 ? 1
: Small
};
};
template <typename Lhs, typename Rhs>
struct product_type {
typedef remove_all_t<Lhs> Lhs_;
typedef remove_all_t<Rhs> Rhs_;
enum {
MaxRows = traits<Lhs_>::MaxRowsAtCompileTime,
Rows = traits<Lhs_>::RowsAtCompileTime,
MaxCols = traits<Rhs_>::MaxColsAtCompileTime,
Cols = traits<Rhs_>::ColsAtCompileTime,
MaxDepth = min_size_prefer_fixed(traits<Lhs_>::MaxColsAtCompileTime, traits<Rhs_>::MaxRowsAtCompileTime),
Depth = min_size_prefer_fixed(traits<Lhs_>::ColsAtCompileTime, traits<Rhs_>::RowsAtCompileTime)
};
// the splitting into different lines of code here, introducing the _select enums and the typedef below,
// is to work around an internal compiler error with gcc 4.1 and 4.2.
private:
enum {
rows_select = product_size_category<Rows, MaxRows>::value,
cols_select = product_size_category<Cols, MaxCols>::value,
depth_select = product_size_category<Depth, MaxDepth>::value
};
typedef product_type_selector<rows_select, cols_select, depth_select> selector;
public:
enum { value = selector::ret, ret = selector::ret };
#ifdef EIGEN_DEBUG_PRODUCT
static void debug() {
EIGEN_DEBUG_VAR(Rows);
EIGEN_DEBUG_VAR(Cols);
EIGEN_DEBUG_VAR(Depth);
EIGEN_DEBUG_VAR(rows_select);
EIGEN_DEBUG_VAR(cols_select);
EIGEN_DEBUG_VAR(depth_select);
EIGEN_DEBUG_VAR(value);
}
#endif
};
@@ -79,54 +89,125 @@ public:
/* The following allows to select the kind of product at compile time
* based on the three dimensions of the product.
* This is a compile time mapping from {1,Small,Large}^3 -> {product types} */
// FIXME: the current compile-time product-type mapping may not be optimal.
template <int M, int N>
struct product_type_selector<M, N, 1> {
enum { ret = OuterProduct };
};
template <int M>
struct product_type_selector<M, 1, 1> {
enum { ret = LazyCoeffBasedProductMode };
};
template <int N>
struct product_type_selector<1, N, 1> {
enum { ret = LazyCoeffBasedProductMode };
};
template <int Depth>
struct product_type_selector<1, 1, Depth> {
enum { ret = InnerProduct };
};
template <>
struct product_type_selector<1, 1, 1> {
enum { ret = InnerProduct };
};
template <>
struct product_type_selector<Small, 1, Small> {
enum { ret = CoeffBasedProductMode };
};
template <>
struct product_type_selector<1, Small, Small> {
enum { ret = CoeffBasedProductMode };
};
template <>
struct product_type_selector<Small, Small, Small> {
enum { ret = CoeffBasedProductMode };
};
template <>
struct product_type_selector<Small, Small, 1> {
enum { ret = LazyCoeffBasedProductMode };
};
template <>
struct product_type_selector<Small, Large, 1> {
enum { ret = LazyCoeffBasedProductMode };
};
template <>
struct product_type_selector<Large, Small, 1> {
enum { ret = LazyCoeffBasedProductMode };
};
template <>
struct product_type_selector<1, Large, Small> {
enum { ret = CoeffBasedProductMode };
};
template <>
struct product_type_selector<1, Large, Large> {
enum { ret = GemvProduct };
};
template <>
struct product_type_selector<1, Small, Large> {
enum { ret = CoeffBasedProductMode };
};
template <>
struct product_type_selector<Large, 1, Small> {
enum { ret = CoeffBasedProductMode };
};
template <>
struct product_type_selector<Large, 1, Large> {
enum { ret = GemvProduct };
};
template <>
struct product_type_selector<Small, 1, Large> {
enum { ret = CoeffBasedProductMode };
};
template <>
struct product_type_selector<Small, Small, Large> {
enum { ret = GemmProduct };
};
template <>
struct product_type_selector<Large, Small, Large> {
enum { ret = GemmProduct };
};
template <>
struct product_type_selector<Small, Large, Large> {
enum { ret = GemmProduct };
};
template <>
struct product_type_selector<Large, Large, Large> {
enum { ret = GemmProduct };
};
template <>
struct product_type_selector<Large, Small, Small> {
enum { ret = CoeffBasedProductMode };
};
template <>
struct product_type_selector<Small, Large, Small> {
enum { ret = CoeffBasedProductMode };
};
template <>
struct product_type_selector<Large, Large, Small> {
enum { ret = GemmProduct };
};
} // end namespace internal
/***********************************************************************
* Implementation of Inner Vector Vector Product
***********************************************************************/
// FIXME: consider returning a Scalar instead of a 1x1 matrix for inner products.
// Pro: more natural for the user.
// Con: in a meta-unrolled algorithm a matrix-matrix product may reduce to a
// row-vector times column-vector product. To handle this, we could specialize
// Block<MatrixType,1,1> with operator=(Scalar x).
/***********************************************************************
* Implementation of Outer Vector Vector Product
***********************************************************************/
/***********************************************************************
* Implementation of General Matrix Vector Product
***********************************************************************/
/* According to the shape/flags of the matrix we have to distinguish 3 different cases:
* 1 - the matrix is col-major, BLAS compatible and M is large => call fast BLAS-like colmajor routine
* 2 - the matrix is row-major, BLAS compatible and N is large => call fast BLAS-like rowmajor routine
* 3 - all other cases are handled using a simple loop along the outer-storage direction.
@@ -135,302 +216,303 @@ template<> struct product_type_selector<Large,Large,Small> { enum
*/
namespace internal {
template <int Side, int StorageOrder, bool BlasCompatible>
struct gemv_dense_selector;
} // end namespace internal
namespace internal {
template <typename Scalar, int Size, int MaxSize, bool Cond>
struct gemv_static_vector_if;
template <typename Scalar, int Size, int MaxSize>
struct gemv_static_vector_if<Scalar, Size, MaxSize, false> {
EIGEN_DEVICE_FUNC constexpr Scalar* data() {
eigen_internal_assert(false && "should never be called");
return 0;
}
};
template <typename Scalar, int Size>
struct gemv_static_vector_if<Scalar, Size, Dynamic, true> {
EIGEN_DEVICE_FUNC constexpr Scalar* data() { return 0; }
};
template <typename Scalar, int Size, int MaxSize>
struct gemv_static_vector_if<Scalar, Size, MaxSize, true> {
#if EIGEN_MAX_STATIC_ALIGN_BYTES != 0
internal::plain_array<Scalar, internal::min_size_prefer_fixed(Size, MaxSize), 0, AlignedMax> m_data;
constexpr Scalar* data() { return m_data.array; }
#else
// Some architectures cannot align on the stack,
// => let's manually enforce alignment by allocating more data and return the address of the first aligned element.
internal::plain_array<Scalar, internal::min_size_prefer_fixed(Size, MaxSize) + EIGEN_MAX_ALIGN_BYTES, 0> m_data;
constexpr Scalar* data() {
return reinterpret_cast<Scalar*>((std::uintptr_t(m_data.array) & ~(std::size_t(EIGEN_MAX_ALIGN_BYTES - 1))) +
EIGEN_MAX_ALIGN_BYTES);
}
#endif
};
// The vector is on the left => transposition
template <int StorageOrder, bool BlasCompatible>
struct gemv_dense_selector<OnTheLeft, StorageOrder, BlasCompatible> {
template <typename Lhs, typename Rhs, typename Dest>
static void run(const Lhs& lhs, const Rhs& rhs, Dest& dest, const typename Dest::Scalar& alpha) {
Transpose<Dest> destT(dest);
enum { OtherStorageOrder = StorageOrder == RowMajor ? ColMajor : RowMajor };
gemv_dense_selector<OnTheRight, OtherStorageOrder, BlasCompatible>::run(rhs.transpose(), lhs.transpose(), destT,
alpha);
}
};
template <>
struct gemv_dense_selector<OnTheRight, ColMajor, true> {
template <typename Lhs, typename Rhs, typename Dest>
static inline void run(const Lhs& lhs, const Rhs& rhs, Dest& dest, const typename Dest::Scalar& alpha) {
typedef typename Lhs::Scalar LhsScalar;
typedef typename Rhs::Scalar RhsScalar;
typedef typename Dest::Scalar ResScalar;
typedef internal::blas_traits<Lhs> LhsBlasTraits;
typedef typename LhsBlasTraits::DirectLinearAccessType ActualLhsType;
typedef internal::blas_traits<Rhs> RhsBlasTraits;
typedef typename RhsBlasTraits::DirectLinearAccessType ActualRhsType;
typedef Map<Matrix<ResScalar, Dynamic, 1>, plain_enum_min(AlignedMax, internal::packet_traits<ResScalar>::size)>
MappedDest;
ActualLhsType actualLhs = LhsBlasTraits::extract(lhs);
ActualRhsType actualRhs = RhsBlasTraits::extract(rhs);
ResScalar actualAlpha = combine_scalar_factors(alpha, lhs, rhs);
// make sure Dest is a compile-time vector type (bug 1166)
typedef std::conditional_t<Dest::IsVectorAtCompileTime, Dest, typename Dest::ColXpr> ActualDest;
enum {
// FIXME: find a way to allow an inner stride on the result if packet_traits<Scalar>::size==1.
// On the other hand, it is good for the cache to pack the vector anyway...
EvalToDestAtCompileTime = (ActualDest::InnerStrideAtCompileTime == 1),
ComplexByReal = (NumTraits<LhsScalar>::IsComplex) && (!NumTraits<RhsScalar>::IsComplex),
MightCannotUseDest = ((!EvalToDestAtCompileTime) || ComplexByReal) && (ActualDest::MaxSizeAtCompileTime != 0)
};
typedef const_blas_data_mapper<LhsScalar, Index, ColMajor> LhsMapper;
typedef const_blas_data_mapper<RhsScalar, Index, RowMajor> RhsMapper;
RhsScalar compatibleAlpha = get_factor<ResScalar, RhsScalar>::run(actualAlpha);
if (!MightCannotUseDest) {
// Shortcut if we are sure to be able to use dest directly;
// this makes it easier for the compiler to generate cleaner and more optimized code for most common cases.
general_matrix_vector_product<Index, LhsScalar, LhsMapper, ColMajor, LhsBlasTraits::NeedToConjugate, RhsScalar,
RhsMapper, RhsBlasTraits::NeedToConjugate>::run(actualLhs.rows(), actualLhs.cols(),
LhsMapper(actualLhs.data(),
actualLhs.outerStride()),
RhsMapper(actualRhs.data(),
actualRhs.innerStride()),
dest.data(), 1, compatibleAlpha);
} else {
gemv_static_vector_if<ResScalar, ActualDest::SizeAtCompileTime, ActualDest::MaxSizeAtCompileTime,
MightCannotUseDest>
static_dest;
const bool alphaIsCompatible = (!ComplexByReal) || (numext::is_exactly_zero(numext::imag(actualAlpha)));
const bool evalToDest = EvalToDestAtCompileTime && alphaIsCompatible;
ei_declare_aligned_stack_constructed_variable(ResScalar, actualDestPtr, dest.size(),
evalToDest ? dest.data() : static_dest.data());
if (!evalToDest) {
#ifdef EIGEN_DENSE_STORAGE_CTOR_PLUGIN
constexpr int Size = Dest::SizeAtCompileTime;
Index size = dest.size();
EIGEN_DENSE_STORAGE_CTOR_PLUGIN
#endif
if (!alphaIsCompatible) {
MappedDest(actualDestPtr, dest.size()).setZero();
compatibleAlpha = RhsScalar(1);
} else
MappedDest(actualDestPtr, dest.size()) = dest;
}
general_matrix_vector_product<Index, LhsScalar, LhsMapper, ColMajor, LhsBlasTraits::NeedToConjugate, RhsScalar,
RhsMapper, RhsBlasTraits::NeedToConjugate>::run(actualLhs.rows(), actualLhs.cols(),
LhsMapper(actualLhs.data(),
actualLhs.outerStride()),
RhsMapper(actualRhs.data(),
actualRhs.innerStride()),
actualDestPtr, 1, compatibleAlpha);
if (!evalToDest) {
if (!alphaIsCompatible)
dest.matrix() += actualAlpha * MappedDest(actualDestPtr, dest.size());
else
dest = MappedDest(actualDestPtr, dest.size());
}
}
}
};
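The selector above falls back to a temporary destination when a complex alpha must be applied through a kernel whose scalar factor is real (the `ComplexByReal` / `alphaIsCompatible` case): it accumulates with alpha = 1 and applies the complex factor in one final pass. A minimal self-contained sketch of that accumulate-then-scale trick, using plain `std::vector` stand-ins rather than Eigen's types (all names here are hypothetical):

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Stand-in for a real-scalar GEMV kernel that can only apply a real alpha:
// out[i] += alpha * in[i].
void axpy_real(double alpha, const std::vector<double>& in, std::vector<double>& out) {
  for (std::size_t i = 0; i < in.size(); ++i) out[i] += alpha * in[i];
}

// To apply a *complex* alpha with that kernel, run it into a zeroed temporary
// with alpha = 1, then do a single complex-scaled update into dest -- the same
// path the selector takes when alphaIsCompatible is false.
std::vector<std::complex<double>> apply_complex_alpha(std::complex<double> alpha,
                                                      const std::vector<double>& in,
                                                      std::vector<std::complex<double>> dest) {
  std::vector<double> tmp(in.size(), 0.0);  // zeroed temporary destination
  axpy_real(1.0, in, tmp);                  // kernel runs with the compatible alpha = 1
  for (std::size_t i = 0; i < in.size(); ++i) dest[i] += alpha * tmp[i];  // dest += alpha * tmp
  return dest;
}
```

The setZero-then-scale order matters: the temporary must start at zero so the final update `dest += alpha * tmp` adds exactly the scaled product.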
template <>
struct gemv_dense_selector<OnTheRight, RowMajor, true> {
template <typename Lhs, typename Rhs, typename Dest>
static void run(const Lhs& lhs, const Rhs& rhs, Dest& dest, const typename Dest::Scalar& alpha) {
typedef typename Lhs::Scalar LhsScalar;
typedef typename Rhs::Scalar RhsScalar;
typedef typename Dest::Scalar ResScalar;
typedef internal::blas_traits<Lhs> LhsBlasTraits;
typedef typename LhsBlasTraits::DirectLinearAccessType ActualLhsType;
typedef internal::blas_traits<Rhs> RhsBlasTraits;
typedef typename RhsBlasTraits::DirectLinearAccessType ActualRhsType;
typedef internal::remove_all_t<ActualRhsType> ActualRhsTypeCleaned;
std::add_const_t<ActualLhsType> actualLhs = LhsBlasTraits::extract(lhs);
std::add_const_t<ActualRhsType> actualRhs = RhsBlasTraits::extract(rhs);
ResScalar actualAlpha = combine_scalar_factors(alpha, lhs, rhs);
enum {
// FIXME: find a way to allow an inner stride on the result if packet_traits<Scalar>::size==1
// on, the other hand it is good for the cache to pack the vector anyways...
DirectlyUseRhs =
ActualRhsTypeCleaned::InnerStrideAtCompileTime == 1 || ActualRhsTypeCleaned::MaxSizeAtCompileTime == 0
};
gemv_static_vector_if<RhsScalar, ActualRhsTypeCleaned::SizeAtCompileTime,
ActualRhsTypeCleaned::MaxSizeAtCompileTime, !DirectlyUseRhs>
static_rhs;
ei_declare_aligned_stack_constructed_variable(
RhsScalar, actualRhsPtr, actualRhs.size(),
DirectlyUseRhs ? const_cast<RhsScalar*>(actualRhs.data()) : static_rhs.data());
if (!DirectlyUseRhs) {
#ifdef EIGEN_DENSE_STORAGE_CTOR_PLUGIN
constexpr int Size = ActualRhsTypeCleaned::SizeAtCompileTime;
Index size = actualRhs.size();
EIGEN_DENSE_STORAGE_CTOR_PLUGIN
#endif
Map<typename ActualRhsTypeCleaned::PlainObject>(actualRhsPtr, actualRhs.size()) = actualRhs;
}
typedef const_blas_data_mapper<LhsScalar, Index, RowMajor> LhsMapper;
typedef const_blas_data_mapper<RhsScalar, Index, ColMajor> RhsMapper;
general_matrix_vector_product<Index, LhsScalar, LhsMapper, RowMajor, LhsBlasTraits::NeedToConjugate, RhsScalar,
RhsMapper, RhsBlasTraits::NeedToConjugate>::
run(actualLhs.rows(), actualLhs.cols(), LhsMapper(actualLhs.data(), actualLhs.outerStride()),
RhsMapper(actualRhsPtr, 1), dest.data(),
dest.col(0).innerStride(), // NOTE if dest is not a vector at compile-time, then dest.innerStride() might
// be wrong. (bug 1166)
actualAlpha);
}
};
template <>
struct gemv_dense_selector<OnTheRight, ColMajor, false> {
template <typename Lhs, typename Rhs, typename Dest>
static void run(const Lhs& lhs, const Rhs& rhs, Dest& dest, const typename Dest::Scalar& alpha) {
EIGEN_STATIC_ASSERT((!nested_eval<Lhs, 1>::Evaluate),
EIGEN_INTERNAL_COMPILATION_ERROR_OR_YOU_MADE_A_PROGRAMMING_MISTAKE);
// TODO: if rhs is large enough it might be beneficial to make sure that dest is sequentially stored in memory,
// otherwise use a temp
typename nested_eval<Rhs, 1>::type actual_rhs(rhs);
const Index size = rhs.rows();
for (Index k = 0; k < size; ++k) dest += (alpha * actual_rhs.coeff(k)) * lhs.col(k);
}
};
template <>
struct gemv_dense_selector<OnTheRight, RowMajor, false> {
template <typename Lhs, typename Rhs, typename Dest>
static void run(const Lhs& lhs, const Rhs& rhs, Dest& dest, const typename Dest::Scalar& alpha) {
EIGEN_STATIC_ASSERT((!nested_eval<Lhs, 1>::Evaluate),
EIGEN_INTERNAL_COMPILATION_ERROR_OR_YOU_MADE_A_PROGRAMMING_MISTAKE);
typename nested_eval<Rhs, Lhs::RowsAtCompileTime>::type actual_rhs(rhs);
const Index rows = dest.rows();
for (Index i = 0; i < rows; ++i)
dest.coeffRef(i) += alpha * (lhs.row(i).cwiseProduct(actual_rhs.transpose())).sum();
}
};
} // end namespace internal
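The two non-BLAS fallback selectors above traverse the product in storage-friendly order: the column-major one accumulates `dest += (alpha * rhs[k]) * lhs.col(k)` one column at a time, while the row-major one computes one dot product per row. The same two loop structures, reduced to plain nested vectors (a sketch, not Eigen's implementation):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // m[i][j] = row i, column j

// Column-major traversal: dest += (alpha * rhs[k]) * column k, one column at a time.
Vec gemv_by_columns(const Mat& m, const Vec& rhs, double alpha) {
  Vec dest(m.size(), 0.0);
  for (std::size_t k = 0; k < rhs.size(); ++k)
    for (std::size_t i = 0; i < m.size(); ++i) dest[i] += (alpha * rhs[k]) * m[i][k];
  return dest;
}

// Row-major traversal: dest[i] += alpha * dot(row i, rhs), one row at a time.
Vec gemv_by_rows(const Mat& m, const Vec& rhs, double alpha) {
  Vec dest(m.size(), 0.0);
  for (std::size_t i = 0; i < m.size(); ++i) {
    double dot = 0.0;
    for (std::size_t k = 0; k < rhs.size(); ++k) dot += m[i][k] * rhs[k];
    dest[i] += alpha * dot;
  }
  return dest;
}
```

Both produce the same result; the choice only changes which operand is read with unit stride.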
/***************************************************************************
* Implementation of matrix base methods
***************************************************************************/
/** \returns the matrix product of \c *this and \a other.
*
* \note If instead of the matrix product you want the coefficient-wise product, see Cwise::operator*().
*
* \sa lazyProduct(), operator*=(const MatrixBase&), Cwise::operator*()
*/
template <typename Derived>
template <typename OtherDerived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Product<Derived, OtherDerived> MatrixBase<Derived>::operator*(
const MatrixBase<OtherDerived>& other) const {
// A note regarding the function declaration: In MSVC, this function will sometimes
// not be inlined since DenseStorage is an unwindable object for dynamic
// matrices and product types are holding a member to store the result.
// Thus it does not help tagging this function with EIGEN_STRONG_INLINE.
enum {
ProductIsValid = Derived::ColsAtCompileTime == Dynamic || OtherDerived::RowsAtCompileTime == Dynamic ||
int(Derived::ColsAtCompileTime) == int(OtherDerived::RowsAtCompileTime),
AreVectors = Derived::IsVectorAtCompileTime && OtherDerived::IsVectorAtCompileTime,
SameSizes = EIGEN_PREDICATE_SAME_MATRIX_SIZE(Derived, OtherDerived)
};
// note to the lost user:
// * for a dot product use: v1.dot(v2)
// * for a coeff-wise product use: v1.cwiseProduct(v2)
EIGEN_STATIC_ASSERT(
ProductIsValid || !(AreVectors && SameSizes),
INVALID_VECTOR_VECTOR_PRODUCT__IF_YOU_WANTED_A_DOT_OR_COEFF_WISE_PRODUCT_YOU_MUST_USE_THE_EXPLICIT_FUNCTIONS)
EIGEN_STATIC_ASSERT(ProductIsValid || !(SameSizes && !AreVectors),
INVALID_MATRIX_PRODUCT__IF_YOU_WANTED_A_COEFF_WISE_PRODUCT_YOU_MUST_USE_THE_EXPLICIT_FUNCTION)
EIGEN_STATIC_ASSERT(ProductIsValid || SameSizes, INVALID_MATRIX_PRODUCT)
#ifdef EIGEN_DEBUG_PRODUCT
internal::product_type<Derived, OtherDerived>::debug();
#endif
return Product<Derived, OtherDerived>(derived(), other.derived());
}
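`operator*` rejects an invalid product at compile time via `EIGEN_STATIC_ASSERT` on the compile-time extents, treating `Dynamic` dimensions as "check later at run time". The same guard can be sketched with a plain `static_assert`; everything below (`Mat`, `product_is_valid`) is a hypothetical stand-in, not Eigen code:

```cpp
#include <cassert>

constexpr int Dynamic = -1;  // models Eigen's unknown-at-compile-time extent

// A product is valid when the inner dimensions agree, or when either one is
// Dynamic (in which case the check is deferred to run time).
constexpr bool product_is_valid(int lhs_cols, int rhs_rows) {
  return lhs_cols == Dynamic || rhs_rows == Dynamic || lhs_cols == rhs_rows;
}

template <int Rows, int Cols>
struct Mat {};  // only the compile-time sizes matter for the check

template <int R1, int C1, int R2, int C2>
Mat<R1, C2> product(Mat<R1, C1>, Mat<R2, C2>) {
  static_assert(product_is_valid(C1, R2), "invalid matrix product: inner dimensions must match");
  return {};
}
```

`product(Mat<2, 3>{}, Mat<3, 4>{})` compiles, while `product(Mat<2, 3>{}, Mat<4, 4>{})` fails with the message above, mirroring `INVALID_MATRIX_PRODUCT`.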
/** \returns an expression of the matrix product of \c *this and \a other without implicit evaluation.
*
* The returned product will behave like any other expression: the coefficients of the product will be
* computed one at a time, as requested. This might be useful in some extremely rare cases when only
* a small and non-coherent fraction of the result's coefficients have to be computed.
*
* \warning This version of the matrix product can be much much slower. So use it only if you know
* what you are doing and that you measured a true speed improvement.
*
* \sa operator*(const MatrixBase&)
*/
template <typename Derived>
template <typename OtherDerived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Product<Derived, OtherDerived, LazyProduct>
MatrixBase<Derived>::lazyProduct(const MatrixBase<OtherDerived>& other) const {
enum {
ProductIsValid = Derived::ColsAtCompileTime == Dynamic || OtherDerived::RowsAtCompileTime == Dynamic ||
int(Derived::ColsAtCompileTime) == int(OtherDerived::RowsAtCompileTime),
AreVectors = Derived::IsVectorAtCompileTime && OtherDerived::IsVectorAtCompileTime,
SameSizes = EIGEN_PREDICATE_SAME_MATRIX_SIZE(Derived, OtherDerived)
};
// note to the lost user:
// * for a dot product use: v1.dot(v2)
// * for a coeff-wise product use: v1.cwiseProduct(v2)
EIGEN_STATIC_ASSERT(
ProductIsValid || !(AreVectors && SameSizes),
INVALID_VECTOR_VECTOR_PRODUCT__IF_YOU_WANTED_A_DOT_OR_COEFF_WISE_PRODUCT_YOU_MUST_USE_THE_EXPLICIT_FUNCTIONS)
EIGEN_STATIC_ASSERT(ProductIsValid || !(SameSizes && !AreVectors),
INVALID_MATRIX_PRODUCT__IF_YOU_WANTED_A_COEFF_WISE_PRODUCT_YOU_MUST_USE_THE_EXPLICIT_FUNCTION)
EIGEN_STATIC_ASSERT(ProductIsValid || SameSizes, INVALID_MATRIX_PRODUCT)
return Product<Derived, OtherDerived, LazyProduct>(derived(), other.derived());
}
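`lazyProduct` returns an expression object whose coefficients are computed only when read, rather than evaluating the whole product up front. The idea reduced to a minimal expression-template sketch (plain structs, not Eigen's `Product<..., LazyProduct>`):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Matrix {
  std::size_t rows, cols;
  std::vector<double> a;  // row-major storage
  double operator()(std::size_t i, std::size_t j) const { return a[i * cols + j]; }
};

// Lazy product: holds references and computes a coefficient only when asked.
// Nothing is evaluated at construction, so reading one coefficient costs a
// single dot product instead of a full matrix-matrix multiply.
struct LazyProduct {
  const Matrix& lhs;
  const Matrix& rhs;
  double coeff(std::size_t i, std::size_t j) const {
    double s = 0.0;
    for (std::size_t k = 0; k < lhs.cols; ++k) s += lhs(i, k) * rhs(k, j);
    return s;
  }
};
```

This also shows why the docs warn it can be much slower: reading every coefficient this way redoes work a blocked eager product would share.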
} // end namespace Eigen
#endif // EIGEN_PRODUCT_H


@@ -13,175 +13,218 @@
#ifdef EIGEN_PARSED_BY_DOXYGEN
#define EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(NAME, FUNCTOR, DOC_OP, DOC_DETAILS) \
/** \returns an expression of the coefficient-wise DOC_OP of \a x \
\ \
DOC_DETAILS \
\ \
\sa <a href="group__CoeffwiseMathFunctions.html#cwisetable_##NAME">Math functions</a>, class CwiseUnaryOp \
*/ \
template <typename Derived> \
inline const Eigen::CwiseUnaryOp<Eigen::internal::FUNCTOR<typename Derived::Scalar>, const Derived> NAME( \
const Eigen::ArrayBase<Derived>& x);
#else
#define EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(NAME, FUNCTOR, DOC_OP, DOC_DETAILS) \
template <typename Derived> \
inline const Eigen::CwiseUnaryOp<Eigen::internal::FUNCTOR<typename Derived::Scalar>, const Derived>(NAME)( \
const Eigen::ArrayBase<Derived>& x) { \
return Eigen::CwiseUnaryOp<Eigen::internal::FUNCTOR<typename Derived::Scalar>, const Derived>(x.derived()); \
}
#endif // EIGEN_PARSED_BY_DOXYGEN
#define EIGEN_ARRAY_DECLARE_GLOBAL_EIGEN_UNARY(NAME, FUNCTOR) \
\
template <typename Derived> \
struct NAME##_retval<ArrayBase<Derived> > { \
typedef const Eigen::CwiseUnaryOp<Eigen::internal::FUNCTOR<typename Derived::Scalar>, const Derived> type; \
}; \
template <typename Derived> \
struct NAME##_impl<ArrayBase<Derived> > { \
static inline typename NAME##_retval<ArrayBase<Derived> >::type run(const Eigen::ArrayBase<Derived>& x) { \
return typename NAME##_retval<ArrayBase<Derived> >::type(x.derived()); \
} \
};
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(real, scalar_real_op, real part,\sa ArrayBase::real)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(imag, scalar_imag_op, imaginary part,\sa ArrayBase::imag)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(conj, scalar_conjugate_op, complex conjugate,\sa ArrayBase::conjugate)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(inverse, scalar_inverse_op, inverse,\sa ArrayBase::inverse)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(sin, scalar_sin_op, sine,\sa ArrayBase::sin)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(cos, scalar_cos_op, cosine,\sa ArrayBase::cos)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(tan, scalar_tan_op, tangent,\sa ArrayBase::tan)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(atan, scalar_atan_op, arc - tangent,\sa ArrayBase::atan)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(asin, scalar_asin_op, arc - sine,\sa ArrayBase::asin)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(acos, scalar_acos_op, arc - cosine,\sa ArrayBase::acos)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(sinh, scalar_sinh_op, hyperbolic sine,\sa ArrayBase::sinh)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(cosh, scalar_cosh_op, hyperbolic cosine,\sa ArrayBase::cosh)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(tanh, scalar_tanh_op, hyperbolic tangent,\sa ArrayBase::tanh)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(asinh, scalar_asinh_op, inverse hyperbolic sine,\sa ArrayBase::asinh)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(acosh, scalar_acosh_op, inverse hyperbolic cosine,\sa ArrayBase::acosh)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(atanh, scalar_atanh_op, inverse hyperbolic tangent,\sa ArrayBase::atanh)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(logistic, scalar_logistic_op, logistic function,\sa ArrayBase::logistic)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(lgamma, scalar_lgamma_op,
natural logarithm of the gamma function,\sa ArrayBase::lgamma)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(digamma, scalar_digamma_op, derivative of lgamma,\sa ArrayBase::digamma)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(erf, scalar_erf_op, error function,\sa ArrayBase::erf)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(erfc, scalar_erfc_op, complement error function,\sa ArrayBase::erfc)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(ndtri, scalar_ndtri_op, inverse normal distribution function,\sa ArrayBase::ndtri)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(exp, scalar_exp_op, exponential,\sa ArrayBase::exp)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(exp2, scalar_exp2_op, exponential,\sa ArrayBase::exp2)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(expm1, scalar_expm1_op, exponential of a value minus 1,\sa ArrayBase::expm1)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log, scalar_log_op, natural logarithm,\sa Eigen::log10 DOXCOMMA ArrayBase::log)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log1p, scalar_log1p_op, natural logarithm of 1 plus the value,\sa ArrayBase::log1p)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log10, scalar_log10_op, base 10 logarithm,\sa Eigen::log DOXCOMMA ArrayBase::log10)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log2, scalar_log2_op, base 2 logarithm,\sa Eigen::log DOXCOMMA ArrayBase::log2)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(abs, scalar_abs_op, absolute value,\sa ArrayBase::abs DOXCOMMA MatrixBase::cwiseAbs)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(abs2, scalar_abs2_op,
squared absolute value,\sa ArrayBase::abs2 DOXCOMMA MatrixBase::cwiseAbs2)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(arg, scalar_arg_op, complex argument,\sa ArrayBase::arg DOXCOMMA MatrixBase::cwiseArg)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(carg, scalar_carg_op,
complex argument, \sa ArrayBase::carg DOXCOMMA MatrixBase::cwiseCArg)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(sqrt, scalar_sqrt_op, square root,\sa ArrayBase::sqrt DOXCOMMA MatrixBase::cwiseSqrt)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(cbrt, scalar_cbrt_op, cube root,\sa ArrayBase::cbrt DOXCOMMA MatrixBase::cwiseCbrt)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(rsqrt, scalar_rsqrt_op, reciprocal square root,\sa ArrayBase::rsqrt)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(square, scalar_square_op,
square(power 2),\sa Eigen::abs2 DOXCOMMA Eigen::pow DOXCOMMA ArrayBase::square)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(cube, scalar_cube_op, cube(power 3),\sa Eigen::pow DOXCOMMA ArrayBase::cube)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(rint, scalar_rint_op,
nearest integer,\sa Eigen::floor DOXCOMMA Eigen::ceil DOXCOMMA ArrayBase::round)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(round, scalar_round_op,
nearest integer,\sa Eigen::floor DOXCOMMA Eigen::ceil DOXCOMMA ArrayBase::round)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(
floor, scalar_floor_op, nearest integer not greater than the given value,\sa Eigen::ceil DOXCOMMA ArrayBase::floor)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(
ceil, scalar_ceil_op, nearest integer not less than the given value,\sa Eigen::floor DOXCOMMA ArrayBase::ceil)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(trunc, scalar_trunc_op,
nearest integer not greater in magnitude than the given value,\sa Eigen::trunc DOXCOMMA
ArrayBase::trunc)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(
isnan, scalar_isnan_op, not -a - number test,\sa Eigen::isinf DOXCOMMA Eigen::isfinite DOXCOMMA ArrayBase::isnan)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(
isinf, scalar_isinf_op, infinite value test,\sa Eigen::isnan DOXCOMMA Eigen::isfinite DOXCOMMA ArrayBase::isinf)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(isfinite, scalar_isfinite_op,
finite value test,\sa Eigen::isinf DOXCOMMA Eigen::isnan DOXCOMMA ArrayBase::isfinite)
EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(sign, scalar_sign_op, sign(or 0),\sa ArrayBase::sign)
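Every declaration above is stamped out by the one `EIGEN_ARRAY_DECLARE_GLOBAL_UNARY` macro: a functor type plus a free function wrapping it into a coefficient-wise expression. The generator pattern reduced to plain C++ over `std::vector` (macro and functor names here are illustrative, not Eigen's):

```cpp
#include <cassert>
#include <vector>

// Coefficient-wise application of a unary functor; an eagerly evaluated
// stand-in for building a CwiseUnaryOp expression.
template <typename F>
std::vector<double> cwise(F f, const std::vector<double>& x) {
  std::vector<double> r;
  r.reserve(x.size());
  for (double v : x) r.push_back(f(v));
  return r;
}

// One macro declares the functor and the matching free function, mirroring
// what EIGEN_ARRAY_DECLARE_GLOBAL_UNARY does for each math function above.
#define DECLARE_GLOBAL_UNARY(NAME, EXPR)                          \
  struct scalar_##NAME##_op {                                     \
    double operator()(double v) const { return EXPR; }            \
  };                                                              \
  inline std::vector<double> NAME(const std::vector<double>& x) { \
    return cwise(scalar_##NAME##_op{}, x);                        \
  }

DECLARE_GLOBAL_UNARY(square, v * v)    // generates scalar_square_op and square()
DECLARE_GLOBAL_UNARY(cube, v * v * v)  // generates scalar_cube_op and cube()
```

Token pasting (`scalar_##NAME##_op`) keeps each generated functor uniquely named, which is what lets the list above declare dozens of functions without collisions.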
template <typename Derived, typename ScalarExponent>
using GlobalUnaryPowReturnType = std::enable_if_t<
!internal::is_arithmetic<typename NumTraits<Derived>::Real>::value &&
internal::is_arithmetic<typename NumTraits<ScalarExponent>::Real>::value,
CwiseUnaryOp<internal::scalar_unary_pow_op<typename Derived::Scalar, ScalarExponent>, const Derived> >;
/** \returns an expression of the coefficient-wise power of \a x to the given constant \a exponent.
*
* \tparam ScalarExponent is the scalar type of \a exponent. It must be compatible with the scalar type of the given
* expression (\c Derived::Scalar).
*
* \sa ArrayBase::pow()
*
* \relates ArrayBase
*/
#ifdef EIGEN_PARSED_BY_DOXYGEN
template <typename Derived, typename ScalarExponent>
EIGEN_DEVICE_FUNC constexpr inline const GlobalUnaryPowReturnType<Derived, ScalarExponent> pow(
const Eigen::ArrayBase<Derived>& x, const ScalarExponent& exponent);
#else
template <typename Derived, typename ScalarExponent>
EIGEN_DEVICE_FUNC constexpr inline const GlobalUnaryPowReturnType<Derived, ScalarExponent> pow(
const Eigen::ArrayBase<Derived>& x, const ScalarExponent& exponent) {
return GlobalUnaryPowReturnType<Derived, ScalarExponent>(
x.derived(), internal::scalar_unary_pow_op<typename Derived::Scalar, ScalarExponent>(exponent));
}
#endif
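`GlobalUnaryPowReturnType` uses `std::enable_if_t` so this `pow` overload only participates when the base is an array expression (not arithmetic) and the exponent is arithmetic, preventing it from hijacking calls like `pow(2.0, 3.0)`. The gating mechanism in isolation (a sketch with a hypothetical `ArrayLike` type, not Eigen's traits):

```cpp
#include <cassert>
#include <type_traits>

struct ArrayLike {
  double v;
};  // stands in for an ArrayBase expression

// The overload participates only when Base is NOT arithmetic and Exp IS --
// the same SFINAE condition GlobalUnaryPowReturnType encodes with enable_if_t.
template <typename Base, typename Exp>
std::enable_if_t<!std::is_arithmetic<Base>::value && std::is_arithmetic<Exp>::value, double>
pow_gated(const Base& x, Exp e) {
  double r = 1.0;
  for (int i = 0; i < static_cast<int>(e); ++i) r *= x.v;  // integer exponents, for brevity
  return r;
}
```

With this constraint, `pow_gated(ArrayLike{2.0}, 3)` resolves here, while `pow_gated(2.0, 3.0)` finds no viable overload and the scalar `std::pow` remains the only candidate in real code.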
namespace internal {
EIGEN_ARRAY_DECLARE_GLOBAL_EIGEN_UNARY(real, scalar_real_op)
EIGEN_ARRAY_DECLARE_GLOBAL_EIGEN_UNARY(imag, scalar_imag_op)
EIGEN_ARRAY_DECLARE_GLOBAL_EIGEN_UNARY(abs2, scalar_abs2_op)
}  // namespace internal
/** \returns an expression of the coefficient-wise power of \a x to the given array of \a exponents.
*
* This function computes the coefficient-wise power.
*
* Example: \include Cwise_array_power_array.cpp
* Output: \verbinclude Cwise_array_power_array.out
*
* \sa ArrayBase::pow()
*
* \relates ArrayBase
*/
template <typename Derived, typename ExponentDerived>
inline const Eigen::CwiseBinaryOp<
Eigen::internal::scalar_pow_op<typename Derived::Scalar, typename ExponentDerived::Scalar>, const Derived,
const ExponentDerived>
pow(const Eigen::ArrayBase<Derived>& x, const Eigen::ArrayBase<ExponentDerived>& exponents) {
return Eigen::CwiseBinaryOp<
Eigen::internal::scalar_pow_op<typename Derived::Scalar, typename ExponentDerived::Scalar>, const Derived,
const ExponentDerived>(x.derived(), exponents.derived());
}
// TODO: cleanly disable those functions that are not supported on Array (numext::real_ref, internal::random, internal::isApprox...)
/** \returns an expression of the coefficient-wise power of the scalar \a x to the given array of \a exponents.
*
* This function computes the coefficient-wise power between a scalar and an array of exponents.
*
* \tparam Scalar is the scalar type of \a x. It must be compatible with the scalar type of the given array expression
* (\c Derived::Scalar).
*
* Example: \include Cwise_scalar_power_array.cpp
* Output: \verbinclude Cwise_scalar_power_array.out
*
* \sa ArrayBase::pow()
*
* \relates ArrayBase
*/
#ifdef EIGEN_PARSED_BY_DOXYGEN
template <typename Scalar, typename Derived>
inline const CwiseBinaryOp<internal::scalar_pow_op<Scalar, Derived::Scalar>, Constant<Scalar>, Derived> pow(
    const Scalar& x, const Eigen::ArrayBase<Derived>& exponents);
#else
template <typename Scalar, typename Derived>
EIGEN_DEVICE_FUNC inline const EIGEN_SCALAR_BINARYOP_EXPR_RETURN_TYPE(
typename internal::promote_scalar_arg<typename Derived::Scalar EIGEN_COMMA Scalar EIGEN_COMMA
EIGEN_SCALAR_BINARY_SUPPORTED(pow, Scalar,
typename Derived::Scalar)>::type,
Derived, pow) pow(const Scalar& x, const Eigen::ArrayBase<Derived>& exponents) {
typedef
typename internal::promote_scalar_arg<typename Derived::Scalar, Scalar,
EIGEN_SCALAR_BINARY_SUPPORTED(pow, Scalar, typename Derived::Scalar)>::type
PromotedScalar;
return EIGEN_SCALAR_BINARYOP_EXPR_RETURN_TYPE(PromotedScalar, Derived, pow)(
typename internal::plain_constant_type<Derived, PromotedScalar>::type(
exponents.derived().rows(), exponents.derived().cols(), internal::scalar_constant_op<PromotedScalar>(x)),
exponents.derived());
}
#endif
#endif // EIGEN_GLOBAL_FUNCTIONS_H
/** \returns an expression of the coefficient-wise atan2(\a x, \a y). \a x and \a y must be of the same type.
*
* This function computes the coefficient-wise atan2().
*
* \sa ArrayBase::atan2()
*
* \relates ArrayBase
*/
template <typename LhsDerived, typename RhsDerived>
inline const std::enable_if_t<
std::is_same<typename LhsDerived::Scalar, typename RhsDerived::Scalar>::value,
Eigen::CwiseBinaryOp<Eigen::internal::scalar_atan2_op<typename LhsDerived::Scalar, typename RhsDerived::Scalar>,
const LhsDerived, const RhsDerived> >
atan2(const Eigen::ArrayBase<LhsDerived>& x, const Eigen::ArrayBase<RhsDerived>& y) {
  return Eigen::CwiseBinaryOp<
      Eigen::internal::scalar_atan2_op<typename LhsDerived::Scalar, typename RhsDerived::Scalar>, const LhsDerived,
      const RhsDerived>(x.derived(), y.derived());
}
namespace internal {
EIGEN_ARRAY_DECLARE_GLOBAL_EIGEN_UNARY(real, scalar_real_op)
EIGEN_ARRAY_DECLARE_GLOBAL_EIGEN_UNARY(imag, scalar_imag_op)
EIGEN_ARRAY_DECLARE_GLOBAL_EIGEN_UNARY(abs2, scalar_abs2_op)
} // namespace internal
} // namespace Eigen
// TODO: cleanly disable those functions that are not supported on Array (numext::real_ref, internal::random,
// internal::isApprox...)
#endif // EIGEN_GLOBAL_FUNCTIONS_H


@@ -11,59 +11,65 @@
#ifndef EIGEN_IO_H
#define EIGEN_IO_H
namespace Eigen {
// IWYU pragma: private
#include "./InternalHeaderCheck.h"
namespace Eigen {
enum { DontAlignCols = 1 };
enum { StreamPrecision = -1,
FullPrecision = -2 };
enum { StreamPrecision = -1, FullPrecision = -2 };
namespace internal {
template<typename Derived>
std::ostream & print_matrix(std::ostream & s, const Derived& _m, const IOFormat& fmt);
template <typename Derived>
std::ostream& print_matrix(std::ostream& s, const Derived& _m, const IOFormat& fmt);
}
/** \class IOFormat
* \ingroup Core_Module
*
* \brief Stores a set of parameters controlling the way matrices are printed
*
* List of available parameters:
* - \b precision number of digits for floating point values, or one of the special constants \c StreamPrecision and \c FullPrecision.
* The default is the special value \c StreamPrecision which means to use the
* stream's own precision setting, as set for instance using \c cout.precision(3). The other special value
* \c FullPrecision means that the number of digits will be computed to match the full precision of each floating-point
* type.
* - \b flags an OR-ed combination of flags, the default value is 0, the only currently available flag is \c DontAlignCols which
* allows to disable the alignment of columns, resulting in faster code.
* - \b coeffSeparator string printed between two coefficients of the same row
* - \b rowSeparator string printed between two rows
* - \b rowPrefix string printed at the beginning of each row
* - \b rowSuffix string printed at the end of each row
* - \b matPrefix string printed at the beginning of the matrix
* - \b matSuffix string printed at the end of the matrix
*
* Example: \include IOFormat.cpp
* Output: \verbinclude IOFormat.out
*
* \sa DenseBase::format(), class WithFormat
*/
struct IOFormat
{
* \ingroup Core_Module
*
* \brief Stores a set of parameters controlling the way matrices are printed
*
* List of available parameters:
* - \b precision number of digits for floating point values, or one of the special constants \c StreamPrecision and \c
* FullPrecision. The default is the special value \c StreamPrecision which means to use the stream's own precision
* setting, as set for instance using \c cout.precision(3). The other special value \c FullPrecision means that the
* number of digits will be computed to match the full precision of each floating-point type.
 * - \b flags an OR-ed combination of flags, the default value is 0, the only currently available flag is \c
 * DontAlignCols which allows disabling the alignment of columns, resulting in faster code.
* - \b coeffSeparator string printed between two coefficients of the same row
* - \b rowSeparator string printed between two rows
* - \b rowPrefix string printed at the beginning of each row
* - \b rowSuffix string printed at the end of each row
* - \b matPrefix string printed at the beginning of the matrix
* - \b matSuffix string printed at the end of the matrix
* - \b fill character printed to fill the empty space in aligned columns
*
* Example: \include IOFormat.cpp
* Output: \verbinclude IOFormat.out
*
* \sa DenseBase::format(), class WithFormat
*/
struct IOFormat {
/** Default constructor, see class IOFormat for the meaning of the parameters */
IOFormat(int _precision = StreamPrecision, int _flags = 0,
const std::string& _coeffSeparator = " ",
const std::string& _rowSeparator = "\n", const std::string& _rowPrefix="", const std::string& _rowSuffix="",
const std::string& _matPrefix="", const std::string& _matSuffix="")
: matPrefix(_matPrefix), matSuffix(_matSuffix), rowPrefix(_rowPrefix), rowSuffix(_rowSuffix), rowSeparator(_rowSeparator),
rowSpacer(""), coeffSeparator(_coeffSeparator), precision(_precision), flags(_flags)
{
// TODO check if rowPrefix, rowSuffix or rowSeparator contains a newline
IOFormat(int _precision = StreamPrecision, int _flags = 0, const std::string& _coeffSeparator = " ",
const std::string& _rowSeparator = "\n", const std::string& _rowPrefix = "",
const std::string& _rowSuffix = "", const std::string& _matPrefix = "", const std::string& _matSuffix = "",
const char _fill = ' ')
: matPrefix(_matPrefix),
matSuffix(_matSuffix),
rowPrefix(_rowPrefix),
rowSuffix(_rowSuffix),
rowSeparator(_rowSeparator),
rowSpacer(""),
coeffSeparator(_coeffSeparator),
fill(_fill),
precision(_precision),
flags(_flags) {
// TODO: check if rowPrefix, rowSuffix or rowSeparator contains a newline
// don't add rowSpacer if columns are not to be aligned
if((flags & DontAlignCols))
return;
int i = int(matSuffix.length())-1;
while (i>=0 && matSuffix[i]!='\n')
{
if ((flags & DontAlignCols)) return;
int i = int(matPrefix.length()) - 1;
while (i >= 0 && matPrefix[i] != '\n') {
rowSpacer += ' ';
i--;
}
@@ -71,169 +77,157 @@ struct IOFormat
std::string matPrefix, matSuffix;
std::string rowPrefix, rowSuffix, rowSeparator, rowSpacer;
std::string coeffSeparator;
char fill;
int precision;
int flags;
};
/** \class WithFormat
* \ingroup Core_Module
*
* \brief Pseudo expression providing matrix output with given format
*
* \tparam ExpressionType the type of the object on which IO stream operations are performed
*
* This class represents an expression with stream operators controlled by a given IOFormat.
* It is the return type of DenseBase::format()
* and most of the time this is the only way it is used.
*
* See class IOFormat for some examples.
*
* \sa DenseBase::format(), class IOFormat
*/
template<typename ExpressionType>
class WithFormat
{
public:
* \ingroup Core_Module
*
* \brief Pseudo expression providing matrix output with given format
*
* \tparam ExpressionType the type of the object on which IO stream operations are performed
*
* This class represents an expression with stream operators controlled by a given IOFormat.
* It is the return type of DenseBase::format()
* and most of the time this is the only way it is used.
*
* See class IOFormat for some examples.
*
* \sa DenseBase::format(), class IOFormat
*/
template <typename ExpressionType>
class WithFormat {
public:
WithFormat(const ExpressionType& matrix, const IOFormat& format) : m_matrix(matrix), m_format(format) {}
WithFormat(const ExpressionType& matrix, const IOFormat& format)
: m_matrix(matrix), m_format(format)
{}
friend std::ostream& operator<<(std::ostream& s, const WithFormat& wf) {
return internal::print_matrix(s, wf.m_matrix.eval(), wf.m_format);
}
friend std::ostream & operator << (std::ostream & s, const WithFormat& wf)
{
return internal::print_matrix(s, wf.m_matrix.eval(), wf.m_format);
}
protected:
const typename ExpressionType::Nested m_matrix;
IOFormat m_format;
protected:
typename ExpressionType::Nested m_matrix;
IOFormat m_format;
};
/** \returns a WithFormat proxy object allowing to print a matrix with the given
 * format \a fmt.
*
* See class IOFormat for some examples.
*
* \sa class IOFormat, class WithFormat
*/
template<typename Derived>
inline const WithFormat<Derived>
DenseBase<Derived>::format(const IOFormat& fmt) const
{
return WithFormat<Derived>(derived(), fmt);
}
namespace internal {
// NOTE: This helper is kept for backward compatibility with previous code specializing
// this internal::significant_decimals_impl structure. In the future we should directly
// call digits10() which has been introduced in July 2016 in 3.3.
template<typename Scalar>
struct significant_decimals_impl
{
static inline int run()
{
return NumTraits<Scalar>::digits10();
}
// call max_digits10().
template <typename Scalar>
struct significant_decimals_impl {
static inline int run() { return NumTraits<Scalar>::max_digits10(); }
};
/** \internal
* print the matrix \a _m to the output stream \a s using the output format \a fmt */
template<typename Derived>
std::ostream & print_matrix(std::ostream & s, const Derived& _m, const IOFormat& fmt)
{
if(_m.size() == 0)
{
* print the matrix \a _m to the output stream \a s using the output format \a fmt */
template <typename Derived>
std::ostream& print_matrix(std::ostream& s, const Derived& _m, const IOFormat& fmt) {
using internal::is_same;
if (_m.size() == 0) {
s << fmt.matPrefix << fmt.matSuffix;
return s;
}
typename Derived::Nested m = _m;
typedef typename Derived::Scalar Scalar;
typedef std::conditional_t<is_same<Scalar, char>::value || is_same<Scalar, unsigned char>::value ||
is_same<Scalar, numext::int8_t>::value || is_same<Scalar, numext::uint8_t>::value,
int,
std::conditional_t<is_same<Scalar, std::complex<char> >::value ||
is_same<Scalar, std::complex<unsigned char> >::value ||
is_same<Scalar, std::complex<numext::int8_t> >::value ||
is_same<Scalar, std::complex<numext::uint8_t> >::value,
std::complex<int>, const Scalar&> >
PrintType;
Index width = 0;
std::streamsize explicit_precision;
if(fmt.precision == StreamPrecision)
{
if (fmt.precision == StreamPrecision) {
explicit_precision = 0;
}
else if(fmt.precision == FullPrecision)
{
if (NumTraits<Scalar>::IsInteger)
{
} else if (fmt.precision == FullPrecision) {
if (NumTraits<Scalar>::IsInteger) {
explicit_precision = 0;
}
else
{
} else {
explicit_precision = significant_decimals_impl<Scalar>::run();
}
}
else
{
} else {
explicit_precision = fmt.precision;
}
std::streamsize old_precision = 0;
if(explicit_precision) old_precision = s.precision(explicit_precision);
if (explicit_precision) old_precision = s.precision(explicit_precision);
bool align_cols = !(fmt.flags & DontAlignCols);
if(align_cols)
{
if (align_cols) {
// compute the largest width
for(Index j = 0; j < m.cols(); ++j)
for(Index i = 0; i < m.rows(); ++i)
{
for (Index j = 0; j < m.cols(); ++j)
for (Index i = 0; i < m.rows(); ++i) {
std::stringstream sstr;
sstr.copyfmt(s);
sstr << m.coeff(i,j);
sstr << static_cast<PrintType>(m.coeff(i, j));
width = std::max<Index>(width, Index(sstr.str().length()));
}
}
std::streamsize old_width = s.width();
char old_fill_character = s.fill();
s << fmt.matPrefix;
for(Index i = 0; i < m.rows(); ++i)
{
if (i)
s << fmt.rowSpacer;
for (Index i = 0; i < m.rows(); ++i) {
if (i) s << fmt.rowSpacer;
s << fmt.rowPrefix;
if(width) s.width(width);
s << m.coeff(i, 0);
for(Index j = 1; j < m.cols(); ++j)
{
if (width) {
s.fill(fmt.fill);
s.width(width);
}
s << static_cast<PrintType>(m.coeff(i, 0));
for (Index j = 1; j < m.cols(); ++j) {
s << fmt.coeffSeparator;
if (width) s.width(width);
s << m.coeff(i, j);
if (width) {
s.fill(fmt.fill);
s.width(width);
}
s << static_cast<PrintType>(m.coeff(i, j));
}
s << fmt.rowSuffix;
if( i < m.rows() - 1)
s << fmt.rowSeparator;
if (i < m.rows() - 1) s << fmt.rowSeparator;
}
s << fmt.matSuffix;
if(explicit_precision) s.precision(old_precision);
if (explicit_precision) s.precision(old_precision);
if (width) {
s.fill(old_fill_character);
s.width(old_width);
}
return s;
}
} // end namespace internal
} // end namespace internal
/** \relates DenseBase
*
* Outputs the matrix, to the given stream.
*
* If you wish to print the matrix with a format different than the default, use DenseBase::format().
*
* It is also possible to change the default format by defining EIGEN_DEFAULT_IO_FORMAT before including Eigen headers.
* If not defined, this will automatically be defined to Eigen::IOFormat(), that is the Eigen::IOFormat with default parameters.
*
* \sa DenseBase::format()
*/
template<typename Derived>
std::ostream & operator <<
(std::ostream & s,
const DenseBase<Derived> & m)
{
*
* Outputs the matrix, to the given stream.
*
* If you wish to print the matrix with a format different than the default, use DenseBase::format().
*
* It is also possible to change the default format by defining EIGEN_DEFAULT_IO_FORMAT before including Eigen headers.
* If not defined, this will automatically be defined to Eigen::IOFormat(), that is the Eigen::IOFormat with default
* parameters.
*
* \sa DenseBase::format()
*/
template <typename Derived>
std::ostream& operator<<(std::ostream& s, const DenseBase<Derived>& m) {
return internal::print_matrix(s, m.eval(), EIGEN_DEFAULT_IO_FORMAT);
}
} // end namespace Eigen
template <typename Derived>
std::ostream& operator<<(std::ostream& s, const DiagonalBase<Derived>& m) {
return internal::print_matrix(s, m.derived(), EIGEN_DEFAULT_IO_FORMAT);
}
#endif // EIGEN_IO_H
} // end namespace Eigen
#endif // EIGEN_IO_H
