7730 Commits

Author SHA1 Message Date
Rasmus Munk Larsen
014f12f11a GPU: Add BLAS-1 ops, DeviceScalar, device-resident SpMV, and CG interop (5/5)
Add the operator interface needed for GPU iterative solvers:

- BLAS Level-1 on DeviceMatrix: dot(), norm(), squaredNorm(), setZero(),
  noalias(), operator+=/-=/*= dispatching to cuBLAS axpy/scal/dot/nrm2.
- DeviceScalar<Scalar>: device-resident scalar returned by reductions.
  Defers host sync until value is read (implicit conversion). Device-side
  division via NPP for real types.
- GpuContext: stream-borrowing constructor, setThreadLocal(), cublasLtHandle(),
  cusparseHandle().
- GEMM upgraded from cublasGemmEx to cublasLtMatmul with heuristic algorithm
  selection and plan caching.
- GpuSparseContext: GpuContext& constructor for same-stream execution,
  deviceView() returning DeviceSparseView with operator* for device-resident
  SpMV (d_y = d_A * d_x).
- geam expressions: d_C = d_A + alpha * d_B via cublasXgeam.
- GpuSVD::matrixV() convenience wrapper.

These additions make DeviceMatrix usable as a VectorType in Eigen algorithm
templates. Conjugate gradient is the motivating example and is tested against
CPU ConjugateGradient for correctness.
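
As a rough illustration of the "usable as a VectorType" point above, the following host-only sketch writes conjugate gradient against nothing but BLAS-1 style operations (dot, squaredNorm, +=, -=, *=) and an operator* on the matrix. HostVector, DenseSym, and conjugateGradient are hypothetical names invented for this sketch; they are not the Eigen/GPU API, and a device-backed type would presumably route the same calls to cuBLAS and cuSPARSE.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for a device vector. It exposes only the
// BLAS-1 style operations named in the commit message.
struct HostVector {
  std::vector<double> v;
  explicit HostVector(std::size_t n = 0) : v(n, 0.0) {}
  std::size_t size() const { return v.size(); }
  void setZero() { for (double& x : v) x = 0.0; }
  double dot(const HostVector& o) const {
    assert(o.size() == size());
    double s = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) s += v[i] * o.v[i];
    return s;
  }
  double squaredNorm() const { return dot(*this); }
  double norm() const { return std::sqrt(squaredNorm()); }
  HostVector& operator+=(const HostVector& o) {
    assert(o.size() == size());
    for (std::size_t i = 0; i < v.size(); ++i) v[i] += o.v[i];
    return *this;
  }
  HostVector& operator-=(const HostVector& o) {
    assert(o.size() == size());
    for (std::size_t i = 0; i < v.size(); ++i) v[i] -= o.v[i];
    return *this;
  }
  HostVector& operator*=(double a) {
    for (double& x : v) x *= a;
    return *this;
  }
};

// Hypothetical dense symmetric operator (row-major storage), standing in for
// a sparse device view whose operator* performs SpMV.
struct DenseSym {
  std::size_t n;
  std::vector<double> a;
  HostVector operator*(const HostVector& x) const {
    assert(x.size() == n);
    HostVector y(n);
    for (std::size_t i = 0; i < n; ++i)
      for (std::size_t j = 0; j < n; ++j) y.v[i] += a[i * n + j] * x.v[j];
    return y;
  }
};

// Unpreconditioned CG written only in terms of the operations above.
// Returns the number of iterations performed.
template <typename Op, typename Vec>
int conjugateGradient(const Op& A, const Vec& b, Vec& x, int maxIters, double tol) {
  Vec r = b;
  r -= A * x;                      // r = b - A x
  Vec p = r;
  double rr = r.squaredNorm();
  const double threshold = tol * tol * b.squaredNorm();
  int it = 0;
  while (it < maxIters && rr > threshold) {
    Vec Ap = A * p;
    const double alpha = rr / p.dot(Ap);
    Vec step = p;
    step *= alpha;
    x += step;                     // x += alpha * p
    Ap *= alpha;
    r -= Ap;                       // r -= alpha * A p
    const double rrNew = r.squaredNorm();
    p *= rrNew / rr;               // p = r + beta * p
    p += r;
    rr = rrNew;
    ++it;
  }
  return it;
}
```

Every scalar in this sketch is a plain double, so each reduction is read immediately. A device-resident scalar, as described for DeviceScalar above, would let those reads be deferred until a value is actually needed on the host.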

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 20:19:59 -07:00
Rasmus Munk Larsen
43a95b62bb GPU: Add sparse solvers, FFT, and SpMV (cuDSS, cuFFT, cuSPARSE)
Add GPU sparse direct solvers (Cholesky, LDL^T, LU) via cuDSS, 1D/2D FFT
via cuFFT with plan caching, and sparse matrix-vector/matrix multiply
(SpMV/SpMM) via cuSPARSE.
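
Plan caching, as mentioned above for cuFFT, can be pictured with a host-only sketch like the one below: a plan is built the first time a transform shape is requested and reused afterwards. Plan and PlanCache are hypothetical names for illustration; the real module would hold library plan handles, and its actual cache keys and eviction policy are not described in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical stand-in for an expensive-to-create library plan.
struct Plan {
  std::size_t nx;
  std::size_t ny;
  int id;  // creation order, used only to observe cache behavior
};

// Cache keyed on the transform shape. A plan is created on first request
// for a shape and returned from the map on every later request.
class PlanCache {
 public:
  const Plan& get(std::size_t nx, std::size_t ny) {
    assert(nx > 0 && ny > 0);
    const auto key = std::make_pair(nx, ny);
    auto it = plans_.find(key);
    if (it == plans_.end()) {
      // Cache miss: this is where the costly plan construction would happen.
      it = plans_.emplace(key, Plan{nx, ny, ++created_}).first;
    }
    return it->second;
  }
  int created() const { return created_; }

 private:
  std::map<std::pair<std::size_t, std::size_t>, Plan> plans_;
  int created_ = 0;
};
```

std::map keeps references to its elements valid across insertions, so a returned Plan reference stays usable while the cache grows.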

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 19:11:49 -07:00
Rasmus Munk Larsen
8593c7f5a1 GPU: Add dense cuSOLVER solvers (QR, SVD, EigenSolver)
Add QR (geqrf + ormqr + trsm), SVD (gesvd), and self-adjoint eigenvalue
decomposition (syevd) via cuSOLVER. All support host and DeviceMatrix input.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 19:11:34 -07:00
Rasmus Munk Larsen
58c44ef36d GPU: Add library dispatch module (DeviceMatrix, cuBLAS, cuSOLVER)
Add Eigen/GPU module: A standalone GPU library dispatch layer where
DeviceMatrix<Scalar> operations map 1:1 to cuBLAS/cuSOLVER calls.
CPU and GPU solvers coexist in the same binary with compatible syntax.

Core infrastructure:
- DeviceMatrix<Scalar>: RAII dense column-major GPU memory wrapper with
  async host transfer (fromHost/toHost) and CUDA event-based cross-stream
  synchronization.
- GpuContext: Unified execution context owning a CUDA stream + cuBLAS
  handle + cuSOLVER handle. Thread-local default with explicit override
  via setThreadLocal(). Stream-borrowing constructor for integration.
- DeviceBuffer: Typed RAII device allocation with move semantics.
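
The "typed RAII allocation with move semantics" pattern described for DeviceBuffer can be sketched on the host as follows. TypedBuffer is a hypothetical name, and std::malloc/std::free stand in for the device allocator calls; this is an assumption-laden illustration of the ownership rules, not the class added by this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical move-only owner of a typed raw allocation.
template <typename T>
class TypedBuffer {
  static_assert(std::is_trivially_copyable<T>::value,
                "raw storage is only appropriate for trivially copyable types");

 public:
  TypedBuffer() = default;

  // Allocates uninitialized storage for `count` elements.
  explicit TypedBuffer(std::size_t count)
      : ptr_(count ? static_cast<T*>(std::malloc(count * sizeof(T))) : nullptr),
        size_(count) {
    if (count && !ptr_) throw std::bad_alloc();
  }

  ~TypedBuffer() { std::free(ptr_); }  // free(nullptr) is a no-op

  // Copying is disabled so exactly one object owns the allocation.
  TypedBuffer(const TypedBuffer&) = delete;
  TypedBuffer& operator=(const TypedBuffer&) = delete;

  // Moving transfers ownership and leaves the source empty.
  TypedBuffer(TypedBuffer&& o) noexcept
      : ptr_(std::exchange(o.ptr_, nullptr)), size_(std::exchange(o.size_, 0)) {}

  TypedBuffer& operator=(TypedBuffer&& o) noexcept {
    if (this != &o) {
      std::free(ptr_);
      ptr_ = std::exchange(o.ptr_, nullptr);
      size_ = std::exchange(o.size_, 0);
    }
    return *this;
  }

  T* data() const { return ptr_; }
  std::size_t size() const { return size_; }

  T& operator[](std::size_t i) {
    assert(i < size_);
    return ptr_[i];
  }

 private:
  T* ptr_ = nullptr;
  std::size_t size_ = 0;
};
```

Deleting the copy operations is the design choice that matters here: a device allocation has no cheap implicit copy, so ownership can only be handed over explicitly with std::move.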

cuBLAS dispatch (expression syntax):
- GEMM: d_C = d_A.adjoint() * d_B (cublasXgemm)
- TRSM: d_X = d_A.triangularView<Lower>().solve(d_B) (cublasXtrsm)
- SYMM/HEMM: d_C = d_A.selfadjointView<Lower>() * d_B (cublasXsymm)
- SYRK/HERK: d_C = d_A * d_A.adjoint() (cublasXsyrk)

cuSOLVER dispatch:
- GpuLLT: Cached Cholesky factorization (cusolverDnXpotrf + Xpotrs)
- GpuLU: Cached LU factorization (cusolverDnXgetrf + Xgetrs)
- Solver chaining: auto x = d_A.llt().solve(d_B)
- Solver expressions with .device(ctx) for explicit stream control.

CI: Bump CUDA container to Ubuntu 22.04 (CMake 3.22), GCC 10->11,
Clang 12->14. Bump cmake_minimum_required to 3.17 for FindCUDAToolkit.

Tests: gpu_cublas.cpp, gpu_cusolver_llt.cpp, gpu_cusolver_lu.cpp,
gpu_device_matrix.cpp, gpu_library_example.cu
Benchmarks: bench_gpu_solvers.cpp, bench_gpu_chaining.cpp,
bench_gpu_batching.cpp
2026-04-09 19:05:25 -07:00
Rasmus Munk Larsen
6a9405bf7a GPU: Raise CUDA/HIP minimum and remove legacy guards
- Raise CUDA minimum from 9.0 to 11.4 (sm_70/Volta).
- Raise HIP minimum to GFX906 (Vega 20/MI50) / ROCm 5.6.
- Remove EIGEN_HAS_{CUDA,HIP,GPU}_FP16 guards — FP16 is always available
  on sm_70+ and GFX906+.
- Remove obsolete __HIP_ARCH_HAS_* preprocessor branches.
- C++14 cleanup: remove pre-C++14 workarounds in GPU code.
- Fix NVCC warnings (deprecated register keyword, unreachable code,
  tautological comparisons).
- Fix HIP test execution on gfx1151.
- Update CI configuration for new minimum versions.
2026-04-09 15:21:39 -07:00
Rasmus Munk Larsen
e055e4e415 Add plog_core_double with fallback for AVX without AVX2
libeigen/eigen!2407

Co-authored-by: Rasmus Munk Larsen <rlarsen@nvidia.com>
2026-04-08 19:41:07 -07:00
Rasmus Munk Larsen
b1d2ce4c85 Revert "Speed up plog_double ~1.7x with fast integer range reduction"
This reverts merge request !2385
2026-04-08 13:03:48 -07:00
Rasmus Munk Larsen
ab70739c9c Speed up plog_double ~1.7x with fast integer range reduction
libeigen/eigen!2385

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-07 21:48:25 -07:00
Rasmus Munk Larsen
def45c5e1e Improve psincos_double: faster polynomials + accurate range reduction
libeigen/eigen!2389

Closes #3052

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-07 21:24:24 -07:00
Rasmus Munk Larsen
110530a4d8 Fix bugs and improve robustness of SelfAdjointEigenSolver, improve test coverage
libeigen/eigen!2396

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-07 21:08:29 -07:00
Florian Maurin
b57d860f3e Fix GCC maybe-uninitialized warning in InnerProduct
libeigen/eigen!2386

Closes #3015
2026-04-03 19:41:09 -07:00
Rasmus Munk Larsen
a3074053a6 Speed up pexp_double by ~15-17%
libeigen/eigen!2388

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-03 17:09:11 -07:00
Rasmus Munk Larsen
a91913e961 Speed up plog_float by 1.6x with improved accuracy
libeigen/eigen!2382

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-03 13:45:01 -07:00
Pavel Guzenfeld
e315a8cdd0 Inline IndexedViewMethods.inc into DenseBase.h
libeigen/eigen!2330

Closes #2766
2026-04-02 15:26:56 -07:00
Rasmus Munk Larsen
61a8662876 Improve log1p accuracy and speed with direct range reduction
libeigen/eigen!2378

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-02 11:29:25 -07:00
Rasmus Munk Larsen
d31a73437f Vectorize asinh and acosh for float and double
libeigen/eigen!2376

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-01 21:46:36 -07:00
Rasmus Munk Larsen
9513d3878e Vectorize sinh, cosh, and log10
libeigen/eigen!2368

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-01 20:41:18 -07:00
Rasmus Munk Larsen
64885cc6a3 Fix remaining MSVC warnings in Windows CI (C4804, C4244, C4146, C4305)
libeigen/eigen!2374

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-01 17:20:31 -07:00
Rasmus Munk Larsen
b54640df19 Fix NVHPC warnings in Visitor.h and Memory.h
libeigen/eigen!2370

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-31 15:09:37 -07:00
Rasmus Munk Larsen
7fcbed7acb Fill packet math coverage gaps across multiple architectures
libeigen/eigen!2237

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-31 14:52:32 -07:00
Rasmus Munk Larsen
1ade3636b9 Fix BDCSVD bidiagonal hard-case failures on ARM with GCC
libeigen/eigen!2365

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-30 20:17:37 -07:00
Rasmus Munk Larsen
801a9ee690 Fix ~1,460 MSVC warnings from generic code instantiated with bool
libeigen/eigen!2364

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-29 21:05:49 -07:00
Rasmus Munk Larsen
732ebc8cc2 Modernize evaluator files
libeigen/eigen!2245

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-29 17:40:39 -07:00
Rasmus Munk Larsen
c8633ceeea Clean up top-level Eigen headers
libeigen/eigen!2252

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-29 16:28:09 -07:00
Pavel Guzenfeld
753a6ac5b3 Fix private shadowing of protected base members in iterative solvers
libeigen/eigen!2357

Closes #1859
2026-03-29 15:40:48 -07:00
Rasmus Munk Larsen
9fe2f03fa4 Revert "Lower BDCSVD crossover threshold from 16 to 8"
This reverts merge request !2358
2026-03-29 15:25:09 -07:00
Rasmus Munk Larsen
12fe90db8b Lower BDCSVD crossover threshold from 16 to 8
libeigen/eigen!2358

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-29 14:33:22 -07:00
Pavel Guzenfeld
b7f6aed1b9 Fix dangling reference in IndexedView with expression indices
libeigen/eigen!2335

Closes #1943
2026-03-29 09:39:13 -07:00
Rasmus Munk Larsen
624ab58e8d Add bidiagonal SVD API to BDCSVD and remove dead debug code
libeigen/eigen!2238

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-28 20:38:31 -07:00
Rasmus Munk Larsen
0fe8cdfa3b Extract RankRevealingBase CRTP mixin to eliminate decomposition code duplication
libeigen/eigen!2272

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-28 19:12:23 -07:00
Rasmus Munk Larsen
f928a9f534 Fix static alignment for generic clang vector backend
libeigen/eigen!2351

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-28 15:50:58 -07:00
Pavel Guzenfeld
90ca5bfd9a Strip lapacke.h to only the declarations used by Eigen
libeigen/eigen!2322

Closes #2851
2026-03-27 20:16:46 -07:00
Rasmus Munk Larsen
cf508c096b Add block Householder right-side application for HouseholderSequence
libeigen/eigen!2342

Closes #3057

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-27 19:56:08 -07:00
Rasmus Munk Larsen
79d7d280a5 Fix bugs in evaluator files
libeigen/eigen!2244

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-28 01:25:51 +00:00
Charles Schlosser
eb4b2eeffa UBSAN: use appropriate SSE intrinsics for loading 4 and 8 bytes
libeigen/eigen!2346
2026-03-27 19:54:10 +00:00
Tyler Veness
9939a4c6e3 Fix SparseLU and SparseQR for custom scalar types
libeigen/eigen!2345
2026-03-27 00:13:11 -07:00
Rasmus Munk Larsen
002229ce47 Fix RowMajor gemm_pack_lhs for backends without half/quarter packets
libeigen/eigen!2344

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-23 23:33:42 -07:00
Rasmus Munk Larsen
843ffcec8b Fix warnings reported by NVHPC 26.1
libeigen/eigen!2324

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-22 11:43:40 -07:00
Florian Maurin
71ef987edb Fixes triangular solves on indexed/sliced dense expressions
libeigen/eigen!2340

Closes #2814
2026-03-22 11:12:21 -07:00
Rasmus Munk Larsen
6490b17e6f Fix sanitizer regressions in sparse serializer and packet tests
libeigen/eigen!2319

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-22 09:10:16 -07:00
Pavel Guzenfeld
a0e30732a7 Remove trailing semicolon from EIGEN_UNUSED_VARIABLE macro
libeigen/eigen!2301

Closes #3007

Co-authored-by: Pavel Guzenfeld <67074795+PavelGuzenfeld@users.noreply.github.com>
2026-03-21 16:54:13 -07:00
Rasmus Munk Larsen
54b04fc6b1 Fix mixed-type GEMM packing for backends without half/quarter packets
libeigen/eigen!2297

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-21 09:46:54 -07:00
Pavel Guzenfeld
1d21d62fbc Fix computeInverseAndDetWithCheck for dynamic result matrices
libeigen/eigen!2312

Closes #2917
2026-03-21 08:38:27 -07:00
Rasmus Munk Larsen
cc8c7cf0e6 Fix bugs and clean up SparseCore module
libeigen/eigen!2250

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-21 13:31:12 +00:00
Rasmus Munk Larsen
8115b45e50 Fix integer sanitizer issues in shifts and test ranges
libeigen/eigen!2320

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-20 17:27:02 -07:00
Yu You
9d161e0c87 Fine-tune gebp_kernel for aarch64
libeigen/eigen!2278
2026-03-20 14:29:03 -07:00
Pavel Guzenfeld
30128de0e3 Guard eigen_fill_helper on trivially copyable scalars
libeigen/eigen!2313

Closes #2956
2026-03-20 19:03:13 +00:00
Pavel Guzenfeld
36ca36d0de Guard redundant constexpr static member redeclarations for C++17+
libeigen/eigen!2299

Closes #3061

Co-authored-by: Pavel Guzenfeld <67074795+PavelGuzenfeld@users.noreply.github.com>
2026-03-18 20:24:09 -07:00
Pavel Guzenfeld
0fd8002b11 Fix most vexing parse in SparseSparseProductWithPruning.h
libeigen/eigen!2298

Closes #3060

Co-authored-by: Pavel Guzenfeld <67074795+PavelGuzenfeld@users.noreply.github.com>
2026-03-18 15:13:22 +00:00
Rasmus Munk Larsen
ea13a98dec Fix imag_ref for real scalar types and clean up svd_fill.h
libeigen/eigen!2303

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-15 19:56:01 -07:00