Rasmus Munk Larsen
014f12f11a
GPU: Add BLAS-1 ops, DeviceScalar, device-resident SpMV, and CG interop (5/5)
...
Add the operator interface needed for GPU iterative solvers:
- BLAS Level-1 on DeviceMatrix: dot(), norm(), squaredNorm(), setZero(),
noalias(), operator+=/-=/*= dispatching to cuBLAS axpy/scal/dot/nrm2.
- DeviceScalar<Scalar>: device-resident scalar returned by reductions.
Defers host sync until value is read (implicit conversion). Device-side
division via NPP for real types.
- GpuContext: stream-borrowing constructor, setThreadLocal(), cublasLtHandle(),
cusparseHandle().
- GEMM upgraded from cublasGemmEx to cublasLtMatmul with heuristic algorithm
selection and plan caching.
- GpuSparseContext: GpuContext& constructor for same-stream execution,
deviceView() returning DeviceSparseView with operator* for device-resident
SpMV (d_y = d_A * d_x).
- geam expressions: d_C = d_A + alpha * d_B via cublasXgeam.
- GpuSVD::matrixV() convenience wrapper.
These additions make DeviceMatrix usable as a VectorType in Eigen algorithm
templates. Conjugate gradient is the motivating example and is tested against
CPU ConjugateGradient for correctness.
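To illustrate why these BLAS-1 operations are enough for an iterative solver, here is a minimal host-only sketch of conjugate gradient written against a generic vector type. It does not use the Eigen/GPU types from this commit: HostVec, HostMat, axpy() and conjugateGradient() are hypothetical names introduced for illustration, and reductions return plain doubles rather than a deferred device scalar.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for a device vector. It exposes only the
// BLAS-1 style operations that the solver below relies on.
struct HostVec {
  std::vector<double> v;
  explicit HostVec(std::size_t n = 0) : v(n, 0.0) {}
  double dot(const HostVec& o) const {
    double s = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) s += v[i] * o.v[i];
    return s;
  }
  double squaredNorm() const { return dot(*this); }
  // this += alpha * x  (the axpy pattern)
  void axpy(double alpha, const HostVec& x) {
    for (std::size_t i = 0; i < v.size(); ++i) v[i] += alpha * x.v[i];
  }
  // this *= alpha  (the scal pattern)
  HostVec& operator*=(double alpha) {
    for (double& x : v) x *= alpha;
    return *this;
  }
};

// Hypothetical dense row-major operator; stands in for a sparse device view.
struct HostMat {
  std::size_t n;
  std::vector<double> a;
  HostVec operator*(const HostVec& x) const {
    HostVec y(n);
    for (std::size_t i = 0; i < n; ++i)
      for (std::size_t j = 0; j < n; ++j) y.v[i] += a[i * n + j] * x.v[j];
    return y;
  }
};

// Unpreconditioned CG for a symmetric positive definite operator A.
// Returns the number of iterations performed.
template <typename Op, typename Vec>
int conjugateGradient(const Op& A, const Vec& b, Vec& x, int maxIters, double tol) {
  Vec r = b;
  r.axpy(-1.0, A * x);  // r = b - A x
  Vec p = r;
  double rr = r.squaredNorm();
  const double threshold = tol * tol * b.squaredNorm();
  int k = 0;
  while (k < maxIters && rr > threshold) {
    Vec Ap = A * p;
    const double alpha = rr / p.dot(Ap);
    x.axpy(alpha, p);    // x += alpha * p
    r.axpy(-alpha, Ap);  // r -= alpha * A p
    const double rrNew = r.squaredNorm();
    p *= rrNew / rr;     // p = beta * p
    p.axpy(1.0, r);      // p = r + beta * p
    rr = rrNew;
    ++k;
  }
  return k;
}
```

Each loop iteration needs only one operator application, two reductions, and a handful of axpy/scal updates, which is the operation set listed above.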
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 20:19:59 -07:00
Rasmus Munk Larsen
43a95b62bb
GPU: Add sparse solvers, FFT, and SpMV (cuDSS, cuFFT, cuSPARSE)
...
Add GPU sparse direct solvers (Cholesky, LDL^T, LU) via cuDSS, 1D/2D FFT
via cuFFT with plan caching, and sparse matrix-vector/matrix multiply
(SpMV/SpMM) via cuSPARSE.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 19:11:49 -07:00
Rasmus Munk Larsen
8593c7f5a1
GPU: Add dense cuSOLVER solvers (QR, SVD, EigenSolver)
...
Add QR (geqrf + ormqr + trsm), SVD (gesvd), and self-adjoint eigenvalue
decomposition (syevd) via cuSOLVER. All support host and DeviceMatrix input.
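For reference, the three routines named for QR correspond to the usual steps of a QR-based least-squares solve. The symbols below (A, b, x, Q, R, R_1, c) are notation introduced here for illustration, not taken from the commit, and A is assumed to be m x n with m >= n and full column rank.

```latex
\begin{aligned}
A &= Q R, \qquad R = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}
  && \text{(geqrf: Householder QR factorization)} \\
c &= Q^{H} b
  && \text{(ormqr: apply } Q^{H} \text{ without forming } Q\text{)} \\
R_1 x &= c_{1:n}
  && \text{(trsm: upper-triangular solve)}
\end{aligned}
```

The resulting x minimizes the 2-norm of Ax - b.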
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 19:11:34 -07:00
Rasmus Munk Larsen
58c44ef36d
GPU: Add library dispatch module (DeviceMatrix, cuBLAS, cuSOLVER)
...
Add Eigen/GPU module: A standalone GPU library dispatch layer where
DeviceMatrix<Scalar> operations map 1:1 to cuBLAS/cuSOLVER calls.
CPU and GPU solvers coexist in the same binary with compatible syntax.
Core infrastructure:
- DeviceMatrix<Scalar>: RAII dense column-major GPU memory wrapper with
async host transfer (fromHost/toHost) and CUDA event-based cross-stream
synchronization.
- GpuContext: Unified execution context owning a CUDA stream + cuBLAS
handle + cuSOLVER handle. Thread-local default with explicit override
via setThreadLocal(). Stream-borrowing constructor for integration.
- DeviceBuffer: Typed RAII device allocation with move semantics.
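As an illustration of the "typed RAII allocation with move semantics" pattern, here is a minimal host-only sketch. It is not the DeviceBuffer class from this commit: TypedBuffer is a hypothetical name, and std::malloc/std::free stand in for the device allocator so the example builds without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <utility>

// Hypothetical typed owning buffer. Host malloc/free are placeholders for
// whatever device allocation calls a real implementation would use.
template <typename T>
class TypedBuffer {
 public:
  TypedBuffer() = default;

  explicit TypedBuffer(std::size_t count)
      : ptr_(count ? static_cast<T*>(std::malloc(count * sizeof(T))) : nullptr),
        size_(ptr_ ? count : 0) {}

  // Releasing in the destructor is what makes the wrapper RAII.
  ~TypedBuffer() { std::free(ptr_); }

  // Copying is disabled so exactly one object owns the allocation.
  TypedBuffer(const TypedBuffer&) = delete;
  TypedBuffer& operator=(const TypedBuffer&) = delete;

  // Moving transfers ownership and leaves the source empty.
  TypedBuffer(TypedBuffer&& other) noexcept
      : ptr_(std::exchange(other.ptr_, nullptr)),
        size_(std::exchange(other.size_, 0)) {}

  TypedBuffer& operator=(TypedBuffer&& other) noexcept {
    if (this != &other) {
      std::free(ptr_);  // drop the current allocation before taking the new one
      ptr_ = std::exchange(other.ptr_, nullptr);
      size_ = std::exchange(other.size_, 0);
    }
    return *this;
  }

  T* data() const { return ptr_; }
  std::size_t size() const { return size_; }

 private:
  T* ptr_ = nullptr;
  std::size_t size_ = 0;
};
```

Deleting the copy operations keeps ownership unambiguous, and the noexcept move operations let such buffers be stored in standard containers without extra allocations.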
cuBLAS dispatch (expression syntax):
- GEMM: d_C = d_A.adjoint() * d_B (cublasXgemm)
- TRSM: d_X = d_A.triangularView<Lower>().solve(d_B) (cublasXtrsm)
- SYMM/HEMM: d_C = d_A.selfadjointView<Lower>() * d_B (cublasXsymm)
- SYRK/HERK: d_C = d_A * d_A.adjoint() (cublasXsyrk)
cuSOLVER dispatch:
- GpuLLT: Cached Cholesky factorization (cusolverDnXpotrf + Xpotrs)
- GpuLU: Cached LU factorization (cusolverDnXgetrf + Xgetrs)
- Solver chaining: auto x = d_A.llt().solve(d_B)
- Solver expressions with .device(ctx) for explicit stream control.
CI: Bump CUDA container to Ubuntu 22.04 (CMake 3.22), GCC 10->11,
Clang 12->14. Bump cmake_minimum_required to 3.17 for FindCUDAToolkit.
Tests: gpu_cublas.cpp, gpu_cusolver_llt.cpp, gpu_cusolver_lu.cpp,
gpu_device_matrix.cpp, gpu_library_example.cu
Benchmarks: bench_gpu_solvers.cpp, bench_gpu_chaining.cpp,
bench_gpu_batching.cpp
2026-04-09 19:05:25 -07:00
Rasmus Munk Larsen
6a9405bf7a
GPU: Raise CUDA/HIP minimum and remove legacy guards
...
- Raise CUDA minimum from 9.0 to 11.4 (sm_70/Volta).
- Raise HIP minimum to GFX906 (Vega 20/MI50) / ROCm 5.6.
- Remove EIGEN_HAS_{CUDA,HIP,GPU}_FP16 guards — FP16 is always available
on sm_70+ and GFX906+.
- Remove obsolete __HIP_ARCH_HAS_* preprocessor branches.
- C++14 cleanup: remove pre-C++14 workarounds in GPU code.
- Fix NVCC warnings (deprecated register keyword, unreachable code,
tautological comparisons).
- Fix HIP test execution on gfx1151.
- Update CI configuration for new minimum versions.
2026-04-09 15:21:39 -07:00
Rasmus Munk Larsen
e055e4e415
Add plog_core_double with fallback for AVX without AVX2
...
libeigen/eigen!2407
Co-authored-by: Rasmus Munk Larsen <rlarsen@nvidia.com>
2026-04-08 19:41:07 -07:00
Rasmus Munk Larsen
b1d2ce4c85
Revert "Speed up plog_double ~1.7x with fast integer range reduction"
...
This reverts merge request !2385
2026-04-08 13:03:48 -07:00
Rasmus Munk Larsen
ab70739c9c
Speed up plog_double ~1.7x with fast integer range reduction
...
libeigen/eigen!2385
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-07 21:48:25 -07:00
Rasmus Munk Larsen
e778b5d22b
Switch ASAN/UBSAN smoketest pipelines to large runners
...
libeigen/eigen!2405
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-07 21:37:58 -07:00
Rasmus Munk Larsen
def45c5e1e
Improve psincos_double: faster polynomials + accurate range reduction
...
libeigen/eigen!2389
Closes #3052
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-07 21:24:24 -07:00
Rasmus Munk Larsen
110530a4d8
Fix bugs and improve robustness of SelfAdjointEigenSolver, improve test coverage
...
libeigen/eigen!2396
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-07 21:08:29 -07:00
Rasmus Munk Larsen
bde3a68bae
Improve dense linear solver docs with practical guidance
...
libeigen/eigen!2395
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-05 21:40:42 -07:00
Rasmus Munk Larsen
8eabfb5342
Vectorize BLAS level 1/2 routines with Eigen expressions
...
libeigen/eigen!2404
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-05 18:53:11 -07:00
Rasmus Munk Larsen
4ad90a60f1
Replace blas/f2c with clean C++ implementations
...
libeigen/eigen!2402
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-05 16:04:41 -07:00
Rasmus Munk Larsen
fe6ada10be
Prevent nightly CI pipelines from being auto-cancelled
...
libeigen/eigen!2390
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-04 11:52:15 -07:00
Alexander Grund
8179474225
CI: Add AVX512-FP16 build tests with GCC 13
...
libeigen/eigen!1652
Co-authored-by: Alexander Grund <alexander.grund@tu-dresden.de>
2026-04-04 11:32:31 -07:00
Florian Maurin
b57d860f3e
Fix GCC maybe-uninitialized warning in InnerProduct
...
libeigen/eigen!2386
Closes #3015
2026-04-03 19:41:09 -07:00
Rasmus Munk Larsen
a3074053a6
Speed up pexp_double by ~15-17%
...
libeigen/eigen!2388
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-03 17:09:11 -07:00
Rasmus Munk Larsen
a91913e961
Speed up plog_float by 1.6x with improved accuracy
...
libeigen/eigen!2382
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-03 13:45:01 -07:00
Rasmus Munk Larsen
ebae0c7c10
ulp_accuracy: use dynamic work queue for thread load balancing
...
libeigen/eigen!2383
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-02 22:40:03 -07:00
Charles Schlosser
5977635d64
fix signed integer overflow UB in integer_types and other trivial compiler warnings
...
libeigen/eigen!2380
2026-04-03 03:36:28 +00:00
Rasmus Munk Larsen
60df12437e
Fix ulp_accuracy crashes in Release builds
...
libeigen/eigen!2381
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-02 20:12:13 -07:00
Pavel Guzenfeld
e315a8cdd0
Inline IndexedViewMethods.inc into DenseBase.h
...
libeigen/eigen!2330
Closes #2766
2026-04-02 15:26:56 -07:00
Rasmus Munk Larsen
8ec68856a6
Fix basicstuff_8 casting test failure on loongarch64
...
libeigen/eigen!2379
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-02 14:14:54 -07:00
Rasmus Munk Larsen
61a8662876
Improve log1p accuracy and speed with direct range reduction
...
libeigen/eigen!2378
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-02 11:29:25 -07:00
Rasmus Munk Larsen
d31a73437f
Vectorize asinh and acosh for float and double
...
libeigen/eigen!2376
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-01 21:46:36 -07:00
Rasmus Munk Larsen
9513d3878e
Vectorize sinh, cosh, and log10
...
libeigen/eigen!2368
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-01 20:41:18 -07:00
Rasmus Munk Larsen
30e669cfe1
Tensor module: const-correctness and constexpr improvements
...
libeigen/eigen!2239
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-01 17:49:56 -07:00
Rasmus Munk Larsen
64885cc6a3
Fix remaining MSVC warnings in Windows CI (C4804, C4244, C4146, C4305)
...
libeigen/eigen!2374
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-01 17:20:31 -07:00
Rasmus Munk Larsen
6a07970d7d
CI: split NVHPC build and make fallback parallelism configurable
...
libeigen/eigen!2372
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-01 16:43:33 -07:00
Rasmus Munk Larsen
4be66f2830
CI: fail test jobs when no tests are found (--no-tests=error)
...
libeigen/eigen!2373
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-01 12:50:54 -07:00
Rasmus Munk Larsen
1df89cbc21
Right-size CI runners to reduce waste and shuffle build order to avoid OOM
...
libeigen/eigen!2367
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-31 19:10:34 -07:00
Rasmus Munk Larsen
b54640df19
Fix NVHPC warnings in Visitor.h and Memory.h
...
libeigen/eigen!2370
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-31 15:09:37 -07:00
Rasmus Munk Larsen
7fcbed7acb
Fill packet math coverage gaps across multiple architectures
...
libeigen/eigen!2237
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-31 14:52:32 -07:00
Rasmus Munk Larsen
80ab2898e2
CI: install libclang-rt-14-dev for sanitizer smoketest
...
libeigen/eigen!2369
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-31 00:16:18 -07:00
Rasmus Munk Larsen
798d7f2bec
CI: drop Clang-6, bump base image to Ubuntu 24.04 and Clang 12 to 14
...
libeigen/eigen!2366
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-30 22:00:17 -07:00
Rasmus Munk Larsen
1ade3636b9
Fix BDCSVD bidiagonal hard-case failures on ARM with GCC
...
libeigen/eigen!2365
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-30 20:17:37 -07:00
Rasmus Munk Larsen
801a9ee690
Fix ~1,460 MSVC warnings from generic code instantiated with bool
...
libeigen/eigen!2364
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-29 21:05:49 -07:00
Rasmus Munk Larsen
806c7b6590
CI: fix Windows build cache key containing invalid path characters
...
libeigen/eigen!2362
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-29 19:57:45 -07:00
Rasmus Munk Larsen
2776ba55eb
Update slicing tutorial docs to reflect Eigen::placeholders namespace
...
libeigen/eigen!2360
Closes #3064
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-29 18:51:02 -07:00
Rasmus Munk Larsen
09581fda38
Modernize tensor contraction code: bug fixes, dead code removal, and cleanup
...
libeigen/eigen!2248
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-29 18:03:06 -07:00
Rasmus Munk Larsen
732ebc8cc2
Modernize evaluator files
...
libeigen/eigen!2245
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-29 17:40:39 -07:00
Rasmus Munk Larsen
255f522e2e
Fix bugs, docs, and structure in unsupported/ public headers
...
libeigen/eigen!2254
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-29 17:06:40 -07:00
Pavel Guzenfeld
bd276fbb28
Map .inc files to C++ in Doxygen extension mapping
...
libeigen/eigen!2338
2026-03-29 16:48:13 -07:00
Rasmus Munk Larsen
c8633ceeea
Clean up top-level Eigen headers
...
libeigen/eigen!2252
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-29 16:28:09 -07:00
Rasmus Munk Larsen
409296d91d
Add nightly benchmark regression detection pipeline
...
libeigen/eigen!2349
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-29 16:03:56 -07:00
Pavel Guzenfeld
753a6ac5b3
Fix private shadowing of protected base members in iterative solvers
...
libeigen/eigen!2357
Closes #1859
2026-03-29 15:40:48 -07:00
Rasmus Munk Larsen
9fe2f03fa4
Revert "Lower BDCSVD crossover threshold from 16 to 8"
...
This reverts merge request !2358
2026-03-29 15:25:09 -07:00
Rasmus Munk Larsen
12fe90db8b
Lower BDCSVD crossover threshold from 16 to 8
...
libeigen/eigen!2358
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-29 14:33:22 -07:00
Pavel Guzenfeld
b7f6aed1b9
Fix dangling reference in IndexedView with expression indices
...
libeigen/eigen!2335
Closes #1943
2026-03-29 09:39:13 -07:00