Rasmus Munk Larsen
014f12f11a
GPU: Add BLAS-1 ops, DeviceScalar, device-resident SpMV, and CG interop (5/5)
Add the operator interface needed for GPU iterative solvers:
- BLAS Level-1 on DeviceMatrix: dot(), norm(), squaredNorm(), setZero(),
noalias(), operator+=/-=/*= dispatching to cuBLAS axpy/scal/dot/nrm2.
- DeviceScalar<Scalar>: device-resident scalar returned by reductions.
Defers host sync until value is read (implicit conversion). Device-side
division via NPP for real types.
- GpuContext: stream-borrowing constructor, setThreadLocal(), cublasLtHandle(),
cusparseHandle().
- GEMM upgraded from cublasGemmEx to cublasLtMatmul with heuristic algorithm
selection and plan caching.
- GpuSparseContext: GpuContext& constructor for same-stream execution,
deviceView() returning DeviceSparseView with operator* for device-resident
SpMV (d_y = d_A * d_x).
- geam expressions: d_C = d_A + alpha * d_B via cublasXgeam.
- GpuSVD::matrixV() convenience wrapper.
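The DeviceScalar item above describes a scalar whose host synchronization is deferred until the value is read. The sketch below shows that pattern on the host only. It assumes std::shared_future as a stand-in for device memory plus a CUDA event. `LazyScalar` and `lazyDot` are hypothetical names and are not the Eigen API. The reduction starts asynchronously, and a quotient of two pending scalars is formed without blocking, loosely mirroring the device-side division. The caller waits only at the implicit conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for DeviceScalar<Scalar>: owns a pending result and
// blocks only when the value is read through the implicit conversion.
template <typename Scalar>
class LazyScalar {
 public:
  explicit LazyScalar(std::shared_future<Scalar> pending) : pending_(std::move(pending)) {}

  // The synchronization point, analogous to a device-to-host copy.
  operator Scalar() const { return pending_.get(); }

  const std::shared_future<Scalar>& future() const { return pending_; }

 private:
  std::shared_future<Scalar> pending_;
};

// Starts a dot product asynchronously and returns immediately.
// x and y are captured by reference and must outlive the result.
template <typename Scalar>
LazyScalar<Scalar> lazyDot(const std::vector<Scalar>& x, const std::vector<Scalar>& y) {
  assert(x.size() == y.size());
  auto task = [&x, &y] {
    Scalar sum(0);
    for (std::size_t i = 0; i < x.size(); ++i) sum += x[i] * y[i];
    return sum;
  };
  return LazyScalar<Scalar>(std::async(std::launch::async, task).share());
}

// Divides two pending scalars without waiting on either of them here;
// the quotient is itself pending until it is converted.
template <typename Scalar>
LazyScalar<Scalar> operator/(const LazyScalar<Scalar>& a, const LazyScalar<Scalar>& b) {
  auto fa = a.future();
  auto fb = b.future();
  auto task = [fa, fb] { return fa.get() / fb.get(); };
  return LazyScalar<Scalar>(std::async(std::launch::async, task).share());
}
```

In an iterative solver, this lets a step size such as rho / (p . q) be formed without a host round trip, and the host waits only when it actually needs the number, for example in a convergence check.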
These additions make DeviceMatrix usable as a VectorType in Eigen algorithm
templates. Conjugate gradient is the motivating example and is tested against
CPU ConjugateGradient for correctness.
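The following minimal host-side sketch of unpreconditioned conjugate gradient shows why dot, axpy, scal and an SpMV are enough. It is written only in terms of those primitives. It assumes std::vector<double> and a hypothetical `Csr` struct in place of DeviceMatrix and DeviceSparseView, and it is not Eigen's ConjugateGradient implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical compressed sparse row matrix (stand-in for DeviceSparseView).
struct Csr {
  std::size_t rows;
  std::vector<std::size_t> outer;  // row start offsets, size rows + 1
  std::vector<std::size_t> inner;  // column index of each nonzero
  std::vector<double> values;      // nonzero values
};

// y = A * x, the SpMV a sparse library would perform.
Vec spmv(const Csr& A, const Vec& x) {
  Vec y(A.rows, 0.0);
  for (std::size_t r = 0; r < A.rows; ++r)
    for (std::size_t k = A.outer[r]; k < A.outer[r + 1]; ++k) y[r] += A.values[k] * x[A.inner[k]];
  return y;
}

// BLAS-1 building blocks (dot / axpy / scal analogues).
double dot(const Vec& x, const Vec& y) {
  double s = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
  return s;
}
void axpy(double alpha, const Vec& x, Vec& y) {
  for (std::size_t i = 0; i < x.size(); ++i) y[i] += alpha * x[i];
}
void scal(double alpha, Vec& x) {
  for (double& v : x) v *= alpha;
}

// Unpreconditioned CG for a symmetric positive definite A.
// Stops when ||r|| <= tol * ||b||; returns the number of iterations used.
int conjugateGradient(const Csr& A, const Vec& b, Vec& x, double tol, int maxIter) {
  assert(b.size() == A.rows && x.size() == A.rows);
  Vec r = b;
  axpy(-1.0, spmv(A, x), r);  // r = b - A x
  Vec p = r;
  double rho = dot(r, r);
  const double threshold = tol * tol * dot(b, b);
  for (int it = 0; it < maxIter; ++it) {
    if (rho <= threshold) return it;
    Vec q = spmv(A, p);
    double alpha = rho / dot(p, q);
    axpy(alpha, p, x);   // x += alpha p
    axpy(-alpha, q, r);  // r -= alpha q
    double rhoNew = dot(r, r);
    scal(rhoNew / rho, p);  // p = beta p, with beta = rhoNew / rho
    axpy(1.0, r, p);        // p += r
    rho = rhoNew;
  }
  return maxIter;
}
```

Every vector operation in the loop is one of these primitives. Any vector type that provides them, plus `A * x`, can therefore be substituted into such a template.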
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 20:19:59 -07:00
Rasmus Munk Larsen
43a95b62bb
GPU: Add sparse solvers, FFT, and SpMV (cuDSS, cuFFT, cuSPARSE)
Add GPU sparse direct solvers (Cholesky, LDL^T, LU) via cuDSS, 1D/2D FFT
via cuFFT with plan caching, and sparse matrix-vector/matrix multiply
(SpMV/SpMM) via cuSPARSE.
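Plan caching memoizes a planning step, such as cuFFT plan creation with `cufftPlan1d`/`cufftPlan2d`, keyed by problem shape. That step can be costly compared with executing a transform. The sketch below is minimal and assumes a hypothetical `FftPlan` type whose construction stands in for the library planner. The key layout and class names are illustrative only.

```cpp
#include <cassert>
#include <map>
#include <tuple>

// Hypothetical plan: in a real backend its construction would call the
// library planner and its destruction would release the plan handle.
struct FftPlan {
  int nx, ny;
  bool complexToComplex;
};

// Memoizes plans by transform shape so repeated transforms skip planning.
class PlanCache {
 public:
  using Key = std::tuple<int, int, bool>;  // (nx, ny, complexToComplex); ny == 1 for 1D

  const FftPlan& get(int nx, int ny, bool c2c) {
    assert(nx > 0 && ny > 0);
    Key key{nx, ny, c2c};
    auto it = plans_.find(key);
    if (it == plans_.end()) {
      ++created_;  // cache miss: "plan" once for this shape
      it = plans_.emplace(key, FftPlan{nx, ny, c2c}).first;
    }
    return it->second;  // std::map nodes are stable, so the reference stays valid
  }

  int plansCreated() const { return created_; }

 private:
  std::map<Key, FftPlan> plans_;
  int created_ = 0;
};
```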
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 19:11:49 -07:00
Rasmus Munk Larsen
58c44ef36d
GPU: Add library dispatch module (DeviceMatrix, cuBLAS, cuSOLVER)
Add Eigen/GPU module: A standalone GPU library dispatch layer where
DeviceMatrix<Scalar> operations map 1:1 to cuBLAS/cuSOLVER calls.
CPU and GPU solvers coexist in the same binary with compatible syntax.
Core infrastructure:
- DeviceMatrix<Scalar>: RAII dense column-major GPU memory wrapper with
async host transfer (fromHost/toHost) and CUDA event-based cross-stream
synchronization.
- GpuContext: Unified execution context owning a CUDA stream + cuBLAS
handle + cuSOLVER handle. Thread-local default with explicit override
via setThreadLocal(). Stream-borrowing constructor for integration.
- DeviceBuffer: Typed RAII device allocation with move semantics.
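A typed RAII allocation with move-only semantics can be sketched as follows. The sketch assumes std::malloc/std::free as host stand-ins for cudaMalloc/cudaFree so that it runs without a GPU. The `Buffer` name is hypothetical, and this is not the DeviceBuffer source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical typed, move-only owner of a raw allocation.
// std::malloc/std::free stand in for cudaMalloc/cudaFree; as with device
// memory, no constructors run, so only trivially copyable types are allowed.
template <typename T>
class Buffer {
  static_assert(std::is_trivially_copyable<T>::value, "raw storage only");

 public:
  Buffer() = default;
  explicit Buffer(std::size_t n) : size_(n) {
    if (n > 0) {
      data_ = static_cast<T*>(std::malloc(n * sizeof(T)));
      if (!data_) throw std::bad_alloc();
    }
  }
  ~Buffer() { std::free(data_); }

  // Copying would lead to a double free, so it is disabled.
  Buffer(const Buffer&) = delete;
  Buffer& operator=(const Buffer&) = delete;

  // Moving transfers ownership and leaves the source empty.
  Buffer(Buffer&& other) noexcept
      : data_(std::exchange(other.data_, nullptr)), size_(std::exchange(other.size_, 0)) {}
  Buffer& operator=(Buffer&& other) noexcept {
    if (this != &other) {
      std::free(data_);
      data_ = std::exchange(other.data_, nullptr);
      size_ = std::exchange(other.size_, 0);
    }
    return *this;
  }

  T* data() const { return data_; }
  std::size_t size() const { return size_; }

 private:
  T* data_ = nullptr;
  std::size_t size_ = 0;
};
```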
cuBLAS dispatch (expression syntax):
- GEMM: d_C = d_A.adjoint() * d_B (cublasXgemm)
- TRSM: d_X = d_A.triangularView<Lower>().solve(d_B) (cublasXtrsm)
- SYMM/HEMM: d_C = d_A.selfadjointView<Lower>() * d_B (cublasXsymm)
- SYRK/HERK: d_C = d_A * d_A.adjoint() (cublasXsyrk)
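A tiny expression template can illustrate how an expression maps 1:1 to a library call. Here `adjoint()` only records an operation flag, the product only records its operands, and assignment issues exactly one GEMM call with those flags. `Mat`, `Operand`, `Product` and `gemm` are hypothetical names. `gemm` is a naive host loop standing in for cublasXgemm with "no transpose"/"conjugate transpose" flags, and only real scalars are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Mat;
enum class Op { N, C };  // mirrors "no transpose" / "conjugate transpose"

// Lightweight operand: which matrix, and which op flag to pass to GEMM.
struct Operand {
  const Mat* m;
  Op op;
};
// A product expression only records its operands; nothing is computed yet.
struct Product {
  Operand lhs, rhs;
};
inline Product operator*(Operand a, Operand b) { return {a, b}; }

// Hypothetical dense column-major matrix.
struct Mat {
  std::size_t rows = 0, cols = 0;
  std::vector<double> data;
  Mat() = default;
  Mat(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
  double& operator()(std::size_t i, std::size_t j) { return data[i + j * rows]; }
  double operator()(std::size_t i, std::size_t j) const { return data[i + j * rows]; }
  operator Operand() const { return {this, Op::N}; }
  Operand adjoint() const { return {this, Op::C}; }
  Mat& operator=(const Product& p);  // evaluates with a single GEMM call
};

// Naive host stand-in for one GEMM call: returns op(A) * op(B).
// Real scalars only, so the conjugate transpose reduces to a transpose.
Mat gemm(Op opA, Op opB, const Mat& A, const Mat& B) {
  const std::size_t m = opA == Op::N ? A.rows : A.cols;
  const std::size_t k = opA == Op::N ? A.cols : A.rows;
  const std::size_t n = opB == Op::N ? B.cols : B.rows;
  assert(k == (opB == Op::N ? B.rows : B.cols));
  Mat C(m, n);
  for (std::size_t j = 0; j < n; ++j)
    for (std::size_t i = 0; i < m; ++i)
      for (std::size_t l = 0; l < k; ++l)
        C(i, j) += (opA == Op::N ? A(i, l) : A(l, i)) * (opB == Op::N ? B(l, j) : B(j, l));
  return C;
}

inline Mat& Mat::operator=(const Product& p) {
  // gemm writes into a fresh matrix, so d_C = d_C.adjoint() * d_B is alias-safe.
  *this = gemm(p.lhs.op, p.rhs.op, *p.lhs.m, *p.rhs.m);
  return *this;
}
```

Because the transpose is folded into the GEMM flag, `C = A.adjoint() * B` never materializes the adjoint. This is the property that allows a one-call-per-expression dispatch.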
cuSOLVER dispatch:
- GpuLLT: Cached Cholesky factorization (cusolverDnXpotrf + Xpotrs)
- GpuLU: Cached LU factorization (cusolverDnXgetrf + Xgetrs)
- Solver chaining: auto x = d_A.llt().solve(d_B)
- Solver expressions with .device(ctx) for explicit stream control.
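The cached-factorization pattern behind GpuLLT factors once (potrf) and then solves any number of right-hand sides (potrs). It can be sketched on the host as below. `CachedLLT` is a hypothetical name, and the naive column-major Cholesky loops stand in for cusolverDnXpotrf/Xpotrs. This is not the module's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical cached Cholesky solver for a dense SPD matrix (column-major).
// Only the lower triangle of the input is read.
class CachedLLT {
 public:
  CachedLLT(std::vector<double> a, std::size_t n) : L_(std::move(a)), n_(n) {
    assert(L_.size() == n * n);
    factorize();  // "potrf": done once, factor kept in L_
  }

  // "potrs": reuses the cached factor; O(n^2) work per right-hand side.
  std::vector<double> solve(std::vector<double> b) const {
    assert(b.size() == n_);
    for (std::size_t i = 0; i < n_; ++i) {  // forward substitution: L y = b
      for (std::size_t k = 0; k < i; ++k) b[i] -= at(i, k) * b[k];
      b[i] /= at(i, i);
    }
    for (std::size_t i = n_; i-- > 0;) {  // back substitution: L^T x = y
      for (std::size_t k = i + 1; k < n_; ++k) b[i] -= at(k, i) * b[k];
      b[i] /= at(i, i);
    }
    return b;
  }

 private:
  double at(std::size_t i, std::size_t j) const { return L_[i + j * n_]; }
  double& at(std::size_t i, std::size_t j) { return L_[i + j * n_]; }

  // A = L L^T, overwriting the lower triangle with L.
  void factorize() {
    for (std::size_t j = 0; j < n_; ++j) {
      double d = at(j, j);
      for (std::size_t k = 0; k < j; ++k) d -= at(j, k) * at(j, k);
      if (d <= 0.0) throw std::runtime_error("matrix is not positive definite");
      at(j, j) = std::sqrt(d);
      for (std::size_t i = j + 1; i < n_; ++i) {
        double s = at(i, j);
        for (std::size_t k = 0; k < j; ++k) s -= at(i, k) * at(j, k);
        at(i, j) = s / at(j, j);
      }
    }
  }

  std::vector<double> L_;
  std::size_t n_;
};
```

Factorization costs O(n^3) and each solve O(n^2). Caching the factor therefore pays off as soon as the same matrix is solved against more than one right-hand side.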
CI: Bump CUDA container to Ubuntu 22.04 (CMake 3.22), GCC 10->11,
Clang 12->14. Bump cmake_minimum_required to 3.17 for FindCUDAToolkit.
Tests: gpu_cublas.cpp, gpu_cusolver_llt.cpp, gpu_cusolver_lu.cpp,
gpu_device_matrix.cpp, gpu_library_example.cu
Benchmarks: bench_gpu_solvers.cpp, bench_gpu_chaining.cpp,
bench_gpu_batching.cpp
2026-04-09 19:05:25 -07:00
Rasmus Munk Larsen
110530a4d8
Fix bugs and improve robustness of SelfAdjointEigenSolver, improve test coverage
libeigen/eigen!2396
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-07 21:08:29 -07:00
Rasmus Munk Larsen
4ad90a60f1
Replace blas/f2c with clean C++ implementations
libeigen/eigen!2402
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-05 16:04:41 -07:00
Rasmus Munk Larsen
d31a73437f
Vectorize asinh and acosh for float and double
libeigen/eigen!2376
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-01 21:46:36 -07:00
Rasmus Munk Larsen
9513d3878e
Vectorize sinh, cosh, and log10
libeigen/eigen!2368
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-04-01 20:41:18 -07:00
Rasmus Munk Larsen
9fe2f03fa4
Revert "Lower BDCSVD crossover threshold from 16 to 8"
This reverts merge request !2358
2026-03-29 15:25:09 -07:00
Rasmus Munk Larsen
12fe90db8b
Lower BDCSVD crossover threshold from 16 to 8
libeigen/eigen!2358
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-29 14:33:22 -07:00
Rasmus Munk Larsen
cf508c096b
Add block Householder right-side application for HouseholderSequence
libeigen/eigen!2342
Closes #3057
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-27 19:56:08 -07:00
Rasmus Munk Larsen
9d1e5f3915
Remove benchmark::internal::Benchmark* from all benchmarks
libeigen/eigen!2332
Co-authored-by: Rasmus Munk Larsen <rlarsen@nvidia.com>
2026-03-20 17:42:07 -07:00
Rasmus Munk Larsen
8190c82cb4
Add missing SIMD math function benchmarks
libeigen/eigen!2284
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-11 23:20:11 -07:00
Rasmus Munk Larsen
8368a12f0f
Add runtime cache size detection for ARM and improve GEMM blocking
libeigen/eigen!2282
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-11 22:36:33 -07:00
Rasmus Munk Larsen
662d5c21ff
Optimize SYMV, SYR, SYR2, and TRMV product kernels
libeigen/eigen!2228
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-01 19:40:11 -08:00
Rasmus Munk Larsen
8525491eb1
Add dedicated unit tests and benchmark for ConditionEstimator
libeigen/eigen!2223
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-26 18:26:38 -08:00
Rasmus Munk Larsen
a95440de17
Remove obsolete bench/ and btl/ directories
libeigen/eigen!2217
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-25 20:19:45 -08:00
Rasmus Munk Larsen
a31de4778d
Blocked Jacobi SVD sweep with L2-cache-adaptive threshold
libeigen/eigen!2206
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
Co-authored-by: Rasmus Munk Larsen <rmlarsen@google.com>
2026-02-25 10:03:05 -08:00
Rasmus Munk Larsen
16da0279f1
Add benchmarks for unsupported modules and extend supported benchmarks
libeigen/eigen!2179
Closes #3036
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-24 17:12:33 -08:00
Rasmus Munk Larsen
1f49bf96cf
Add new benchmarks for Core, LU, and QR operations
...
libeigen/eigen!2177
Closes #3035
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-22 12:19:37 -08:00
Rasmus Munk Larsen
d4077a6e99
Reorganize benchmarks into subdirectories and clean up Eigen sources
libeigen/eigen!2176
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-21 17:46:55 -08:00
Rasmus Munk Larsen
374fe225bf
Reduce GEMV and TRSM benchmark sizes for faster routine runs
libeigen/eigen!2163
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-20 00:56:57 -08:00
Rasmus Munk Larsen
9c63d26dec
Remove reference to nonexistent spmv.cpp in benchmarks
libeigen/eigen!2157
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-19 04:22:35 -08:00
Rasmus Munk Larsen
552ca8f15f
Simplify GEBP micro-kernel and improve blocking heuristics
libeigen/eigen!2142
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-18 13:16:14 -08:00
Rasmus Munk Larsen
3108f6360e
Migrate Eigen benchmarks to the Google benchmark framework
libeigen/eigen!2132
Closes #3025
Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-17 20:51:36 -08:00