Rasmus Munk Larsen 014f12f11a GPU: Add BLAS-1 ops, DeviceScalar, device-resident SpMV, and CG interop (5/5)
Add the operator interface needed for GPU iterative solvers:

- BLAS Level-1 on DeviceMatrix: dot(), norm(), squaredNorm(), setZero(),
  noalias(), operator+=/-=/*= dispatching to cuBLAS axpy/scal/dot/nrm2.
- DeviceScalar<Scalar>: device-resident scalar returned by reductions.
  Defers host sync until value is read (implicit conversion). Device-side
  division via NPP for real types.
- GpuContext: stream-borrowing constructor, setThreadLocal(), cublasLtHandle(),
  cusparseHandle().
- GEMM upgraded from cublasGemmEx to cublasLtMatmul with heuristic algorithm
  selection and plan caching.
- GpuSparseContext: GpuContext& constructor for same-stream execution,
  deviceView() returning DeviceSparseView with operator* for device-resident
  SpMV (d_y = d_A * d_x).
- geam expressions: d_C = d_A + alpha * d_B via cublasXgeam.
- GpuSVD::matrixV() convenience wrapper.

These additions make DeviceMatrix usable as a VectorType in Eigen algorithm
templates. Conjugate gradient is the motivating example and is tested against
CPU ConjugateGradient for correctness.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 20:19:59 -07:00