Files
eigen/Eigen/GPU

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

59 lines
1.7 KiB
Plaintext
Raw Permalink Normal View History

GPU: Add library dispatch module (DeviceMatrix, cuBLAS, cuSOLVER) Add Eigen/GPU module: A standalone GPU library dispatch layer where DeviceMatrix<Scalar> operations map 1:1 to cuBLAS/cuSOLVER calls. CPU and GPU solvers coexist in the same binary with compatible syntax. Core infrastructure: - DeviceMatrix<Scalar>: RAII dense column-major GPU memory wrapper with async host transfer (fromHost/toHost) and CUDA event-based cross-stream synchronization. - GpuContext: Unified execution context owning a CUDA stream + cuBLAS handle + cuSOLVER handle. Thread-local default with explicit override via setThreadLocal(). Stream-borrowing constructor for integration. - DeviceBuffer: Typed RAII device allocation with move semantics. cuBLAS dispatch (expression syntax): - GEMM: d_C = d_A.adjoint() * d_B (cublasXgemm) - TRSM: d_X = d_A.triangularView<Lower>().solve(d_B) (cublasXtrsm) - SYMM/HEMM: d_C = d_A.selfadjointView<Lower>() * d_B (cublasXsymm) - SYRK/HERK: d_C = d_A * d_A.adjoint() (cublasXsyrk) cuSOLVER dispatch: - GpuLLT: Cached Cholesky factorization (cusolverDnXpotrf + Xpotrs) - GpuLU: Cached LU factorization (cusolverDnXgetrf + Xgetrs) - Solver chaining: auto x = d_A.llt().solve(d_B) - Solver expressions with .device(ctx) for explicit stream control. CI: Bump CUDA container to Ubuntu 22.04 (CMake 3.22), GCC 10->11, Clang 12->14. Bump cmake_minimum_required to 3.17 for FindCUDAToolkit. Tests: gpu_cublas.cpp, gpu_cusolver_llt.cpp, gpu_cusolver_lu.cpp, gpu_device_matrix.cpp, gpu_library_example.cu Benchmarks: bench_gpu_solvers.cpp, bench_gpu_chaining.cpp, bench_gpu_batching.cpp
2026-04-09 16:15:39 -07:00
// This file is part of Eigen, a lightweight C++ template library
// for linear algebra.
//
// This Source Code Form is subject to the terms of the Mozilla
// Public License v. 2.0. If a copy of the MPL was not distributed
// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
#ifndef EIGEN_GPU_MODULE_H
#define EIGEN_GPU_MODULE_H
#include "Core"
#include "src/Core/util/DisableStupidWarnings.h"
/** \defgroup GPU_Module GPU module
*
* GPU-accelerated solvers and operations using NVIDIA CUDA libraries
* (cuSOLVER, cuBLAS, cuSPARSE, cuFFT, cuDSS).
*
* This module provides explicit GPU solver classes that coexist with Eigen's
* CPU solvers. Unlike the LAPACKE dispatch (which replaces the CPU
* implementation globally), GPU classes are separate types the user
* instantiates by choice:
*
* \code
* #define EIGEN_USE_GPU
* #include <Eigen/GPU>
*
* // CPU path (unchanged)
* Eigen::LLT<Eigen::MatrixXd> llt_cpu(A);
*
* // GPU path (explicit)
* Eigen::GpuLLT<double> llt_gpu(A); // L stays on device
* auto X = llt_gpu.solve(B); // only B transferred per solve
* \endcode
*
* Requires CUDA 11.4+. See CLAUDE.md.
*/
#ifdef EIGEN_USE_GPU
// IWYU pragma: begin_exports
#include "src/GPU/DeviceMatrix.h"
#include "src/GPU/GpuContext.h"
#include "src/GPU/DeviceExpr.h"
#include "src/GPU/DeviceBlasExpr.h"
#include "src/GPU/DeviceSolverExpr.h"
#include "src/GPU/DeviceDispatch.h"
#include "src/GPU/GpuLLT.h"
#include "src/GPU/GpuLU.h"
#include "src/GPU/GpuQR.h"
#include "src/GPU/GpuSVD.h"
#include "src/GPU/GpuEigenSolver.h"
GPU: Add library dispatch module (DeviceMatrix, cuBLAS, cuSOLVER) Add Eigen/GPU module: A standalone GPU library dispatch layer where DeviceMatrix<Scalar> operations map 1:1 to cuBLAS/cuSOLVER calls. CPU and GPU solvers coexist in the same binary with compatible syntax. Core infrastructure: - DeviceMatrix<Scalar>: RAII dense column-major GPU memory wrapper with async host transfer (fromHost/toHost) and CUDA event-based cross-stream synchronization. - GpuContext: Unified execution context owning a CUDA stream + cuBLAS handle + cuSOLVER handle. Thread-local default with explicit override via setThreadLocal(). Stream-borrowing constructor for integration. - DeviceBuffer: Typed RAII device allocation with move semantics. cuBLAS dispatch (expression syntax): - GEMM: d_C = d_A.adjoint() * d_B (cublasXgemm) - TRSM: d_X = d_A.triangularView<Lower>().solve(d_B) (cublasXtrsm) - SYMM/HEMM: d_C = d_A.selfadjointView<Lower>() * d_B (cublasXsymm) - SYRK/HERK: d_C = d_A * d_A.adjoint() (cublasXsyrk) cuSOLVER dispatch: - GpuLLT: Cached Cholesky factorization (cusolverDnXpotrf + Xpotrs) - GpuLU: Cached LU factorization (cusolverDnXgetrf + Xgetrs) - Solver chaining: auto x = d_A.llt().solve(d_B) - Solver expressions with .device(ctx) for explicit stream control. CI: Bump CUDA container to Ubuntu 22.04 (CMake 3.22), GCC 10->11, Clang 12->14. Bump cmake_minimum_required to 3.17 for FindCUDAToolkit. Tests: gpu_cublas.cpp, gpu_cusolver_llt.cpp, gpu_cusolver_lu.cpp, gpu_device_matrix.cpp, gpu_library_example.cu Benchmarks: bench_gpu_solvers.cpp, bench_gpu_chaining.cpp, bench_gpu_batching.cpp
2026-04-09 16:15:39 -07:00
// IWYU pragma: end_exports
#endif
#include "src/Core/util/ReenableStupidWarnings.h"
#endif // EIGEN_GPU_MODULE_H