349 Commits

Author SHA1 Message Date
Rasmus Munk Larsen
58c44ef36d GPU: Add library dispatch module (DeviceMatrix, cuBLAS, cuSOLVER)
Add Eigen/GPU module: A standalone GPU library dispatch layer where
DeviceMatrix<Scalar> operations map 1:1 to cuBLAS/cuSOLVER calls.
CPU and GPU solvers coexist in the same binary with compatible syntax.

Core infrastructure:
- DeviceMatrix<Scalar>: RAII dense column-major GPU memory wrapper with
  async host transfer (fromHost/toHost) and CUDA event-based cross-stream
  synchronization.
- GpuContext: Unified execution context owning a CUDA stream + cuBLAS
  handle + cuSOLVER handle. Thread-local default with explicit override
  via setThreadLocal(). Stream-borrowing constructor for integration.
- DeviceBuffer: Typed RAII device allocation with move semantics.

cuBLAS dispatch (expression syntax):
- GEMM: d_C = d_A.adjoint() * d_B (cublasXgemm)
- TRSM: d_X = d_A.triangularView<Lower>().solve(d_B) (cublasXtrsm)
- SYMM/HEMM: d_C = d_A.selfadjointView<Lower>() * d_B (cublasXsymm)
- SYRK/HERK: d_C = d_A * d_A.adjoint() (cublasXsyrk)

cuSOLVER dispatch:
- GpuLLT: Cached Cholesky factorization (cusolverDnXpotrf + Xpotrs)
- GpuLU: Cached LU factorization (cusolverDnXgetrf + Xgetrs)
- Solver chaining: auto x = d_A.llt().solve(d_B)
- Solver expressions with .device(ctx) for explicit stream control.
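
The "cached factorization" idea above (factor once, then reuse the factor for many solves) can be illustrated on the host. The following is only a minimal CPU sketch in plain C++14, not the GPU module's implementation: `CachedCholesky` and everything in namespace `sketch` are hypothetical names, with the constructor playing the role of a potrf-style call and `solve` the role of a potrs-style call.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

namespace sketch {  // hypothetical names, for illustration only

// Host-side analogue of a cached Cholesky solver for a dense,
// column-major, symmetric positive-definite matrix.
class CachedCholesky {
 public:
  // Factors A = L * L^T once; only the lower triangle of `a` is read.
  CachedCholesky(std::vector<double> a, std::size_t n)
      : l_(std::move(a)), n_(n), ok_(true) {
    assert(l_.size() == n_ * n_);
    for (std::size_t j = 0; j < n_; ++j) {
      double d = at(j, j);
      for (std::size_t k = 0; k < j; ++k) d -= at(j, k) * at(j, k);
      if (d <= 0.0) {  // not positive definite
        ok_ = false;
        return;
      }
      at(j, j) = std::sqrt(d);
      for (std::size_t i = j + 1; i < n_; ++i) {
        double s = at(i, j);
        for (std::size_t k = 0; k < j; ++k) s -= at(i, k) * at(j, k);
        at(i, j) = s / at(j, j);
      }
    }
  }

  bool ok() const { return ok_; }

  // Solves A x = b with the cached factor; no refactorization happens here.
  std::vector<double> solve(std::vector<double> b) const {
    assert(ok_ && b.size() == n_);
    for (std::size_t i = 0; i < n_; ++i) {  // forward: L y = b
      for (std::size_t k = 0; k < i; ++k) b[i] -= get(i, k) * b[k];
      b[i] /= get(i, i);
    }
    for (std::size_t i = n_; i-- > 0;) {  // backward: L^T x = y
      for (std::size_t k = i + 1; k < n_; ++k) b[i] -= get(k, i) * b[k];
      b[i] /= get(i, i);
    }
    return b;
  }

 private:
  double& at(std::size_t i, std::size_t j) { return l_[i + j * n_]; }
  double get(std::size_t i, std::size_t j) const { return l_[i + j * n_]; }

  std::vector<double> l_;  // holds L in the lower triangle after construction
  std::size_t n_;
  bool ok_;
};

}  // namespace sketch
```

Constructing one `sketch::CachedCholesky` and calling `solve` with several right-hand sides pays the factorization cost only once, which is the point of caching the factor.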

CI: Bump CUDA container to Ubuntu 22.04 (CMake 3.22), GCC 10->11,
Clang 12->14. Bump cmake_minimum_required to 3.17 for FindCUDAToolkit.

Tests: gpu_cublas.cpp, gpu_cusolver_llt.cpp, gpu_cusolver_lu.cpp,
gpu_device_matrix.cpp, gpu_library_example.cu
Benchmarks: bench_gpu_solvers.cpp, bench_gpu_chaining.cpp,
bench_gpu_batching.cpp
2026-04-09 19:05:25 -07:00
Rasmus Munk Larsen
6a9405bf7a GPU: Raise CUDA/HIP minimum and remove legacy guards
- Raise CUDA minimum from 9.0 to 11.4 (sm_70/Volta).
- Raise HIP minimum to GFX906 (Vega 20/MI50) / ROCm 5.6.
- Remove EIGEN_HAS_{CUDA,HIP,GPU}_FP16 guards — FP16 is always available
  on sm_70+ and GFX906+.
- Remove obsolete __HIP_ARCH_HAS_* preprocessor branches.
- C++14 cleanup: remove pre-C++14 workarounds in GPU code.
- Fix NVCC warnings (deprecated register keyword, unreachable code,
  tautological comparisons).
- Fix HIP test execution on gfx1151.
- Update CI configuration for new minimum versions.
2026-04-09 15:21:39 -07:00
Charles Schlosser
ba9871e46b fix and enable realview unit tests
libeigen/eigen!2356
2026-03-28 20:13:54 -07:00
Rasmus Munk Larsen
5e521f3e45 Revert "add realview test"
This reverts merge request !2352
2026-03-28 17:27:01 -07:00
Charles Schlosser
87ae1dbe7f add realview test
libeigen/eigen!2352
2026-03-28 16:26:51 -07:00
Rasmus Munk Larsen
54b04fc6b1 Fix mixed-type GEMM packing for backends without half/quarter packets
libeigen/eigen!2297

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-21 09:46:54 -07:00
Rasmus Munk Larsen
c66fc52868 Add ULP accuracy measurement tool and documentation for vectorized math functions
libeigen/eigen!2153

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-03-01 13:22:16 -08:00
Rasmus Munk Larsen
8525491eb1 Add dedicated unit tests and benchmark for ConditionEstimator
libeigen/eigen!2223

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-26 18:26:38 -08:00
Rasmus Munk Larsen
1b1b7e347d Fix EIGEN_NO_AUTOMATIC_RESIZING not resizing empty destinations
libeigen/eigen!2219

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-26 07:54:27 -08:00
Rasmus Munk Larsen
4fab38d798 Make clang generic vector backend support 16, 32, and 64-byte vectors
libeigen/eigen!2213

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-25 08:50:47 -08:00
Rasmus Munk Larsen
ea25ea52bb Revert accidental changes from !2212 squash merge
libeigen/eigen!2214

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-25 08:31:41 -08:00
Rasmus Munk Larsen
38f0f42755 Update rmlarsen email address from @google.com to @gmail.com
libeigen/eigen!2212

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-25 07:45:02 -08:00
Rasmus Munk Larsen
fa567f6bcd Add CUDA CI jobs with NVHPC (nvc++) as host and device compiler
libeigen/eigen!2204

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-24 16:54:08 -08:00
Rasmus Munk Larsen
9810969c0f Suppress false-positive GCC and clang warnings in test builds
libeigen/eigen!2187

Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>
2026-02-22 14:54:15 -08:00
Rasmus Munk Larsen
2b561f9284 Revert "Specialized enable_borrowed_ranges for VectorwiseOp class range iteration"
This reverts merge request !2127
2026-02-16 02:12:28 -08:00
Blake
d0654a201b Specialized enable_borrowed_ranges for VectorwiseOp class range iteration
libeigen/eigen!2127

Closes #2882
2026-02-15 07:31:33 -08:00
Blake
23fcc1c6c9 MatrixBase::diagonalView issue 604
libeigen/eigen!2126

Closes #604
2026-02-10 02:12:03 +00:00
Ludwig Striet
99f8512985 ComplexQZ 2025-10-13 17:35:03 +00:00
Guilhem Saurel
a67f9dabb0 tests: add missing link 2025-10-01 22:38:52 +00:00
Charles Schlosser
28c3b26d53 masked load/store framework 2025-04-12 00:31:10 +00:00
Antonio Sanchez
179a49684a Fix CMake BOOST warning 2025-02-28 07:33:26 -08:00
Rasmus Munk Larsen
72adf891d5 Slightly simplify ForkJoin code, and make sure the test is actually run. 2025-02-25 17:22:43 +00:00
Johannes Zipfel
2926b2e0a9 added functions to fetch L and U Factors from IncompleteLUT 2025-01-31 18:32:38 +00:00
Antonio Sánchez
d26e19714f Add missing cwiseSquare, tests for cwise matrix ops. 2024-03-28 04:26:55 +00:00
Antonio Sánchez
1d4369c2ff Fix CwiseUnaryView. 2024-03-11 19:08:30 +00:00
Antonio Sánchez
75e273afcc Add internal ctz/clz implementation. 2023-12-11 21:03:09 +00:00
Pavel Labath
66b9f4ed5c Fix (u)int64_t->float conversion on arm 2023-11-21 16:09:12 +00:00
Antonio Sanchez
3a9635b20c Link pthread for product_threaded test 2023-11-13 11:34:23 -08:00
Rasmus Munk Larsen
76e8c04553 Generalize parallel GEMM implementation in Core to work with ThreadPool in addition to OpenMP. 2023-11-10 17:42:30 +00:00
Ioannis Assiouras
d9839718aa [ROCm] Replace HIP_PATH with ROCM_PATH for rocm 6.0 2023-10-16 20:56:35 +00:00
Alejandro Acosta
24d15e086f [SYCL-2020] Add test to validate SYCL in Eigen core. 2023-07-28 15:45:08 +00:00
Tobias Wood
94f57867fe Thread pool 2023-05-05 16:23:34 +00:00
Antonio Sánchez
e256ad1823 Remove LGPL Code and references. 2023-02-08 01:25:06 +00:00
Antonio Sánchez
dbf7ae6f9b Fix up C++ version detection macros and cmake tests. 2022-12-20 18:06:03 +00:00
Alexander Richardson
62de593c40 Allow std::initializer_list constructors in constexpr expressions 2022-12-14 17:05:37 +00:00
Rasmus Munk Larsen
273e0c884e Revert "Add constexpr, test for C++14 constexpr." 2022-09-16 21:14:29 +00:00
Thomas Gloor
ec9c7163a3 Feature/skew symmetric matrix3 2022-09-08 20:44:40 +00:00
Tobias Schlüter
133498c329 Add constexpr, test for C++14 constexpr. 2022-09-07 03:42:34 +00:00
Antonio Sánchez
f5364331eb Fix some cmake issues. 2022-09-02 16:43:14 +00:00
John Mather
3a9d404d76 Add support for Apple's Accelerate sparse matrix solvers 2022-03-08 00:09:18 +00:00
Andrew Johnson
a491c7f898 Allow specifying inner & outer stride for CwiseUnaryView - fixes #2398 2022-01-05 19:24:46 +00:00
Erik Schultheis
495ffff945 removed helper cmake macro and don't use deprecated COMPILE_FLAGS anymore. 2021-12-09 23:09:56 +00:00
Antonio Sanchez
de218b471d Add -arch=<arch> argument for nvcc.
Without this flag, when compiling with nvcc, if the compute architecture of a card does
not exactly match any of those listed for `-gencode arch=compute_<arch>,code=sm_<arch>`,
then the kernel will fail to run with:
```
cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device.
```
This can happen, for example, when compiling with an older cuda version
that does not support a newer architecture (e.g. T4 is `sm_75`, but cuda
9.2 only supports up to `sm_70`).

With the `-arch=<arch>` flag, the code will compile and run at the
supplied architecture.
2021-09-24 20:48:01 -07:00
Antonio Sanchez
846d34384a Rename EIGEN_CUDA_FLAGS to EIGEN_CUDA_CXX_FLAGS
Also add a missing space for clang.
2021-09-24 20:15:55 -07:00
Antonio Sanchez
7b00e8b186 Clean up CUDA CMake files.
- Unify test/CMakeLists.txt and unsupported/test/CMakeLists.txt
- Added `EIGEN_CUDA_FLAGS` that are appended to the set of flags passed
to the cuda compiler (nvcc or clang).

The latter is to support passing custom flags (e.g. `-arch=` to nvcc,
or to disable cuda-specific warnings).
2021-09-24 14:43:59 -07:00
Antonio Sanchez
bf66137efc New GPU test utilities.
This introduces new functions:
```
// returns kernel(args...) running on the CPU.
Eigen::run_on_cpu(Kernel kernel, Args&&... args);

// returns kernel(args...) running on the GPU.
Eigen::run_on_gpu(Kernel kernel, Args&&... args);
Eigen::run_on_gpu_with_hint(size_t buffer_capacity_hint, Kernel kernel, Args&&... args);

// returns kernel(args...) running on the GPU if using
//   a GPU compiler, or CPU otherwise.
Eigen::run(Kernel kernel, Args&&... args);
Eigen::run_with_hint(size_t buffer_capacity_hint, Kernel kernel, Args&&... args);
```

Running on the GPU is accomplished by:
- Serializing the kernel inputs on the CPU
- Transferring the inputs to the GPU
- Passing the kernel and serialized inputs to a GPU kernel
- Deserializing the inputs on the GPU
- Running `kernel(inputs...)` on the GPU
- Serializing all output parameters and the return value
- Transferring the serialized outputs back to the CPU
- Deserializing the outputs and return value on the CPU
- Returning the deserialized return value

All inputs must be serializable (currently POD types, `Eigen::Matrix`
and `Eigen::Array`). The kernel must also be POD (though it usually
contains no actual data).
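
The round trip listed above can be emulated on the host alone. This is a rough sketch under stated assumptions, not the test utility itself: `put`, `take`, `run_via_buffers` and `AddKernel` are hypothetical names, a plain vector copy stands in for each host/device transfer, only two inputs and the return value are handled, and output parameters are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

namespace sketch {  // hypothetical names, for illustration only

using Bytes = std::vector<unsigned char>;

// Appends the raw bytes of a trivially copyable value to `buf`.
template <typename T>
void put(Bytes& buf, const T& value) {
  static_assert(std::is_trivially_copyable<T>::value, "POD-like types only");
  const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
  buf.insert(buf.end(), p, p + sizeof(T));
}

// Reads a T from `buf` at `offset` and advances `offset` past it.
template <typename T>
T take(const Bytes& buf, std::size_t& offset) {
  assert(offset + sizeof(T) <= buf.size());
  T value;
  std::memcpy(&value, buf.data() + offset, sizeof(T));
  offset += sizeof(T);
  return value;
}

template <typename Kernel, typename A, typename B>
auto run_via_buffers(Kernel kernel, const A& a, const B& b)
    -> decltype(kernel(a, b)) {
  using Result = decltype(kernel(a, b));

  Bytes host_in;  // serialize the inputs on the "CPU"
  put(host_in, a);
  put(host_in, b);

  Bytes device_in = host_in;  // stand-in for the host-to-device transfer

  std::size_t offset = 0;  // deserialize on the "GPU" and run the kernel
  A device_a = take<A>(device_in, offset);
  B device_b = take<B>(device_in, offset);
  Result device_result = kernel(device_a, device_b);

  Bytes device_out;  // serialize the return value on the "GPU"
  put(device_out, device_result);

  Bytes host_out = device_out;  // stand-in for the device-to-host transfer

  offset = 0;  // deserialize the return value on the "CPU"
  return take<Result>(host_out, offset);
}

// A stateless, POD functor playing the role of the kernel.
struct AddKernel {
  double operator()(int x, double y) const { return x + y; }
};

}  // namespace sketch
```

For example, `sketch::run_via_buffers(sketch::AddKernel{}, 2, 0.5)` returns `2.5` after passing both inputs and the result through byte buffers.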

Tested on CUDA 9.1, 10.2, 11.3, with g++-6, g++-8, g++-10 respectively.

This MR depends on !622, !623, !624.
2021-09-10 14:22:50 -07:00
Antonio Sanchez
26e5beb8cb Device-compatible Tuple implementation.
An analogue of `std::tuple` that works on device.

Context: I've tried `std::tuple` in various versions of NVCC and clang,
and although code seems to compile, it often fails to run - generating
"illegal memory access" errors, or "illegal instruction" errors.
This replacement does work on device.
2021-09-08 13:34:19 -07:00
Antonio Sanchez
fcd73b4884 Add a simple serialization mechanism.
The `Serializer<T>` class implements a binary serialization that
can write to (`serialize`) and read from (`deserialize`) a byte
buffer.  Also added convenience routines for serializing
a list of arguments.

This will mainly be for testing, specifically to transfer data to
and from the GPU.
2021-09-08 09:38:59 -07:00
Antonio Sanchez
eeacbd26c8 Bump CMake files to at least c++11.
Removed all configurations that explicitly test or set the c++ standard
flags. The only place the standard is now configured is at the top of
the main `CMakeLists.txt` file, which can easily be updated (e.g. if
we decide to move to c++14+). This can also be set via command-line using
```
> cmake -DCMAKE_CXX_STANDARD=14
```

Kept the `EIGEN_TEST_CXX11` flag for now - that still controls whether to
build/run the `cxx11_*` tests.  We will likely end up renaming these
tests and removing the `CXX11` subfolder.
2021-08-25 20:07:48 +00:00
Kolja Brix
58e086b8c8 Add random matrix generation via SVD 2021-08-23 16:00:05 +00:00