eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2026-04-10 11:34:33 +08:00

Author	SHA1	Message	Date
Rasmus Munk Larsen	58c44ef36d	GPU: Add library dispatch module (DeviceMatrix, cuBLAS, cuSOLVER) Add Eigen/GPU module: A standalone GPU library dispatch layer where DeviceMatrix<Scalar> operations map 1:1 to cuBLAS/cuSOLVER calls. CPU and GPU solvers coexist in the same binary with compatible syntax. Core infrastructure: - DeviceMatrix<Scalar>: RAII dense column-major GPU memory wrapper with async host transfer (fromHost/toHost) and CUDA event-based cross-stream synchronization. - GpuContext: Unified execution context owning a CUDA stream + cuBLAS handle + cuSOLVER handle. Thread-local default with explicit override via setThreadLocal(). Stream-borrowing constructor for integration. - DeviceBuffer: Typed RAII device allocation with move semantics. cuBLAS dispatch (expression syntax): - GEMM: d_C = d_A.adjoint() * d_B (cublasXgemm) - TRSM: d_X = d_A.triangularView<Lower>().solve(d_B) (cublasXtrsm) - SYMM/HEMM: d_C = d_A.selfadjointView<Lower>() * d_B (cublasXsymm) - SYRK/HERK: d_C = d_A * d_A.adjoint() (cublasXsyrk) cuSOLVER dispatch: - GpuLLT: Cached Cholesky factorization (cusolverDnXpotrf + Xpotrs) - GpuLU: Cached LU factorization (cusolverDnXgetrf + Xgetrs) - Solver chaining: auto x = d_A.llt().solve(d_B) - Solver expressions with .device(ctx) for explicit stream control. CI: Bump CUDA container to Ubuntu 22.04 (CMake 3.22), GCC 10->11, Clang 12->14. Bump cmake_minimum_required to 3.17 for FindCUDAToolkit. Tests: gpu_cublas.cpp, gpu_cusolver_llt.cpp, gpu_cusolver_lu.cpp, gpu_device_matrix.cpp, gpu_library_example.cu Benchmarks: bench_gpu_solvers.cpp, bench_gpu_chaining.cpp, bench_gpu_batching.cpp	2026-04-09 19:05:25 -07:00
Rasmus Munk Larsen	6a9405bf7a	GPU: Raise CUDA/HIP minimum and remove legacy guards - Raise CUDA minimum from 9.0 to 11.4 (sm_70/Volta). - Raise HIP minimum to GFX906 (Vega 20/MI50) / ROCm 5.6. - Remove EIGEN_HAS_{CUDA,HIP,GPU}_FP16 guards — FP16 is always available on sm_70+ and GFX906+. - Remove obsolete __HIP_ARCH_HAS_* preprocessor branches. - C++14 cleanup: remove pre-C++14 workarounds in GPU code. - Fix NVCC warnings (deprecated register keyword, unreachable code, tautological comparisons). - Fix HIP test execution on gfx1151. - Update CI configuration for new minimum versions.	2026-04-09 15:21:39 -07:00
Charles Schlosser	ba9871e46b	fix and enable realview unit tests libeigen/eigen!2356	2026-03-28 20:13:54 -07:00
Rasmus Munk Larsen	5e521f3e45	Revert "add realview test" This reverts merge request !2352	2026-03-28 17:27:01 -07:00
Charles Schlosser	87ae1dbe7f	add realview test libeigen/eigen!2352	2026-03-28 16:26:51 -07:00
Rasmus Munk Larsen	54b04fc6b1	Fix mixed-type GEMM packing for backends without half/quarter packets libeigen/eigen!2297 Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>	2026-03-21 09:46:54 -07:00
Rasmus Munk Larsen	c66fc52868	Add ULP accuracy measurement tool and documentation for vectorized math functions libeigen/eigen!2153 Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>	2026-03-01 13:22:16 -08:00
Rasmus Munk Larsen	8525491eb1	Add dedicated unit tests and benchmark for ConditionEstimator libeigen/eigen!2223 Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>	2026-02-26 18:26:38 -08:00
Rasmus Munk Larsen	1b1b7e347d	Fix EIGEN_NO_AUTOMATIC_RESIZING not resizing empty destinations libeigen/eigen!2219 Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>	2026-02-26 07:54:27 -08:00
Rasmus Munk Larsen	4fab38d798	Make clang generic vector backend support 16, 32, and 64-byte vectors libeigen/eigen!2213 Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>	2026-02-25 08:50:47 -08:00
Rasmus Munk Larsen	ea25ea52bb	Revert accidental changes from !2212 squash merge libeigen/eigen!2214 Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>	2026-02-25 08:31:41 -08:00
Rasmus Munk Larsen	38f0f42755	Update rmlarsen email address from @google.com to @gmail.com libeigen/eigen!2212 Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>	2026-02-25 07:45:02 -08:00
Rasmus Munk Larsen	fa567f6bcd	Add CUDA CI jobs with NVHPC (nvc++) as host and device compiler libeigen/eigen!2204 Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>	2026-02-24 16:54:08 -08:00
Rasmus Munk Larsen	9810969c0f	Suppress false-positive GCC and clang warnings in test builds libeigen/eigen!2187 Co-authored-by: Rasmus Munk Larsen <rmlarsen@gmail.com>	2026-02-22 14:54:15 -08:00
Rasmus Munk Larsen	2b561f9284	Revert "Specialized enable_borrowed_ranges for VectorwiseOp class range iteration" This reverts merge request !2127	2026-02-16 02:12:28 -08:00
Blake	d0654a201b	Specialized enable_borrowed_ranges for VectorwiseOp class range iteration libeigen/eigen!2127 Closes #2882	2026-02-15 07:31:33 -08:00
Blake	23fcc1c6c9	MatrixBase::diagonalView issue 604 libeigen/eigen!2126 Closes #604	2026-02-10 02:12:03 +00:00
Ludwig Striet	99f8512985	ComplexQZ	2025-10-13 17:35:03 +00:00
Guilhem Saurel	a67f9dabb0	tests: add missing link	2025-10-01 22:38:52 +00:00
Charles Schlosser	28c3b26d53	masked load/store framework	2025-04-12 00:31:10 +00:00
Antonio Sanchez	179a49684a	Fix CMake BOOST warning	2025-02-28 07:33:26 -08:00
Rasmus Munk Larsen	72adf891d5	Slightly simplify ForkJoin code, and make sure the test is actually run.	2025-02-25 17:22:43 +00:00
Johannes Zipfel	2926b2e0a9	added functions to fetch L and U Factors from IncompleteLUT	2025-01-31 18:32:38 +00:00
Antonio Sánchez	d26e19714f	Add missing cwiseSquare, tests for cwise matrix ops.	2024-03-28 04:26:55 +00:00
Antonio Sánchez	1d4369c2ff	Fix CwiseUnaryView.	2024-03-11 19:08:30 +00:00
Antonio Sánchez	75e273afcc	Add internal ctz/clz implementation.	2023-12-11 21:03:09 +00:00
Pavel Labath	66b9f4ed5c	Fix (u)int64_t->float conversion on arm	2023-11-21 16:09:12 +00:00
Antonio Sanchez	3a9635b20c	Link pthread for product_threaded test	2023-11-13 11:34:23 -08:00
Rasmus Munk Larsen	76e8c04553	Generalize parallel GEMM implementation in Core to work with ThreadPool in addition to OpenMP.	2023-11-10 17:42:30 +00:00
Ioannis Assiouras	d9839718aa	[ROCm] Replace HIP_PATH with ROCM_PATH for rocm 6.0	2023-10-16 20:56:35 +00:00
Alejandro Acosta	24d15e086f	[SYCL-2020] Add test to validate SYCL in Eigen core.	2023-07-28 15:45:08 +00:00
Tobias Wood	94f57867fe	Thread pool	2023-05-05 16:23:34 +00:00
Antonio Sánchez	e256ad1823	Remove LGPL Code and references.	2023-02-08 01:25:06 +00:00
Antonio Sánchez	dbf7ae6f9b	Fix up C++ version detection macros and cmake tests.	2022-12-20 18:06:03 +00:00
Alexander Richardson	62de593c40	Allow std::initializer_list constructors in constexpr expressions	2022-12-14 17:05:37 +00:00
Rasmus Munk Larsen	273e0c884e	Revert "Add constexpr, test for C++14 constexpr."	2022-09-16 21:14:29 +00:00
Thomas Gloor	ec9c7163a3	Feature/skew symmetric matrix3	2022-09-08 20:44:40 +00:00
Tobias Schlüter	133498c329	Add constexpr, test for C++14 constexpr.	2022-09-07 03:42:34 +00:00
Antonio Sánchez	f5364331eb	Fix some cmake issues.	2022-09-02 16:43:14 +00:00
John Mather	3a9d404d76	Add support for Apple's Accelerate sparse matrix solvers	2022-03-08 00:09:18 +00:00
Andrew Johnson	a491c7f898	Allow specifying inner & outer stride for CWiseUnaryView - fixes #2398	2022-01-05 19:24:46 +00:00
Erik Schultheis	495ffff945	removed helper cmake macro and don't use deprecated COMPILE_FLAGS anymore.	2021-12-09 23:09:56 +00:00
Antonio Sanchez	de218b471d	Add -arch=<arch> argument for nvcc. Without this flag, when compiling with nvcc, if the compute architecture of a card does not exactly match any of those listed for `-gencode arch=compute_<arch>,code=sm_<arch>`, then the kernel will fail to run with: ``` cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device. ``` This can happen, for example, when compiling with an older cuda version that does not support a newer architecture (e.g. T4 is `sm_75`, but cuda 9.2 only supports up to `sm_70`). With the `-arch=<arch>` flag, the code will compile and run at the supplied architecture.	2021-09-24 20:48:01 -07:00
Antonio Sanchez	846d34384a	Rename EIGEN_CUDA_FLAGS to EIGEN_CUDA_CXX_FLAGS Also add a missing space for clang.	2021-09-24 20:15:55 -07:00
Antonio Sanchez	7b00e8b186	Clean up CUDA CMake files. - Unify test/CMakeLists.txt and unsupported/test/CMakeLists.txt - Added `EIGEN_CUDA_FLAGS` that are appended to the set of flags passed to the cuda compiler (nvcc or clang). The latter is to support passing custom flags (e.g. `-arch=` to nvcc, or to disable cuda-specific warnings).	2021-09-24 14:43:59 -07:00
Antonio Sanchez	bf66137efc	New GPU test utilities. This introduces new functions: ``` // returns kernel(args...) running on the CPU. Eigen::run_on_cpu(Kernel kernel, Args&&... args); // returns kernel(args...) running on the GPU. Eigen::run_on_gpu(Kernel kernel, Args&&... args); Eigen::run_on_gpu_with_hint(size_t buffer_capacity_hint, Kernel kernel, Args&&... args); // returns kernel(args...) running on the GPU if using // a GPU compiler, or CPU otherwise. Eigen::run(Kernel kernel, Args&&... args); Eigen::run_with_hint(size_t buffer_capacity_hint, Kernel kernel, Args&&... args); ``` Running on the GPU is accomplished by: - Serializing the kernel inputs on the CPU - Transferring the inputs to the GPU - Passing the kernel and serialized inputs to a GPU kernel - Deserializing the inputs on the GPU - Running `kernel(inputs...)` on the GPU - Serializing all output parameters and the return value - Transferring the serialized outputs back to the CPU - Deserializing the outputs and return value on the CPU - Returning the deserialized return value All inputs must be serializable (currently POD types, `Eigen::Matrix` and `Eigen::Array`). The kernel must also be POD (though usually contains no actual data). Tested on CUDA 9.1, 10.2, 11.3, with g++-6, g++-8, g++-10 respectively. This MR depends on !622, !623, !624.	2021-09-10 14:22:50 -07:00
Antonio Sanchez	26e5beb8cb	Device-compatible Tuple implementation. An analogue of `std::tuple` that works on device. Context: I've tried `std::tuple` in various versions of NVCC and clang, and although code seems to compile, it often fails to run - generating "illegal memory access" errors, or "illegal instruction" errors. This replacement does work on device.	2021-09-08 13:34:19 -07:00
Antonio Sanchez	fcd73b4884	Add a simple serialization mechanism. The `Serializer<T>` class implements a binary serialization that can write to (`serialize`) and read from (`deserialize`) a byte buffer. Also added convenience routines for serializing a list of arguments. This will mainly be for testing, specifically to transfer data to and from the GPU.	2021-09-08 09:38:59 -07:00
Antonio Sanchez	eeacbd26c8	Bump CMake files to at least c++11. Removed all configurations that explicitly test or set the c++ standard flags. The only place the standard is now configured is at the top of the main `CMakeLists.txt` file, which can easily be updated (e.g. if we decide to move to c++14+). This can also be set via command-line using ``` > cmake -DCMAKE_CXX_STANDARD 14 ``` Kept the `EIGEN_TEST_CXX11` flag for now - that still controls whether to build/run the `cxx11_*` tests. We will likely end up renaming these tests and removing the `CXX11` subfolder.	2021-08-25 20:07:48 +00:00
Kolja Brix	58e086b8c8	Add random matrix generation via SVD	2021-08-23 16:00:05 +00:00

349 Commits