eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2026-04-10 11:34:33 +08:00

Author	SHA1	Message	Date
Erik Schultheis	ec2fd0f7ed	Require recent GCC and MSVC, and remove `EIGEN_HAS_CXX14` and some other feature test macros	2021-12-01 00:48:34 +00:00
Rasmus Munk Larsen	5137a5157a	Make numeric_limits members constexpr as per the newer C++ standards. Author: majnemer@google.com	2021-11-19 15:58:36 +00:00
Chip Kerchner	9cf34ee0ae	Invert rows and depth in non-vectorized portion of packing (PowerPC).	2021-10-28 21:59:41 +00:00
Ilya Tokar	e1cb6369b0	Add AVX vector path to float2half/half2float. Makes e.g. matrix multiplication 2x faster (benchmark BM_convers: old 181ms ± 1% cpu/op, new 62ms ± 9% cpu/op, delta -65.82%, p=0.016, n=4+5). Tested on all possible input values (not adding tests, since they take a long time).	2021-10-28 13:59:01 -04:00
Antonio Sanchez	e559701981	Fix compile issue for gcc 4.8	2021-10-28 08:23:19 -07:00
Rohit Santhanam	48e40b22bf	Preliminary HIP bfloat16 GPU support.	2021-10-27 18:36:45 +00:00
Antonio Sanchez	40bbe8a4d0	Fix ZVector build. Cross-compiled via `s390x-linux-gnu-g++`, run via qemu. This allows the packetmath tests to pass.	2021-10-27 16:30:15 +00:00
Alex Druinsky	6bb6a6bf53	Vectorize fp16 tanh and logistic functions on Neon. Activates vectorization of the Eigen::half versions of the tanh and logistic functions when they run on Neon. Both functions convert their inputs to float before computing the output, and as a result of this commit, the conversions and the computation in float are vectorized.	2021-10-27 16:09:16 +00:00
Andreas Krebbel	8faafc3aaa	ZVector: Move alignas qualifier to come first. We currently have plenty of type definitions with the alignment qualifier coming after the type. The compiler warns about ignoring them: `int EIGEN_ALIGN16 ai[4];` Turn this into: `EIGEN_ALIGN16 int ai[4];`	2021-10-26 15:33:47 +02:00
Antonio Sanchez	fd5f48e465	Fix tuple compilation for VS2017. VS2017 doesn't like deducing alias types, leading to a bunch of compile errors for functions involving the `tuple` alias. Replacing with `TupleImpl` seems to solve this, allowing the test to compile/pass.	2021-10-20 19:18:34 +00:00
Antonio Sanchez	d0d34524a1	Move CUDA/Complex.h to GPU/Complex.h, remove TensorReductionCuda.h The `Complex.h` file applies equally to HIP/CUDA, so placing under the generic `GPU` folder. The `TensorReductionCuda.h` has already been deprecated, now removing for the next Eigen version.	2021-10-20 12:00:19 -07:00
Antonio Sanchez	f0f1d7938b	Disable testing of complex compound assignment operators for MSVC. MSVC does not support specializing compound assignments for `std::complex`, since it already specializes them (contrary to the standard). Trying to use one of these on device will currently lead to a duplicate definition error. This is still probably preferable to no error though. If we remove the definitions for MSVC, then it will compile, but the kernel will fail silently. The only proper solution would be to define our own custom `Complex` type.	2021-09-27 15:15:11 -07:00
Kolja Brix	afa616bc9e	Fix some typos found	2021-09-23 15:22:00 +00:00
sciencewhiz	4b6036e276	fix various typos	2021-09-22 16:15:06 +00:00
Alexander Grund	b5eaa42695	Fix alias violation in BFloat16. `reinterpret_cast` between unrelated types is undefined behavior and leads to misoptimizations on some platforms. Use the safer (and faster) version via `bit_cast`.	2021-09-20 10:37:50 +02:00
Antonio Sanchez	3c724c44cf	Fix strict aliasing bug causing product_small failure. Packet loading is skipped due to aliasing violation, leading to nullopt matrix multiplication. Fixes #2327.	2021-09-17 21:09:34 +00:00
Rasmus Munk Larsen	7b975acb1f	Remove unused variable.	2021-09-16 20:27:13 +00:00
Rasmus Munk Larsen	92849d814b	Remove unused variable.	2021-09-16 20:21:31 +00:00
Rasmus Munk Larsen	da027fa20a	Remove unused variable.	2021-09-16 20:02:42 +00:00
Rasmus Munk Larsen	d7d0bf832d	Issue an error in case of direct inclusion of internal headers.	2021-09-10 19:12:26 +00:00
Antonio Sanchez	26e5beb8cb	Device-compatible Tuple implementation. An analogue of `std::tuple` that works on device. Context: I've tried `std::tuple` in various versions of NVCC and clang, and although code seems to compile, it often fails to run - generating "illegal memory access" errors, or "illegal instruction" errors. This replacement does work on device.	2021-09-08 13:34:19 -07:00
Antonio Sanchez	7792b1e909	Fix AVX2 PacketMath.h. There were a couple typos ps -> epi32, and an unaligned load issue.	2021-09-03 19:47:57 +00:00
Antonio Sanchez	def145547f	Add missing packet types in pset1 call. Oops, introduced this when "fixing" integer packets.	2021-09-02 16:21:07 -07:00
Antonio Sanchez	3d4ba855e0	Fix AVX integer packet issues. Most are instances of AVX2 functions not protected by `EIGEN_VECTORIZE_AVX2`. There was also a missing semi-colon for AVX512.	2021-09-01 14:14:43 -07:00
Antonio Sanchez	ff07a8a639	GCC 4.8 arm EIGEN_OPTIMIZATION_BARRIER fix (#2315). GCC 4.8 doesn't seem to like the `g` register constraint, failing to compile with "error: 'asm' operand requires impossible reload". Tested `r` instead, and that seems to work, even with latest compilers. Also fixed some minor macro issues to eliminate warnings on armv7. Fixes #2315.	2021-08-31 20:20:47 +00:00
Antonio Sanchez	cc3573ab44	Disable cuda Eigen::half vectorization on host. All cuda `__half` functions are device-only in CUDA 9, including conversions. Host-side conversions were added in CUDA 10. The existing code doesn't build prior to 10.0. All arithmetic functions are always device-only, so there's no reason to use vectorization on the host at all. Modified the code to disable vectorization for `__half` on host, which also required updating the `TensorReductionGpu` implementation, which previously made assumptions about available packets.	2021-08-31 19:13:12 +00:00
Jakub Lichman	dc5b1f7d75	AVX512 and AVX2 support for Packet16i and Packet8i added	2021-08-25 19:38:23 +00:00
Han-Kuan Chen	ab28419298	optimize predux if architecture is aarch64	2021-08-25 19:18:54 +00:00
Antonio Sanchez	2cc6ee0d2e	Add missing PPC packet comparisons. This is to fix the packetmath tests on the ppc pipeline.	2021-08-17 07:42:04 -07:00
Chip-Kerchner	8dcf3e38ba	Fix unaligned loads in ploadLhs & ploadRhs for P8.	2021-08-16 20:28:22 -05:00
Chip-Kerchner	e07227c411	Reverse compare logic in F32ToBf16 since vec_cmpne is not available in Power8 - now compiles for clang10 default (P8).	2021-08-13 11:21:28 -05:00
Chip Kerchner	66499f0f17	Get rid of used uninitialized warnings for EIGEN_UNUSED_VARIABLE in gcc11+	2021-08-12 21:38:54 +00:00
ChipKerchner	413bc491f1	Fix errors on older compilers (gcc 7.5 - lack of vec_neg, clang10 - can not use const pointers with vec_xl).	2021-08-10 15:03:18 -05:00
Gauri Deshpande	e6a5a594a7	remove denormal flushing in fp32tobf16 for avx & avx512	2021-08-09 22:15:21 +00:00
derekjchow	66ca41bd47	Add support for vectorizing logical comparisons.	2021-07-23 20:07:48 +00:00
Rasmus Munk Larsen	7b35638ddb	Fix breakage of conj_helper in conjunction with custom types introduced in !537.	2021-07-02 20:42:15 +00:00
Rasmus Munk Larsen	bbfc4d54cd	Use `padd` instead of `+`.	2021-07-02 02:51:48 +00:00
Rasmus Munk Larsen	9312a5bf5c	Implement a generic vectorized version of Smith's algorithms for complex division.	2021-07-01 23:31:12 +00:00
Chip Kerchner	91e99ec1e0	Create the ability to disable the specialized gemm_pack_rhs in Eigen (only PPC) for TensorFlow	2021-06-30 23:05:04 +00:00
大河メタル	c81da59a25	Correct declarations for aarch64-pc-windows-msvc	2021-06-30 04:09:46 +00:00
Rasmus Munk Larsen	5aebbe9098	Get rid of redundant `pabs` instruction in complex square root.	2021-06-29 23:26:15 +00:00
Rohit Santhanam	2d132d1736	Commit `52a5f982` broke conjhelper functionality for HIP GPUs. This commit addresses this.	2021-06-25 19:28:00 +00:00
Rasmus Munk Larsen	bffd267d17	Small cleanup: Get rid of the macros EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD and CJMADD, which were effectively unused, apart from on x86, where the change results in identically performing code.	2021-06-24 18:52:17 -07:00
Rasmus Munk Larsen	52a5f98212	Get rid of code duplication for conj_helper. For packets where LhsType=RhsType a single generic implementation suffices. For scalars, the generic implementation of pconj automatically forwards to numext::conj, so much of the existing specialization can be avoided. For mixed types we still need specializations.	2021-06-24 15:47:48 -07:00
Antonio Sanchez	12e8d57108	Remove pset, replace with ploadu. We can't make guarantees on alignment for existing calls to `pset`, so we should default to loading unaligned. But in that case, we should just use `ploadu` directly. For loading constants, this load should hopefully get optimized away. This is causing segfaults in Google Maps.	2021-06-16 18:41:17 -07:00
Chip-Kerchner	ef1fd341a8	EIGEN_STRONG_INLINE was NOT inlining in some critically needed areas (6.6X slowdown) when used with Tensorflow. Changing to EIGEN_ALWAYS_INLINE where appropriate.	2021-06-16 16:30:31 +00:00
Antonio Sanchez	9e94c59570	Add missing ppc pcmp_lt_or_nan<Packet8bf>	2021-06-15 13:42:17 -07:00
Rasmus Munk Larsen	fc87e2cbaa	Use bit_cast to create -0.0 for floating point types to avoid compiler optimization changing sign with -ffast-math enabled.	2021-06-11 02:35:53 +00:00
Antonio Sanchez	dba753a986	Add missing NEON ptranspose implementations. Unified implementation using only `vzip`.	2021-05-25 18:25:35 +00:00
guoqiangqi	3d9051ea84	Changing the storage of the SSE complex packets to that of the wrapper. This should fix #2242.	2021-05-10 23:53:16 +00:00

1082 Commits