eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2026-04-10 11:34:33 +08:00

Author	SHA1	Message	Date
Andreas Krebbel	8faafc3aaa	ZVector: Move alignas qualifier to come first. We currently have plenty of type definitions with the alignment qualifier coming after the type. The compiler warns about ignoring them: `int EIGEN_ALIGN16 ai[4];` Turn this into: `EIGEN_ALIGN16 int ai[4];`	2021-10-26 15:33:47 +02:00
Antonio Sanchez	fd5f48e465	Fix tuple compilation for VS2017. VS2017 doesn't like deducing alias types, leading to a bunch of compile errors for functions involving the `tuple` alias. Replacing with `TupleImpl` seems to solve this, allowing the test to compile/pass.	2021-10-20 19:18:34 +00:00
Antonio Sanchez	d0d34524a1	Move CUDA/Complex.h to GPU/Complex.h, remove TensorReductionCuda.h The `Complex.h` file applies equally to HIP/CUDA, so placing under the generic `GPU` folder. The `TensorReductionCuda.h` has already been deprecated, now removing for the next Eigen version.	2021-10-20 12:00:19 -07:00
Antonio Sanchez	f0f1d7938b	Disable testing of complex compound assignment operators for MSVC. MSVC does not support specializing compound assignments for `std::complex`, since it already specializes them (contrary to the standard). Trying to use one of these on device will currently lead to a duplicate definition error. This is still probably preferable to no error though. If we remove the definitions for MSVC, then it will compile, but the kernel will fail silently. The only proper solution would be to define our own custom `Complex` type.	2021-09-27 15:15:11 -07:00
Kolja Brix	afa616bc9e	Fix some typos found	2021-09-23 15:22:00 +00:00
sciencewhiz	4b6036e276	fix various typos	2021-09-22 16:15:06 +00:00
Alexander Grund	b5eaa42695	Fix alias violation in BFloat16. A `reinterpret_cast` between unrelated types is undefined behavior and leads to misoptimizations on some platforms. Use the safer (and faster) version via `bit_cast`.	2021-09-20 10:37:50 +02:00
Antonio Sanchez	3c724c44cf	Fix strict aliasing bug causing product_small failure. Packet loading is skipped due to an aliasing violation, leading to nullopt matrix multiplication. Fixes #2327.	2021-09-17 21:09:34 +00:00
Rasmus Munk Larsen	7b975acb1f	Remove unused variable.	2021-09-16 20:27:13 +00:00
Rasmus Munk Larsen	92849d814b	Remove unused variable.	2021-09-16 20:21:31 +00:00
Rasmus Munk Larsen	da027fa20a	Remove unused variable.	2021-09-16 20:02:42 +00:00
Rasmus Munk Larsen	d7d0bf832d	Issue an error in case of direct inclusion of internal headers.	2021-09-10 19:12:26 +00:00
Antonio Sanchez	26e5beb8cb	Device-compatible Tuple implementation. An analogue of `std::tuple` that works on device. Context: I've tried `std::tuple` in various versions of NVCC and clang, and although code seems to compile, it often fails to run - generating "illegal memory access" errors, or "illegal instruction" errors. This replacement does work on device.	2021-09-08 13:34:19 -07:00
Antonio Sanchez	7792b1e909	Fix AVX2 PacketMath.h. There were a couple of typos (ps -> epi32) and an unaligned load issue.	2021-09-03 19:47:57 +00:00
Antonio Sanchez	def145547f	Add missing packet types in pset1 call. Oops, introduced this when "fixing" integer packets.	2021-09-02 16:21:07 -07:00
Antonio Sanchez	3d4ba855e0	Fix AVX integer packet issues. Most are instances of AVX2 functions not protected by `EIGEN_VECTORIZE_AVX2`. There was also a missing semicolon for AVX512.	2021-09-01 14:14:43 -07:00
Antonio Sanchez	ff07a8a639	GCC 4.8 arm EIGEN_OPTIMIZATION_BARRIER fix (#2315). GCC 4.8 doesn't seem to like the `g` register constraint, failing to compile with "error: 'asm' operand requires impossible reload". Tested `r` instead, and that seems to work, even with latest compilers. Also fixed some minor macro issues to eliminate warnings on armv7. Fixes #2315.	2021-08-31 20:20:47 +00:00
Antonio Sanchez	cc3573ab44	Disable cuda Eigen::half vectorization on host. All cuda `__half` functions are device-only in CUDA 9, including conversions. Host-side conversions were added in CUDA 10. The existing code doesn't build prior to 10.0. All arithmetic functions are always device-only, so there is no reason to use vectorization on the host at all. Modified the code to disable vectorization for `__half` on host, which also required updating the `TensorReductionGpu` implementation, which previously made assumptions about available packets.	2021-08-31 19:13:12 +00:00
Jakub Lichman	dc5b1f7d75	AVX512 and AVX2 support for Packet16i and Packet8i added	2021-08-25 19:38:23 +00:00
Han-Kuan Chen	ab28419298	optimize predux if architecture is aarch64	2021-08-25 19:18:54 +00:00
Antonio Sanchez	2cc6ee0d2e	Add missing PPC packet comparisons. This is to fix the packetmath tests on the ppc pipeline.	2021-08-17 07:42:04 -07:00
Chip-Kerchner	8dcf3e38ba	Fix unaligned loads in ploadLhs & ploadRhs for P8.	2021-08-16 20:28:22 -05:00
Chip-Kerchner	e07227c411	Reverse compare logic in F32ToBf16 since vec_cmpne is not available in Power8 - now compiles for clang10 default (P8).	2021-08-13 11:21:28 -05:00
Chip Kerchner	66499f0f17	Get rid of "used uninitialized" warnings for EIGEN_UNUSED_VARIABLE in gcc11+	2021-08-12 21:38:54 +00:00
ChipKerchner	413bc491f1	Fix errors on older compilers (gcc 7.5: lack of vec_neg; clang10: cannot use const pointers with vec_xl).	2021-08-10 15:03:18 -05:00
Gauri Deshpande	e6a5a594a7	remove denormal flushing in fp32tobf16 for avx & avx512	2021-08-09 22:15:21 +00:00
derekjchow	66ca41bd47	Add support for vectorizing logical comparisons.	2021-07-23 20:07:48 +00:00
Rasmus Munk Larsen	7b35638ddb	Fix breakage of conj_helper in conjunction with custom types introduced in !537.	2021-07-02 20:42:15 +00:00
Rasmus Munk Larsen	bbfc4d54cd	Use `padd` instead of `+`.	2021-07-02 02:51:48 +00:00
Rasmus Munk Larsen	9312a5bf5c	Implement a generic vectorized version of Smith's algorithms for complex division.	2021-07-01 23:31:12 +00:00
Chip Kerchner	91e99ec1e0	Create the ability to disable the specialized gemm_pack_rhs in Eigen (only PPC) for TensorFlow	2021-06-30 23:05:04 +00:00
大河メタル	c81da59a25	Correct declarations for aarch64-pc-windows-msvc	2021-06-30 04:09:46 +00:00
Rasmus Munk Larsen	5aebbe9098	Get rid of redundant `pabs` instruction in complex square root.	2021-06-29 23:26:15 +00:00
Rohit Santhanam	2d132d1736	Commit `52a5f982` broke conj_helper functionality for HIP GPUs. This commit addresses this.	2021-06-25 19:28:00 +00:00
Rasmus Munk Larsen	bffd267d17	Small cleanup: Get rid of the macros EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD and CJMADD, which were effectively unused, apart from on x86, where the change results in identically performing code.	2021-06-24 18:52:17 -07:00
Rasmus Munk Larsen	52a5f98212	Get rid of code duplication for conj_helper. For packets where LhsType=RhsType a single generic implementation suffices. For scalars, the generic implementation of pconj automatically forwards to numext::conj, so much of the existing specialization can be avoided. For mixed types we still need specializations.	2021-06-24 15:47:48 -07:00
Antonio Sanchez	12e8d57108	Remove pset, replace with ploadu. We can't make guarantees on alignment for existing calls to `pset`, so we should default to loading unaligned. But in that case, we should just use `ploadu` directly. For loading constants, this load should hopefully get optimized away. This is causing segfaults in Google Maps.	2021-06-16 18:41:17 -07:00
Chip-Kerchner	ef1fd341a8	EIGEN_STRONG_INLINE was NOT inlining in some critical areas (6.6X slowdown) when used with TensorFlow. Changing to EIGEN_ALWAYS_INLINE where appropriate.	2021-06-16 16:30:31 +00:00
Antonio Sanchez	9e94c59570	Add missing ppc pcmp_lt_or_nan<Packet8bf>	2021-06-15 13:42:17 -07:00
Rasmus Munk Larsen	fc87e2cbaa	Use bit_cast to create -0.0 for floating point types to avoid compiler optimization changing the sign with -ffast-math enabled.	2021-06-11 02:35:53 +00:00
Antonio Sanchez	dba753a986	Add missing NEON ptranspose implementations. Unified implementation using only `vzip`.	2021-05-25 18:25:35 +00:00
guoqiangqi	3d9051ea84	Changing the storage of the SSE complex packets to that of the wrapper. This should fix #2242.	2021-05-10 23:53:16 +00:00
Christoph Hertzberg	722ca0b665	Revert addition of unused `paddsub<Packet2cf>`. This fixes #2242	2021-05-06 18:36:47 +02:00
Antonio Sanchez	1c013be2cc	Better CUDA complex division. The original produced NaNs when dividing 0/b for subnormal b. The `complex_divide_stable` was changed to use the more common Smith's algorithm.	2021-04-29 17:39:58 +00:00
Antonio Sanchez	172db7bfc3	Add missing pcmp_lt_or_nan for NEON Packet4bf.	2021-04-27 14:12:11 -07:00
Jakub Lichman	d87648a6be	Tests added and AVX512 bug fixed for pcmp_lt_or_nan	2021-04-25 20:58:56 +00:00
Chip-Kerchner	06c2760bd1	Fix taking address of rvalue compiler issue with TensorFlow (plus other warnings).	2021-04-21 00:47:13 +00:00
Jakub Lichman	2b1dfd1ba0	HasExp added for AVX512 Packet8d	2021-04-20 19:07:58 +00:00
Antonio Sanchez	1d79c68ba0	Fix ldexp for AVX512 (#2215). Wrong shuffle was used. Need to interleave low/high halves with a `permute` instruction. Fixes #2215.	2021-04-20 16:25:22 +00:00
Christoph Hertzberg	9357feedc7	Avoid using uninitialized inputs and if available, use slightly more efficient `movsd` instruction for `pset1<Packet2cf>`.	2021-04-13 01:36:59 +02:00
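
Commits 9312a5bf5c and 1c013be2cc refer to Smith's algorithm for complex division. Below is a minimal scalar C++ sketch of the textbook algorithm for `std::complex<double>` operands. The function name `smith_divide` is hypothetical, and this is not Eigen's vectorized or CUDA implementation. Dividing through by the larger component of the divisor keeps intermediate values in range. By contrast, computing `c*c + d*d` directly can overflow for large divisors, or underflow to zero for subnormal ones.

```cpp
#include <cmath>
#include <complex>

// Hypothetical helper: computes (a + bi) / (c + di) with Smith's algorithm.
// Instead of forming c*c + d*d, it scales by the ratio of the smaller to the
// larger divisor component, so |r| <= 1 and intermediates stay in range.
std::complex<double> smith_divide(std::complex<double> x,
                                  std::complex<double> y) {
  const double a = x.real(), b = x.imag();
  const double c = y.real(), d = y.imag();
  if (std::abs(c) >= std::abs(d)) {
    const double r = d / c;        // |r| <= 1
    const double den = c + d * r;  // equals (c*c + d*d) / c
    return {(a + b * r) / den, (b - a * r) / den};
  } else {
    const double r = c / d;        // |r| < 1
    const double den = c * r + d;  // equals (c*c + d*d) / d
    return {(a * r + b) / den, (b * r - a) / den};
  }
}
```

For example, `smith_divide({0.0, 0.0}, {1e-310, 0.0})` returns `0` rather than NaN, even though `1e-310` is subnormal. `smith_divide({1e300, 0.0}, {1e300, 1e300})` returns `(0.5, -0.5)` without overflowing.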


1074 Commits