Commit Graph

1082 Commits

Author SHA1 Message Date
Erik Schultheis
ec2fd0f7ed Require recent GCC and MSCV and removed EIGEN_HAS_CXX14 and some other feature test macros 2021-12-01 00:48:34 +00:00
Rasmus Munk Larsen
5137a5157a Make numeric_limits members constexpr as per the newer C++ standards.
Author: majnemer@google.com
2021-11-19 15:58:36 +00:00
Chip Kerchner
9cf34ee0ae Invert rows and depth in non-vectorized portion of packing (PowerPC). 2021-10-28 21:59:41 +00:00
Ilya Tokar
e1cb6369b0 Add AVX vector path to float2half/half2float
Makes e. g. matrix multiplication 2x faster:
name         old cpu/op  new cpu/op  delta
BM_convers   181ms ± 1%    62ms ± 9%  -65.82%  (p=0.016 n=4+5)

Tested on all possible input values (not adding tests, since they
take a long time).
2021-10-28 13:59:01 -04:00
Antonio Sanchez
e559701981 Fix compile issue for gcc 4.8 2021-10-28 08:23:19 -07:00
Rohit Santhanam
48e40b22bf Preliminary HIP bfloat16 GPU support. 2021-10-27 18:36:45 +00:00
Antonio Sanchez
40bbe8a4d0 Fix ZVector build.
Cross-compiled via `s390x-linux-gnu-g++`, run via qemu.  This allows the
packetmath tests to pass.
2021-10-27 16:30:15 +00:00
Alex Druinsky
6bb6a6bf53 Vectorize fp16 tanh and logistic functions on Neon
Activates vectorization of the Eigen::half versions of the tanh and
logistic functions when they run on Neon. Both functions convert their
inputs to float before computing the output, and as a result of this
commit, the conversions and the computation in float are vectorized.
2021-10-27 16:09:16 +00:00
Andreas Krebbel
8faafc3aaa ZVector: Move alignas qualifier to come first
We currently have plenty of type definitions with the alignment
qualifier coming after the type.  The compiler warns about ignoring
them:
int EIGEN_ALIGN16 ai[4];

Turn this into:
EIGEN_ALIGN16 int ai[4];
2021-10-26 15:33:47 +02:00
Antonio Sanchez
fd5f48e465 Fix tuple compilation for VS2017.
VS2017 doesn't like deducing alias types, leading to a bunch of compile
errors for functions involving the `tuple` alias.  Replacing with
`TupleImpl` seems to solve this, allowing the test to compile/pass.
2021-10-20 19:18:34 +00:00
Antonio Sanchez
d0d34524a1 Move CUDA/Complex.h to GPU/Complex.h, remove TensorReductionCuda.h
The `Complex.h` file applies equally to HIP/CUDA, so placing under the
generic `GPU` folder.

The `TensorReductionCuda.h` has already been deprecated, now removing
for the next Eigen version.
2021-10-20 12:00:19 -07:00
Antonio Sanchez
f0f1d7938b Disable testing of complex compound assignment operators for MSVC.
MSVC does not support specializing compound assignments for
`std::complex`, since it already specializes them (contrary to the
standard).

Trying to use one of these on device will currently lead to a
duplicate definition error.  This is still probably preferable
to no error though.  If we remove the definitions for MSVC, then
it will compile, but the kernel will fail silently.

The only proper solution would be to define our own custom `Complex`
type.
2021-09-27 15:15:11 -07:00
Kolja Brix
afa616bc9e Fix some typos found 2021-09-23 15:22:00 +00:00
sciencewhiz
4b6036e276 fix various typos 2021-09-22 16:15:06 +00:00
Alexander Grund
b5eaa42695 Fix alias violation in BFloat16
reinterpret_cast between unrelated types is undefined behavior and leads
to misoptimizations on some platforms.
Use the safer (and faster) version via bit_cast
2021-09-20 10:37:50 +02:00
Antonio Sanchez
3c724c44cf Fix strict aliasing bug causing product_small failure.
Packet loading is skipped due to aliasing violation, leading to nullopt matrix
multiplication.

Fixes #2327.
2021-09-17 21:09:34 +00:00
Rasmus Munk Larsen
7b975acb1f Remove unused variable. 2021-09-16 20:27:13 +00:00
Rasmus Munk Larsen
92849d814b Remove unused variable. 2021-09-16 20:21:31 +00:00
Rasmus Munk Larsen
da027fa20a Remove unused variable. 2021-09-16 20:02:42 +00:00
Rasmus Munk Larsen
d7d0bf832d Issue an error in case of direct inclusion of internal headers. 2021-09-10 19:12:26 +00:00
Antonio Sanchez
26e5beb8cb Device-compatible Tuple implementation.
An analogue of `std::tuple` that works on device.

Context: I've tried `std::tuple` in various versions of NVCC and clang,
and although code seems to compile, it often fails to run - generating
"illegal memory access" errors, or "illegal instruction" errors.
This replacement does work on device.
2021-09-08 13:34:19 -07:00
Antonio Sanchez
7792b1e909 Fix AVX2 PacketMath.h.
There were a couple typos ps -> epi32, and an unaligned load issue.
2021-09-03 19:47:57 +00:00
Antonio Sanchez
def145547f Add missing packet types in pset1 call.
Oops, introduced this when "fixing" integer packets.
2021-09-02 16:21:07 -07:00
Antonio Sanchez
3d4ba855e0 Fix AVX integer packet issues.
Most are instances of AVX2 functions not protected by
`EIGEN_VECTORIZE_AVX2`.  There was also a missing semi-colon
for AVX512.
2021-09-01 14:14:43 -07:00
Antonio Sanchez
ff07a8a639 GCC 4.8 arm EIGEN_OPTIMIZATION_BARRIER fix (#2315).
GCC 4.8 doesn't seem to like the `g` register constraint, failing to
compile with "error: 'asm' operand requires impossible reload".

Tested `r` instead, and that seems to work, even with latest compilers.

Also fixed some minor macro issues to eliminate warnings on armv7.

Fixes #2315.
2021-08-31 20:20:47 +00:00
Antonio Sanchez
cc3573ab44 Disable cuda Eigen::half vectorization on host.
All cuda `__half` functions are device-only in CUDA 9, including
conversions. Host-side conversions were added in CUDA 10.
The existing code doesn't build prior to 10.0.

All arithmetic functions are always device-only, so there's
therefore no reason to use vectorization on the host at all.

Modified the code to disable vectorization for `__half` on host,
which required also updating the `TensorReductionGpu` implementation
which previously made assumptions about available packets.
2021-08-31 19:13:12 +00:00
Jakub Lichman
dc5b1f7d75 AVX512 and AVX2 support for Packet16i and Packet8i added 2021-08-25 19:38:23 +00:00
Han-Kuan Chen
ab28419298 optimize predux if architecture is aarch64 2021-08-25 19:18:54 +00:00
Antonio Sanchez
2cc6ee0d2e Add missing PPC packet comparisons.
This is to fix the packetmath tests on the ppc pipeline.
2021-08-17 07:42:04 -07:00
Chip-Kerchner
8dcf3e38ba Fix unaligned loads in ploadLhs & ploadRhs for P8. 2021-08-16 20:28:22 -05:00
Chip-Kerchner
e07227c411 Reverse compare logic ƒin F32ToBf16 since vec_cmpne is not available in Power8 - now compiles for clang10 default (P8). 2021-08-13 11:21:28 -05:00
Chip Kerchner
66499f0f17 Get rid of used uninitialized warnings for EIGEN_UNUSED_VARIABLE in gcc11+ 2021-08-12 21:38:54 +00:00
ChipKerchner
413bc491f1 Fix errors on older compilers (gcc 7.5 - lack of vec_neg, clang10 - can not use const pointers with vec_xl). 2021-08-10 15:03:18 -05:00
Gauri Deshpande
e6a5a594a7 remove denormal flushing in fp32tobf16 for avx & avx512 2021-08-09 22:15:21 +00:00
derekjchow
66ca41bd47 Add support for vectorizing logical comparisons. 2021-07-23 20:07:48 +00:00
Rasmus Munk Larsen
7b35638ddb Fix breakage of conj_helper in conjunction with custom types introduced in !537. 2021-07-02 20:42:15 +00:00
Rasmus Munk Larsen
bbfc4d54cd Use padd instead of +. 2021-07-02 02:51:48 +00:00
Rasmus Munk Larsen
9312a5bf5c Implement a generic vectorized version of Smith's algorithms for complex division. 2021-07-01 23:31:12 +00:00
Chip Kerchner
91e99ec1e0 Create the ability to disable the specialized gemm_pack_rhs in Eigen (only PPC) for TensorFlow 2021-06-30 23:05:04 +00:00
大河メタル
c81da59a25 Correct declarations for aarch64-pc-windows-msvc 2021-06-30 04:09:46 +00:00
Rasmus Munk Larsen
5aebbe9098 Get rid of redundant pabs instruction in complex square root. 2021-06-29 23:26:15 +00:00
Rohit Santhanam
2d132d1736 Commit 52a5f982 broke conjhelper functionality for HIP GPUs.
This commit addresses this.
2021-06-25 19:28:00 +00:00
Rasmus Munk Larsen
bffd267d17 Small cleanup: Get rid of the macros EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD and CJMADD, which were effectively unused, apart from on x86, where the change results in identically performing code. 2021-06-24 18:52:17 -07:00
Rasmus Munk Larsen
52a5f98212 Get rid of code duplication for conj_helper. For packets where LhsType=RhsType a single generic implementation suffices. For scalars, the generic implementation of pconj automatically forwards to numext::conj, so much of the existing specialization can be avoided. For mixed types we still need specializations. 2021-06-24 15:47:48 -07:00
Antonio Sanchez
12e8d57108 Remove pset, replace with ploadu.
We can't make guarantees on alignment for existing calls to `pset`,
so we should default to loading unaligned.  But in that case, we should
just use `ploadu` directly. For loading constants, this load should hopefully
get optimized away.

This is causing segfaults in Google Maps.
2021-06-16 18:41:17 -07:00
Chip-Kerchner
ef1fd341a8 EIGEN_STRONG_INLINE was NOT inlining in some critical needed areas (6.6X slowdown) when used with Tensorflow. Changing to EIGEN_ALWAYS_INLINE where appropiate. 2021-06-16 16:30:31 +00:00
Antonio Sanchez
9e94c59570 Add missing ppc pcmp_lt_or_nan<Packet8bf> 2021-06-15 13:42:17 -07:00
Rasmus Munk Larsen
fc87e2cbaa Use bit_cast to create -0.0 for floating point types to avoid compiler optimization changing sign with --ffast-math enabled. 2021-06-11 02:35:53 +00:00
Antonio Sanchez
dba753a986 Add missing NEON ptranspose implementations.
Unified implementation using only `vzip`.
2021-05-25 18:25:35 +00:00
guoqiangqi
3d9051ea84 Changing the storage of the SSE complex packets to that of the wrapper. This should fix #2242 . 2021-05-10 23:53:16 +00:00