Commit Graph

1126 Commits

Author SHA1 Message Date
Chip Kerchner
0699fa06fe Split general_matrix_vector_product interface for Power into two macros - one ColMajor and RowMajor. 2022-03-23 18:09:33 +00:00
Antonio Sánchez
4451823fb4 Fix ODR violation in trsm. 2022-03-20 15:56:53 +00:00
Antonio Sánchez
9a14d91a99 Fix AVX512 builds with MSVC. 2022-03-18 16:04:53 +00:00
Chip Kerchner
7b10795e39 Change EIGEN_ALTIVEC_ENABLE_MMA_DYNAMIC_DISPATCH and EIGEN_ALTIVEC_DISABLE_MMA flags to be like TensorFlow's... 2022-03-17 22:35:27 +00:00
Antonio Sanchez
e34db1239d Fix missing pound 2022-03-16 12:26:12 -07:00
Antonio Sánchez
591906477b Fix up PowerPC MMA flags so it builds by default. 2022-03-16 19:16:28 +00:00
b-shi
518fc321cb AVX512 Optimizations for Triangular Solve 2022-03-16 18:04:50 +00:00
Erik Schultheis
421cbf0866 Replace Eigen type metaprogramming with corresponding std types and make use of alias templates 2022-03-16 16:43:40 +00:00
Rasmus Munk Larsen
9ad5661482 Revert "Fix up PowerPC MMA flags so it builds by default." 2022-03-15 20:51:03 +00:00
Antonio Sánchez
65eeedf964 Fix up PowerPC MMA flags so it builds by default. 2022-03-15 20:22:23 +00:00
Tobias Schlüter
cb1e8228e9 Convert bit calculation to constexpr, avoid casts. 2022-03-13 22:38:36 +09:00
Duncan McBain
a3b64625e3 Remove ComputeCpp-specific code from SYCL Vptr 2022-03-08 22:44:18 +00:00
Rasmus Munk Larsen
0e6f4e43f1 Fix a few confusing comments in psincos_float. 2022-03-04 20:41:49 +00:00
Sean McBride
f1b9692d63 Removed EIGEN_UNUSED decorations from many functions that are in fact used 2022-03-03 20:19:33 +00:00
Antonio Sánchez
9c07e201ff Modified sqrt/rsqrt for denormal handling. 2022-03-02 17:20:47 +00:00
Antonio Sánchez
19c39bea29 Fix mixingtypes for g++-11. 2022-02-25 19:28:10 +00:00
Rasmus Munk Larsen
8b875dbef1 Changes to fast SQRT/RSQRT 2022-02-23 17:32:21 +00:00
Ramil Sattarov
f9b7564faa E2K: initial support of LCC MCST compiler for the Elbrus 2000 CPU architecture 2022-02-23 17:07:34 +00:00
Antonio Sánchez
28e008b99a Fix sqrt/rsqrt for NEON. 2022-02-15 21:31:51 +00:00
Erik Schultheis
7197b577fb Remove unused macros in AVX packetmath.
The following macros are removed:

* EIGEN_DECLARE_CONST_Packet8f
* EIGEN_DECLARE_CONST_Packet4d
* EIGEN_DECLARE_CONST_Packet8f_FROM_INT
* EIGEN_DECLARE_CONST_Packet8i
2022-02-14 10:34:23 +00:00
Chip Kerchner
cb5ca1c901 Cleanup compiler warnings, etc from recent changes in GEMM & GEMV for PowerPC 2022-02-09 18:47:08 +00:00
Rasmus Munk Larsen
92d0026b7b Provide a definition for numeric_limits static data members 2022-02-08 20:34:53 +00:00
Rasmus Munk Larsen
979fdd58a4 Add generic fast psqrt and prsqrt impls and make them correct for 0, +Inf, NaN, and negative arguments. 2022-02-05 00:20:13 +00:00
Antonio Sánchez
4bffbe84f9 Restrict GCC<6.3 maxpd workaround to only gcc. 2022-02-04 22:47:34 +00:00
Antonio Sánchez
e7f4a901ee Define EIGEN_HAS_AVX512_MATH in PacketMath. 2022-02-04 22:25:52 +00:00
Antonio Sánchez
6b60bd6754 Fix 32-bit arm int issue. 2022-02-04 21:59:33 +00:00
Antonio Sánchez
96da541cba Fix AVX512 math function consistency, enable for ICC. 2022-02-04 19:35:18 +00:00
Antonio Sánchez
cafeadffef Fix ODR violations. 2022-02-04 19:01:07 +00:00
Chip Kerchner
66464bd2a8 Fix number of block columns to NOT overflow the cache (PowerPC) abnormally in GEMV 2022-01-27 20:35:53 +00:00
Rasmus Munk Larsen
8f2c6f0aa6 Make preciprocal IEEE compliant w.r.t. 1/0 and 1/inf. 2022-01-26 20:38:05 +00:00
Rasmus Munk Larsen
51311ec651 Remove inline assembly for FMA (AVX) and add remaining extensions as packet ops: pmsub, pnmadd, and pnmsub. 2022-01-26 04:25:41 +00:00
Rasmus Munk Larsen
ea2c02060c Add reciprocal packet op and fast specializations for float with SSE, AVX, and AVX512. 2022-01-21 23:49:18 +00:00
Ilya Tokar
a0fc640c18 Add support for packets of int64 on x86 2022-01-21 19:55:23 +00:00
Erik Schultheis
970640519b Cleanup 2022-01-21 01:48:59 +00:00
Chip Kerchner
708fd6d136 Add MMA and performance improvements for VSX in GEMV for PowerPC. 2022-01-13 13:23:18 +00:00
Kolja Brix
8d81a2339c Reduce usage of reserved names 2022-01-10 20:53:29 +00:00
Matthias Möller
c4b1dd2f6b Add support for Cray, Fujitsu, and Intel ICX compilers
The following preprocessor macros are added:

- EIGEN_COMP_CPE and EIGEN_COMP_CLANGCPE version number of the CRAY compiler if
  Eigen is compiled with the Cray C++ compiler, 0 otherwise.

- EIGEN_COMP_FCC and EIGEN_COMP_CLANGFCC version number of the FCC compiler if
  Eigen is compiled with the Fujitsu C++ compiler, 0 otherwise

- EIGEN_COMP_CLANGICC version number of the ICX compiler if Eigen is compiled
  with the Intel oneAPI C++ compiler, 0 otherwise

All three compilers (Cray, Fujitsu, Intel) offer a traditional and a Clang-based
frontend. This is distinguished by the CLANG prefix.
2022-01-07 18:46:16 +00:00
Rasmus Munk Larsen
96dc37a03b Some fixes/cleanups for numeric_limits & fix for related bug in psqrt 2022-01-07 01:10:17 +00:00
Rasmus Munk Larsen
7b5a8b6bc5 Improve plog: 20% speedup for float + handle denormals 2022-01-05 23:40:31 +00:00
Shiva Ghose
a4098ac676 Fix duplicate include guard *ALTIVEC_H -> *ZVECTOR_H
Some some header guards were repeated between the `AltiVec` package and the
`ZVector` packages. This could cause a problem if (for whatever reason) someone
attempts to include headers for both architectures.
2021-12-31 08:43:24 +00:00
Rasmus Munk Larsen
8eab7b6886 Improve exp<float>(): Don't flush denormal results +4% speedup.
1. Speed up exp(x) by reducing the polynomial approximant from degree 7 to
degree 6. With exactly representable coefficients computed by the Sollya tool,
this still gives a maximum relative error of 1 ulp, i.e. faithfully rounded, for
arguments where exp(x) is a normalized float. This change results in a speedup
of about 4% for AVX2.


2. Extend the range where exp(x) returns a non-zero result to from ~[-88;88] to
~[-104;88] i.e. return denormalized values for large negative arguments instead
of zero. Compared to exp<double>(x) the denormalized results gradually decrease
in accuracy down to 0.033 relative error for arguments around x = -104 where
exp(x) is ~std::numeric<float>::denorm_min(). This is expected and acceptable.
2021-12-28 15:00:19 +00:00
Erik Schultheis
dee6428a71 fixed clang warnings about alignment change and floating point precision 2021-12-18 17:18:16 +00:00
Rasmus Munk Larsen
f04fd8b168 Make sure exp(-Inf) is zero for vectorized expressions. This fixes #2385. 2021-12-08 17:57:23 +00:00
Erik Schultheis
cc11e240ac Some further cleanup 2021-12-06 18:01:15 +00:00
Erik Schultheis
ec2fd0f7ed Require recent GCC and MSCV and removed EIGEN_HAS_CXX14 and some other feature test macros 2021-12-01 00:48:34 +00:00
Rasmus Munk Larsen
5137a5157a Make numeric_limits members constexpr as per the newer C++ standards.
Author: majnemer@google.com
2021-11-19 15:58:36 +00:00
Chip Kerchner
9cf34ee0ae Invert rows and depth in non-vectorized portion of packing (PowerPC). 2021-10-28 21:59:41 +00:00
Ilya Tokar
e1cb6369b0 Add AVX vector path to float2half/half2float
Makes e. g. matrix multiplication 2x faster:
name         old cpu/op  new cpu/op  delta
BM_convers   181ms ± 1%    62ms ± 9%  -65.82%  (p=0.016 n=4+5)

Tested on all possible input values (not adding tests, since they
take a long time).
2021-10-28 13:59:01 -04:00
Antonio Sanchez
e559701981 Fix compile issue for gcc 4.8 2021-10-28 08:23:19 -07:00
Rohit Santhanam
48e40b22bf Preliminary HIP bfloat16 GPU support. 2021-10-27 18:36:45 +00:00