Rasmus Munk Larsen | 1a09defce7 | Protect new pblend implementation with EIGEN_VECTORIZE_AVX2 | 2022-08-22 18:28:03 +00:00
Rasmus Munk Larsen | 7c67dc67ae | Use proper double word division algorithm for pow<double>. Gives 11-15% speedup. | 2022-08-17 18:36:23 +00:00
Matthew Sterrett | 7a3b667c43 | Add support for AVX512-FP16 for vectorizing half precision math | 2022-08-17 18:15:21 +00:00
Charles Schlosser | 76a669fb45 | add fixed power unary operation | 2022-08-16 21:32:36 +00:00
Matthew Sterrett | 39fcc89798 | Removed unnecessary checks for FP16C | 2022-08-16 18:14:41 +00:00
Romain Biessy | 2f7cce2dd5 | [SYCL] Fix some SYCL tests | 2022-08-16 17:37:54 +00:00
Lexi Bromfield | 66ea0c09fd | Don't double-define Half functions on aarch64 | 2022-08-09 20:00:34 +00:00
Rasmus Munk Larsen | 97e0784dc6 | Vectorize the sign operator in Eigen. | 2022-08-09 19:54:57 +00:00
Rasmus Munk Larsen | 7a87ed1b6a | Fix code and unit test for a few corner cases in vectorized pow() | 2022-08-08 18:48:36 +00:00
Chip Kerchner | 9e0afe0f02 | Fix non-VSX PowerPC build | 2022-08-08 18:18:17 +00:00
Chip Kerchner | 84a9d6fac9 | Fix use of Packet2d type for non-VSX. | 2022-08-03 20:48:13 +00:00
Chip Kerchner | ce60a7be83 | Partial Packet support for GEMM real-only (PowerPC). Also fix compilation warnings & errors for some conditions in new API. | 2022-08-03 18:15:19 +00:00
Ilya Tokar | e618c4a5e9 | Improve pblend AVX implementation | 2022-07-29 18:45:33 +00:00
Alexander Richardson | b7668c0371 | Avoid including <sstream> with EIGEN_NO_IO | 2022-07-29 18:02:51 +00:00
Antonio Sánchez | 2cf4d18c9c | Disable AVX512 GEMM kernels by default. | 2022-07-20 21:22:48 +00:00
b-shi | 4a56359406 | Add option to disable avx512 GEBP kernels | 2022-07-18 17:59:09 +00:00
Chip Kerchner | 84cf3ff18d | Add pload_partial, pstore_partial (and unaligned versions), pgather_partial, pscatter_partial, loadPacketPartial and storePacketPartial. | 2022-06-27 19:18:00 +00:00
Chip Kerchner | c603275dc9 | Better performance for Power10 using more load and store vector pairs for GEMV | 2022-06-27 18:11:55 +00:00
b-shi | 37673ca1bc | AVX512 TRSM kernels use alloca if EIGEN_NO_MALLOC requested | 2022-06-17 18:05:26 +00:00
Chip Kerchner | 4d1c16eab8 | Fix tanh and erf to use vectorized version for EIGEN_FAST_MATH in VSX. | 2022-06-15 16:06:43 +00:00
Shi, Brian | 28812d2ebb | AVX512 TRSM Kernels respect EIGEN_NO_MALLOC | 2022-06-07 11:28:42 -07:00
aaraujom | 8fbb76a043 | Fix build issues with MSVC for AVX512 | 2022-06-03 14:55:40 +00:00
aaraujom | d49ede4dc4 | Add AVX512 s/dgemm optimizations for compute kernel (2nd try) | 2022-05-28 02:00:21 +00:00
Chip Kerchner | aa8b7e2c37 | Add subMappers to Power GEMM packing - simplifies the address calculations (10% faster) | 2022-05-23 15:18:29 +00:00
Guoqiang QI | 32a3f9ac33 | Improve plogical_shift_* implementations and fix typo in SVE/PacketMath.h | 2022-05-23 09:33:49 +00:00
Eisuke Kawashima | ac5c83a3f5 | unset executable flag | 2022-05-22 22:47:43 +09:00
Antonio Sánchez | 9b9496ad98 | Revert "Add AVX512 optimizations for matrix multiply". This reverts commit 25db0b4a82 | 2022-05-13 18:50:33 +00:00
aaraujom | 25db0b4a82 | Add AVX512 optimizations for matrix multiply | 2022-05-12 23:41:19 +00:00
Chip Kerchner | c2f15edc43 | Add load vector_pairs for RHS of GEMM MMA. Improved predux GEMV. | 2022-04-25 16:23:01 +00:00
Chip Kerchner | 44ba7a0da3 | Fix compiler bugs for GCC 10 & 11 for Power GEMM | 2022-04-20 15:59:00 +00:00
Chip Kerchner | b02c384ef4 | Add fused multiply functions for PowerPC - pmsub, pnmadd and pnmsub | 2022-04-18 16:16:32 +00:00
Shi, Brian | fc1d888415 | Remove AVX512VL dependency in trsm | 2022-04-14 12:44:24 -07:00
Antonio Sánchez | 07db964bde | Restrict new AVX512 trsm to AVX512VL, rename files for consistency. | 2022-04-14 16:58:32 +00:00
Chip Kerchner | 53eec53d2a | Fix Power GEMV order of operations in predux for MMA. | 2022-04-11 21:29:05 +00:00
Tobias Schlüter | f3ba220c5d | Remove EIGEN_EMPTY_STRUCT_CTOR | 2022-04-08 18:27:26 +00:00
Chip Kerchner | 403fa33409 | Performance improvements in GEMM for Power | 2022-04-05 12:18:53 +00:00
Antonio Sánchez | 73b2c13bf2 | Disable f16c scalar conversions for MSVC. | 2022-03-30 18:35:32 +00:00
b-shi | 0611f7fff0 | Add missing explicit reinterprets | 2022-03-23 21:10:26 +00:00
Chip Kerchner | 0699fa06fe | Split general_matrix_vector_product interface for Power into two macros - one ColMajor and RowMajor. | 2022-03-23 18:09:33 +00:00
Antonio Sánchez | 4451823fb4 | Fix ODR violation in trsm. | 2022-03-20 15:56:53 +00:00
Antonio Sánchez | 9a14d91a99 | Fix AVX512 builds with MSVC. | 2022-03-18 16:04:53 +00:00
Chip Kerchner | 7b10795e39 | Change EIGEN_ALTIVEC_ENABLE_MMA_DYNAMIC_DISPATCH and EIGEN_ALTIVEC_DISABLE_MMA flags to be like TensorFlow's... | 2022-03-17 22:35:27 +00:00
Antonio Sanchez | e34db1239d | Fix missing pound | 2022-03-16 12:26:12 -07:00
Antonio Sánchez | 591906477b | Fix up PowerPC MMA flags so it builds by default. | 2022-03-16 19:16:28 +00:00
b-shi | 518fc321cb | AVX512 Optimizations for Triangular Solve | 2022-03-16 18:04:50 +00:00
Erik Schultheis | 421cbf0866 | Replace Eigen type metaprogramming with corresponding std types and make use of alias templates | 2022-03-16 16:43:40 +00:00
Rasmus Munk Larsen | 9ad5661482 | Revert "Fix up PowerPC MMA flags so it builds by default." | 2022-03-15 20:51:03 +00:00
Antonio Sánchez | 65eeedf964 | Fix up PowerPC MMA flags so it builds by default. | 2022-03-15 20:22:23 +00:00
Tobias Schlüter | cb1e8228e9 | Convert bit calculation to constexpr, avoid casts. | 2022-03-13 22:38:36 +09:00
Duncan McBain | a3b64625e3 | Remove ComputeCpp-specific code from SYCL Vptr | 2022-03-08 22:44:18 +00:00