Commit Graph

  • 45a4aad572 add unit tests for ploadquad and predux4, and split packetmath unit test wrt real/complex Gael Guennebaud 2014-04-17 16:27:22 +02:00
  • e1d461352e Extend mixingtype unit test to check transposed cases. Gael Guennebaud 2014-04-17 16:26:35 +02:00
  • 11fbdcbc38 Fix and optimize mixed products Gael Guennebaud 2014-04-17 16:04:30 +02:00
  • 0fa8290366 Optimize ploaddup for AVX Gael Guennebaud 2014-04-17 16:02:27 +02:00
  • d936ddc3d1 Fallback to lazy products for very small ones. Gael Guennebaud 2014-04-16 23:15:42 +02:00
  • de8336a9bc Enable alloca on MAC OSX Gael Guennebaud 2014-04-16 23:14:58 +02:00
  • ffc995c9e4 Implement evaluator<ReturnByValue>. All supported tests pass apart from Sparse and Geometry, except test in adjoint_4 that a = a.transpose() raises an assert. Jitse Niesen 2014-04-16 18:16:36 +01:00
  • d5a795f673 New gebp kernel handling up to 3 packets x 4 register-level blocks. Huge speedup on Haswell. This changeset also introduces new vector functions: ploadquad and predux4. Gael Guennebaud 2014-04-16 17:05:11 +02:00
  • b30706bd5c Fix typo in Inverse.h Jitse Niesen 2014-04-15 22:51:46 +01:00
  • e0dbb68c2f Check IMKL version for compatibility with Eigen Mark Borgerding 2014-04-15 13:57:03 -04:00
  • 59f5f155c2 Port products with permutation matrices to evaluators. Jitse Niesen 2014-04-15 15:21:38 +01:00
  • 20c840be15 Merged in benoitsteiner/eigen-fixes/nvcc_fixes (pull request PR-56) Gael Guennebaud 2014-04-15 10:38:25 +02:00
  • 1afd50e0f3 Fixed a typo in CXX11Meta.h Benoit Steiner 2014-04-14 14:26:30 -07:00
  • 3c66bb136b bug #793: detect NaN and INF in EigenSolver instead of aborting with an assert. Gael Guennebaud 2014-04-14 22:00:27 +02:00
  • 7098e6d976 Add isfinite overload for complexes. Gael Guennebaud 2014-04-14 21:57:49 +02:00
  • feaf7c7e6d Optimized SSE unaligned loads and stores when compiling a 64bit target with a recent version of gcc (ie gcc 4.8). Benoit Steiner 2014-04-14 10:44:17 -07:00
  • d567e3b893 Merged in benoitsteiner/eigen-fixes (pull request PR-55) Gael Guennebaud 2014-04-14 14:33:50 +02:00
  • 148acf8e4f bug #790: fix overflow in real_2x2_jacobi_svd Gael Guennebaud 2014-04-14 13:52:16 +02:00
  • 0587db8bf5 bug #793: fix overflow in EigenSolver and add respective regression unit test Gael Guennebaud 2014-04-14 11:43:08 +02:00
  • 7903d3f27b Updated the compiler flags to enable nvcc to work with clang. Benoit Steiner 2014-04-12 23:39:37 -07:00
  • a803ff18a9 Fixed a typo in cuda_basic.cu Benoit Steiner 2014-04-12 20:24:05 -07:00
  • 91288e9bf9 Add include LevenbergMarquardt in CMakeLists.txt. Freddie Witherden 2014-04-12 12:53:09 +01:00
  • fbd5eac7cf Merged in benoitsteiner/eigen-fixes/nvcc_fixes (pull request PR-53) Jitse Niesen 2014-04-11 14:16:08 +01:00
  • 1b333c89c9 Updated my previous fix to avoid introducing a compilation warning on ARM platforms. Benoit Steiner 2014-04-10 17:43:13 -07:00
  • a1fcf599fa Silenced a compilation warning produced by nvcc. Benoit Steiner 2014-04-10 11:19:37 -07:00
  • a91a7a1964 doc: Add references to Cholesky methods in SelfAdjointView. Jitse Niesen 2014-04-07 14:14:48 +01:00
  • 3b2321e3ab Updated the geo_parametrizedline_2 test for AVX. Benoit Steiner 2014-04-04 17:08:47 -07:00
  • b446ff037e Deleted some dead code. Benoit Steiner 2014-04-04 14:12:24 -07:00
  • 5afcb4965c Remove out-dated comment in cholesky test. Jitse Niesen 2014-04-04 16:48:13 +01:00
  • 096af59799 Fix bug #784: Assert if assigning a product to a triangularView does not match the size. Christoph Hertzberg 2014-04-04 17:48:37 +02:00
  • 8044b00a7f bug #782: Workaround for gcc <= 4.4 compilation error on the NEON PacketMath code. Benoit Steiner 2014-04-03 23:41:47 +02:00
  • aecc78325a Pulled the latest updates from the eigen trunk. Benoit Steiner 2014-04-01 22:07:05 -07:00
  • 1cb8de1250 Make some actual verifications inside the autodiff unit test Christoph Hertzberg 2014-04-01 17:44:48 +02:00
  • 56c4851323 Fixed typo: symmretric -> symmetric Florian George 2014-04-01 15:52:25 +02:00
  • ceae5b4145 Fix lapack build Gael Guennebaud 2014-04-01 11:52:23 +02:00
  • ec65e6648c bug #775: propagate generator when working around cmake bug #9220 Gael Guennebaud 2014-04-01 11:45:43 +02:00
  • d992634fbc Fix bug #776: it seems that mingw does not support weak linking Gael Guennebaud 2014-04-01 11:31:21 +02:00
  • 5e8622477b Rename the vector() factories defined in blas/common.h into make_vector() to prevent a possible name conflict with std::vector. Benoit Steiner 2014-04-01 11:23:28 +02:00
  • 1221dd90aa Fix no newline at end of file warning Gael Guennebaud 2014-04-01 11:21:14 +02:00
  • 93870d95b7 BTL: add blaze Gael Guennebaud 2014-03-31 10:59:55 +02:00
  • f603823ef3 BTL: fix warnings and extend to 5k matrices, update GotoBlas to OpenBlas, etc. Gael Guennebaud 2014-03-31 10:58:30 +02:00
  • 8d0441052e Finally, prefetching seems to help getting more stable performance Gael Guennebaud 2014-03-31 10:42:19 +02:00
  • 82c8163067 Enable repetition in mixing type unit test Gael Guennebaud 2014-03-31 10:41:40 +02:00
  • 1c0728043a Workaround alignment warnings Gael Guennebaud 2014-03-30 22:43:47 +02:00
  • e497a27ddc Optimize gebp kernel: 1 - increase peeling level along the depth dimension (+5% for large matrices, i.e., >1000) 2 - improve pipelining when dealing with latest rows of the lhs Gael Guennebaud 2014-03-30 21:57:05 +02:00
  • ad59ade116 Vectorized the loop peeling of the inner loop of the block-panel matrix multiplication code. This speeds up the multiplication of matrices whose size is not a multiple of the packet size. Benoit Steiner 2014-03-28 12:11:23 -07:00
  • 39bfbd43f0 Properly align the input data to prevent false failures of the packetmath.cpp test. Benoit Steiner 2014-03-28 12:00:08 -07:00
  • 10aa14592a Add a mechanism to recursively access to half-size packet types Gael Guennebaud 2014-03-28 10:18:04 +01:00
  • 8d2bb2c20d merge with default branch Gael Guennebaud 2014-03-28 09:24:18 +01:00
  • c94fde118a Enable vectorization of gemv for PacketSize>4 through unaligned loads (still better than no vectorization) Gael Guennebaud 2014-03-28 09:11:06 +01:00
  • 51e85c936d Merged latest changes from parent. Benoit Steiner 2014-03-27 18:32:15 -07:00
  • 8a94cb3edd Implemented the SSE version of the gather and scatter packet primitives. Benoit Steiner 2014-03-27 18:29:01 -07:00
  • 7f3162f707 Implemented the AVX version of the gather and scatter packet primitives. Benoit Steiner 2014-03-27 17:42:25 -07:00
  • ee86679096 Introduced pscatter/pgather packet primitives. They will be used to optimize the loop peeling code of the block-panel matrix multiplication kernel. Benoit Steiner 2014-03-27 16:03:03 -07:00
  • 58fe2fc2b2 enforce the use of vfmadd231ps for pmadd (gcc and clang stupidly generate the other fmadd variants plus some register moves...) Gael Guennebaud 2014-03-27 23:38:50 +01:00
  • 729363114f Fixed compilation error when FMA instructions are enabled. Benoit Steiner 2014-03-27 11:20:41 -07:00
  • 1697d7a179 Silenced "unused variable" warnings when compiling with FMA. Benoit Steiner 2014-03-27 11:00:47 -07:00
  • 3e1fe8e416 Vectorized the packing of a col-major matrix used as the right hand side argument in a matrix-matrix product when AVX instructions are used. No vectorization takes place when SSE instructions are used, however this doesn't seem to impact performance. Benoit Steiner 2014-03-27 10:38:41 -07:00
  • b776458ccb Vectorized the packing of a row-major matrix used as the left hand side argument in a matrix-matrix product. Benoit Steiner 2014-03-27 10:02:24 -07:00
  • c4902a3d01 Implemented the AVX version of the ptranspose packet primitive. Benoit Steiner 2014-03-27 09:34:51 -07:00
  • 7d73c7f18b Change abi version when enabling AVX with GCC Gael Guennebaud 2014-03-27 15:38:40 +01:00
  • 6f123d209e Fix geo_* unit tests with respect to AVX Gael Guennebaud 2014-03-27 15:29:56 +01:00
  • 052aedd394 Implement pcplflip, palign, predux and the likes for AVX/complexes Gael Guennebaud 2014-03-27 14:47:00 +01:00
  • fb03b56647 Fix warning Gael Guennebaud 2014-03-27 11:38:35 +01:00
  • 6a81594771 Merged in infinitei/eigen (pull request PR-50) Jitse Niesen 2014-03-27 10:12:25 +00:00
  • 9ce0d78513 immintrin.h did not come until intel version 11 Mark Borgerding 2014-03-26 22:26:07 -04:00
  • a419cea4a0 Created the ptranspose packet primitive that can transpose an array of N packets, where N is the number of words in each packet. This primitive will be used to complete the vectorization of the gemm_pack_lhs and gemm_pack_rhs functions. Implemented the primitive using SSE instructions. Benoit Steiner 2014-03-26 19:03:07 -07:00
  • ba3457cab2 Fixed compilation error due to obsolete internal::abs and internal::sqrt function calls Abhijit Kundu 2014-03-26 22:02:48 -04:00
  • 14bc4b9704 Made sure that the version of gemm_pack_rhs specialized for row major matrices is vectorized when nr == 2*PacketSize (which is the case for SSE when compiling in 64bit mode). Benoit Steiner 2014-03-26 17:35:18 -07:00
  • e45a6bed45 Specialized the pload1 packet primitive for Packet8f and Packet4d in order to take advantage of the vbroadcastss and vbroadcastsd instructions whenever possible. Benoit Steiner 2014-03-26 15:58:13 -07:00
  • cc73164aa8 Merged latest updates from the parent branch Benoit Steiner 2014-03-26 15:23:59 -07:00
  • f0a4c9d5ab Update gebp kernel to process a panel of 4 columns at once for the remaining ones. Gael Guennebaud 2014-03-26 23:22:36 +01:00
  • 8be011e776 Remove remaining bits of the dead working buffer Gael Guennebaud 2014-03-26 23:14:44 +01:00
  • a078f442a3 Vectorized the multiplication and division of complex numbers using AVX instructions. Benoit Steiner 2014-03-26 15:11:18 -07:00
  • cf1a7bfbe1 Used AVX instructions to vectorize the complex version of the pfirst and ploaddup packet primitives. Silenced a few compilation warnings. Benoit Steiner 2014-03-26 12:03:31 -07:00
  • bc401eb6fa Implement new 1 packet x 8 gebp kernel Gael Guennebaud 2014-03-26 18:53:00 +01:00
  • b286a1e75c add pbroadcast2/4 generic intrinsics Gael Guennebaud 2014-03-26 16:46:36 +01:00
  • 6bf3cc2732 Use AVX instructions to vectorize pset1<Packet2cd>, pset1<Packet4cf>, preverse<Packet2cd>, and preverse<Packet4cf> Benoit Steiner 2014-03-25 09:00:43 -07:00
  • 7ae9b0805d Used AVX instructions to vectorize the predux_min<Packet8f>, predux_min<Packet4d>, predux_max<Packet8f>, and predux_max<Packet4d> packet primitives. Benoit Steiner 2014-03-24 13:33:40 -07:00
  • 08f7b3221d Added proper support for AVX and FMA in the makefiles. Benoit Steiner 2014-03-24 09:52:45 -07:00
  • 72707a8664 Made sure that EIGEN_ALIGN is defined when EIGEN_DONT_VECTORIZE is set to true to prevent build failures when vectorization is disabled. Benoit Steiner 2014-03-21 11:40:29 -07:00
  • 8a0845ebd7 Merged latest changes from the parent Benoit Steiner 2014-03-18 12:58:08 -07:00
  • 0a6c472335 A bit of cleaning Gael Guennebaud 2014-03-13 15:44:20 +01:00
  • aceae8314b Resurrect EvalBeforeNestingBit to control nested_eval Gael Guennebaud 2014-03-12 20:25:36 +01:00
  • 16d4c7a5e8 Conditionally disable unit tests that are not supported by evaluators yet Gael Guennebaud 2014-03-12 20:23:44 +01:00
  • a395024d44 More debug info and use lazyProd instead of operator* to query the right flags Gael Guennebaud 2014-03-12 18:14:58 +01:00
  • f74ed34539 Fix regressions in redux_evaluator flags and evaluator<Block> flags Gael Guennebaud 2014-03-12 18:14:08 +01:00
  • 5e26b7cf9d Extend evaluation traits debugging info Gael Guennebaud 2014-03-12 18:13:18 +01:00
  • 74b1d79d77 merge default and evaluator branches Gael Guennebaud 2014-03-12 16:24:25 +01:00
  • 0b362e0c9a This file is not needed anymore Gael Guennebaud 2014-03-12 16:18:54 +01:00
  • a6be1952f4 Fix a few regressions when moving the flags Gael Guennebaud 2014-03-12 16:18:34 +01:00
  • 0bd5671b9e Fix Eigenvalues module Gael Guennebaud 2014-03-12 13:35:44 +01:00
  • 8dd3b716e3 Move evaluation related flags from traits to evaluator and fix evaluators of MapBase and Replicate Gael Guennebaud 2014-03-12 13:34:11 +01:00
  • 7eefdb948c Migrate JacobiSVD to Solver Gael Guennebaud 2014-03-11 13:43:46 +01:00
  • 082f7ddc37 Port Cholesky module to evaluators Gael Guennebaud 2014-03-11 13:33:44 +01:00
  • 9be72cda2a Port QR module to Solve/Inverse Gael Guennebaud 2014-03-11 11:47:32 +01:00
  • ae40583965 Fix CoeffReadCost issues Gael Guennebaud 2014-03-11 11:47:14 +01:00
  • 5806e73800 It is not clear what XprType::Nested should be, so let's use nested<Xpr>::type as much as possible Gael Guennebaud 2014-03-11 11:44:11 +01:00
  • 2bf63c6b4a Even ReturnByValue should not evaluate when assembling the expression Gael Guennebaud 2014-03-11 11:42:07 +01:00
  • da6ec81282 Move CoeffReadCost mechanism to evaluators Gael Guennebaud 2014-03-10 23:24:40 +01:00