Commit Graph

  • 318e65e0ae Fix missing inclusion of Eigen/Core Gael Guennebaud 2016-04-27 23:05:40 +02:00
  • f629fe95c8 Made the index type a template parameter to evaluateProductBlockingSizes. Use numext::mini and numext::maxi instead of std::min/std::max to compute blocking sizes. Benoit Steiner 2016-04-27 13:11:19 -07:00
  • 66b215b742 Merged latest updates from trunk Benoit Steiner 2016-04-27 12:57:48 -07:00
  • 25141b69d4 Improved support for min and max on 16 bit floats when running on recent cuda gpus Benoit Steiner 2016-04-27 12:57:21 -07:00
  • ff33798acd Merged eigen/eigen into default Rasmus Larsen 2016-04-27 12:27:00 -07:00
  • 463738ccbe Use computeProductBlockingSizes to compute blocking for both ShardByCol and ShardByRow cases. Rasmus Munk Larsen 2016-04-27 12:26:18 -07:00
  • 6744d776ba Added support for fpclassify in Eigen::Numext Benoit Steiner 2016-04-27 12:10:25 -07:00
  • 1f48f47ab7 Implement stricter argument checking for SYRK and SY2K and real matrices. To implement the BLAS API they should return info=2 if op='C' is passed for a complex matrix. Without this change, the Eigen BLAS fails the strict zblat3 and cblat3 tests in LAPACK 3.5. Rasmus Munk Larsen 2016-04-27 19:59:44 +02:00
  • 3dddd34133 Refactor the unsupported CXX11/Core module to internal headers only. Gael Guennebaud 2016-04-26 11:20:25 +02:00
  • 4a164d2c46 Fixed the partial evaluation of non vectorizable tensor subexpressions Benoit Steiner 2016-04-25 10:43:03 -07:00
  • fd9401f260 Refined the cost of the striding operation. Benoit Steiner 2016-04-25 09:16:08 -07:00
  • e19b58e672 alias template for matrix and array classes Heiko Bauke 2016-04-23 00:08:51 +02:00
  • 3f80696ae1 Merged eigen/eigen into default Konstantinos Margaritis 2016-04-22 15:05:21 +03:00
  • 5c372d19e3 Merged in rmlarsen/eigen (pull request PR-179) Benoit Steiner 2016-04-21 18:06:36 -07:00
  • 4bbc97be5e Provide access to the base threadpool classes Benoit Steiner 2016-04-21 17:59:33 -07:00
  • a3256d78d8 Prevent crash in CompleteOrthogonalDecomposition if object was default constructed. Rasmus Munk Larsen 2016-04-21 16:49:28 -07:00
  • 33adce5c3a Added the ability to switch to the new thread pool with a #define Benoit Steiner 2016-04-21 11:59:58 -07:00
  • 79b900375f Use index list for the striding benchmarks Benoit Steiner 2016-04-21 11:58:27 -07:00
  • f670613e4b Fixed several compilation warnings Benoit Steiner 2016-04-21 11:03:02 -07:00
  • 6015422ee6 Added an option to enable the use of the F16C instruction set Benoit Steiner 2016-04-21 10:30:29 -07:00
  • 32ffce04fc Use EIGEN_THREAD_YIELD instead of std::this_thread::yield to make the code more portable. Benoit Steiner 2016-04-21 08:47:28 -07:00
  • e5b2ef47d5 Merged eigen/eigen into default Konstantinos Margaritis 2016-04-21 18:03:08 +03:00
  • 2dde1b1028 Don't crash when attempting to reduce empty tensors. Benoit Steiner 2016-04-20 18:08:20 -07:00
  • a792cd357d Added more tests Benoit Steiner 2016-04-20 17:33:58 -07:00
  • 80200a1828 Don't attempt to leverage the _cvtss_sh and _cvtsh_ss instructions when compiling with clang since it's unclear which versions of clang actually support these instructions. Benoit Steiner 2016-04-20 12:10:27 -07:00
  • c7c2054bb5 Started to implement a portable way to yield. Benoit Steiner 2016-04-19 17:59:58 -07:00
  • 1d0238375d Made sure all the required header files are included when trying to use fp16 Benoit Steiner 2016-04-19 17:44:12 -07:00
  • 2b72163028 Implemented a more portable version of thread local variables Benoit Steiner 2016-04-19 15:56:02 -07:00
  • 04f954956d Fixed a few typos Benoit Steiner 2016-04-19 15:27:09 -07:00
  • 5b1106c56b Fixed a compilation error with nvcc 7. Benoit Steiner 2016-04-19 14:57:57 -07:00
  • 7129d998db Simplified the code that launches cuda kernels. Benoit Steiner 2016-04-19 14:55:21 -07:00
  • b9ea40c30d Don't take the address of a kernel on CUDA devices that don't support this feature. Benoit Steiner 2016-04-19 14:35:11 -07:00
  • 884c075058 Use numext::ceil instead of std::ceil Benoit Steiner 2016-04-19 14:33:30 -07:00
  • a278414d1b Avoid an unnecessary copy of the evaluator. Benoit Steiner 2016-04-19 13:54:28 -07:00
  • f953c60705 Fixed 2 recent regression tests Benoit Steiner 2016-04-19 12:57:39 -07:00
  • 50968a0a3e Use DenseIndex in the MeanReducer to avoid overflows when processing very large tensors. Benoit Steiner 2016-04-19 11:53:58 -07:00
  • 84543c8be2 Worked around the lack of a rand_r function on windows systems Benoit Steiner 2016-04-17 19:29:27 -07:00
  • 5fbcfe5eb4 Worked around the lack of a rand_r function on windows systems Benoit Steiner 2016-04-17 18:42:31 -07:00
  • e4fe611e2c Enable lazy-coeff-based-product for vector*(1x1) products Gael Guennebaud 2016-04-16 15:17:39 +02:00
  • c8e8f93d6c Move the evalGemm method into the TensorContractionEvaluatorBase class to make it accessible from both the single and multithreaded contraction evaluators. Benoit Steiner 2016-04-15 16:48:10 -07:00
  • 1a16fb1532 Deleted extraneous comma. Benoit Steiner 2016-04-15 15:50:13 -07:00
  • 7cff898e0a Deleted unnecessary variable Benoit Steiner 2016-04-15 15:46:14 -07:00
  • 6c43c49e4a Fixed a few compilation warnings Benoit Steiner 2016-04-15 15:34:34 -07:00
  • eb669f989f Merged in rmlarsen/eigen (pull request PR-178) Benoit Steiner 2016-04-15 14:53:15 -07:00
  • 2a7115daca bug #1203: by-pass large stack-allocation in stableNorm if EIGEN_STACK_ALLOCATION_LIMIT is too small Gael Guennebaud 2016-04-15 22:34:11 +02:00
  • 3718bf654b Get rid of void* casting when calling EvalRange::run. Rasmus Munk Larsen 2016-04-15 12:51:33 -07:00
  • 40c9923a8a Fixed compilation errors with msvc Benoit Steiner 2016-04-15 11:27:52 -07:00
  • 1d23430628 Improved the matrix multiplication blocking in the case where mr is not a power of 2 (e.g. on Haswell CPUs). Benoit Steiner 2016-04-15 10:53:31 -07:00
  • 1e80bddde3 Fix trmv for mixing types. Gael Guennebaud 2016-04-15 17:58:36 +02:00
  • 0e8fc31087 remove pgather/pscatter for std::complex<double> for s390x Konstantinos Margaritis 2016-04-15 07:08:57 -04:00
  • a62e924656 Added ability to access the cache sizes from the tensor devices Benoit Steiner 2016-04-14 21:25:06 -07:00
  • 18e6f67426 Added support for exclusive or Benoit Steiner 2016-04-14 20:37:46 -07:00
  • 07ac4f7e02 Eigen Tensor cost model part 2: Thread scheduling for standard evaluators and reductions. The cost model is turned off by default. Rasmus Munk Larsen 2016-04-14 18:28:23 -07:00
  • 9624a1ea3d Added missing definition of PacketSize in the gpu evaluator of convolution Benoit Steiner 2016-04-14 17:16:58 -07:00
  • 6fbedf5a4e Merged in rmlarsen/eigen (pull request PR-177) Benoit Steiner 2016-04-14 17:13:19 -07:00
  • bebb89acfa Enabled the new threadpool tests Benoit Steiner 2016-04-14 16:44:10 -07:00
  • 9c064b5a97 Cleanup Benoit Steiner 2016-04-14 16:41:31 -07:00
  • 1372156c41 Prepared the migration to the new non blocking thread pool Benoit Steiner 2016-04-14 16:16:42 -07:00
  • aeb5494a0b Improvements to cost model. Rasmus Munk Larsen 2016-04-14 15:52:58 -07:00
  • 00dfe18487 Merged latest updates from trunk Benoit Steiner 2016-04-14 15:25:20 -07:00
  • a8e8837ba7 Added tests for the non blocking thread pool Benoit Steiner 2016-04-14 15:23:49 -07:00
  • 78a51abc12 Added a more scalable non blocking thread pool Benoit Steiner 2016-04-14 15:23:10 -07:00
  • d2e95492e7 Merge upstream updates. Rasmus Munk Larsen 2016-04-14 13:59:50 -07:00
  • 235e83aba6 Eigen cost model part 1. This implements a basic recursive framework to estimate the cost of evaluating tensor expressions. Rasmus Munk Larsen 2016-04-14 13:57:35 -07:00
  • 68897c52f3 Add extreme values to the imaginary part for SVD unit tests. Gael Guennebaud 2016-04-14 22:47:30 +02:00
  • 20f387fafa Improve numerical robustness of JacobiSVD: - avoid noise amplification in complex to real conversion - compare off-diagonal entries to the current biggest diagonal entry: no need to bother about a 2x2 block containing ridiculously small entries compared to the rest of the matrix. Gael Guennebaud 2016-04-14 22:46:55 +02:00
  • 7718749fee Force the inlining of the << operator on half floats Benoit Steiner 2016-04-14 11:51:54 -07:00
  • 5379d2b594 Inline the << operator on half floats Benoit Steiner 2016-04-14 11:40:48 -07:00
  • 5912ad877c Silenced a compilation warning Benoit Steiner 2016-04-14 11:40:14 -07:00
  • 2b6e3de02f Added tests to validate flooring and ceiling of fp16 Benoit Steiner 2016-04-14 11:39:18 -07:00
  • 6f23e945f6 Added simple test for numext::sqrt and numext::pow on fp16 Benoit Steiner 2016-04-14 10:32:52 -07:00
  • 72510c80e1 Added basic test for trigonometric functions on fp16 Benoit Steiner 2016-04-14 10:27:24 -07:00
  • 7b3d7acebe Added support for fp16 to test_isApprox, test_isMuchSmallerThan, and test_isApproxOrLessThan Benoit Steiner 2016-04-14 10:25:50 -07:00
  • 5c13765ee3 Added ability to printf fp16 Benoit Steiner 2016-04-14 10:24:52 -07:00
  • c7167fee0e Added support for fp16 to the sigmoid function Benoit Steiner 2016-04-14 10:08:33 -07:00
  • f6003f0873 Made the test msvc friendly Benoit Steiner 2016-04-14 09:47:26 -07:00
  • 3551dea887 Cleaning pass on rcond estimator. Gael Guennebaud 2016-04-14 16:45:41 +02:00
  • d8a3bdaa24 remove useless include Gael Guennebaud 2016-04-14 15:18:56 +02:00
  • d402adc3d7 Better use .data() than &coeffRef(0) Gael Guennebaud 2016-04-14 15:18:08 +02:00
  • ea7087ef31 Merged in rmlarsen/eigen (pull request PR-174) Gael Guennebaud 2016-04-14 15:11:33 +02:00
  • 36f5a10198 Properly gate the definition of the error and gamma functions for fp16 Benoit Steiner 2016-04-13 18:44:48 -07:00
  • 10b69810d1 Improved support for trigonometric functions on GPU Benoit Steiner 2016-04-13 16:00:51 -07:00
  • d6105b53b8 Added basic implementation of the lgamma, digamma, igamma, igammac, polygamma, and zeta function for fp16 Benoit Steiner 2016-04-13 15:26:02 -07:00
  • 703251f10f merge Gael Guennebaud 2016-04-13 23:45:10 +02:00
  • 39211ba46b Fix JacobiSVD for complex when the complex-to-real update already gives a diagonal 2x2 block. Gael Guennebaud 2016-04-13 23:43:26 +02:00
  • 2986253259 Cleaned up the implementation of digamma Benoit Steiner 2016-04-13 14:24:06 -07:00
  • d5de1a8220 Pulled latest updates from trunk Benoit Steiner 2016-04-13 14:17:11 -07:00
  • 87ca15c4e8 Added support for sin, cos, tan, and tanh on fp16 Benoit Steiner 2016-04-13 14:12:38 -07:00
  • 2c9e4fa417 Add debug output for random unit test Gael Guennebaud 2016-04-13 22:56:12 +02:00
  • 7d1391d049 Turn a convergence check into a warning Gael Guennebaud 2016-04-13 22:50:54 +02:00
  • feef39e2d1 Fix underflow in JacobiSVD's complex to real preconditioner Gael Guennebaud 2016-04-13 22:49:51 +02:00
  • f4e12272f1 Fix corner case in unit test. Gael Guennebaud 2016-04-13 22:18:02 +02:00
  • a95e1a273e Fix warning in unit tests Gael Guennebaud 2016-04-13 22:00:38 +02:00
  • bf3f6688f0 Added support for computing cos, sin, tan, and tanh on GPU. Benoit Steiner 2016-04-13 11:55:08 -07:00
  • 473c8380ea Added constructors to convert unsigned integers into fp16 Benoit Steiner 2016-04-13 11:03:37 -07:00
  • 42a3352a3b Workaround a division by zero when outerstride==0 Gael Guennebaud 2016-04-13 19:02:02 +02:00
  • 6f960b83ff Make use of is_same_dense helper instead of extract_data to detect input/outputs are the same. Gael Guennebaud 2016-04-13 18:47:12 +02:00
  • b7716c0328 Fix incomplete previous patch on matrix comparison. Gael Guennebaud 2016-04-13 18:32:56 +02:00
  • 2630d97c62 Fix detection of same matrices when both matrices are not handled by extract_data. Gael Guennebaud 2016-04-13 18:26:08 +02:00
  • 512ba0ac76 Add regression unit tests for half-packet vectorization Gael Guennebaud 2016-04-13 18:16:35 +02:00