eigen

mirror of https://gitlab.com/libeigen/eigen.git synced 2026-04-10 11:34:33 +08:00

Author	SHA1	Message	Date
Gael Guennebaud	79b4e6acaf	Fix bug #987 : wrong alignment guess in diagonal product.	2015-03-31 23:35:12 +02:00
Gael Guennebaud	8313fb7df7	Add row/column-wise reverseInPlace feature.	2015-03-31 21:35:53 +02:00
Gael Guennebaud	dfb674a25e	Make reverseInPlace really work in-place.	2015-03-31 20:17:10 +02:00
Gael Guennebaud	20d030f207	Fix vectorization of swap for non trivial expressions	2015-03-31 20:16:02 +02:00
Benoit Jacob	73cdeae1d3	Only use blocking sizes LUTs for single-thread products for now	2015-03-31 11:17:23 -04:00
Benoit Jacob	0cbd5ae3cb	Correctly detect Android with ndk_build	2015-03-31 11:17:21 -04:00
Gael Guennebaud	ae01c05e18	Fix computeProductBlockingSizes with m==0, and add respective unit test.	2015-03-31 15:19:57 +02:00
Gael Guennebaud	79cb875249	merge	2015-03-27 10:56:04 +01:00
Gael Guennebaud	3d59ae0203	Fix hypot(0,0).	2015-03-27 09:59:24 +01:00
Benoit Steiner	abdbe8562e	Fixed the CUDA packet primitives	2015-03-24 10:45:46 -07:00
Gael Guennebaud	29eaa2b0f1	Make MatrixBase::is* methods aware of nested_eval.	2015-03-24 13:42:42 +01:00
Gael Guennebaud	d6b2f300db	Fix MSVC compilation: aligned type must be passed by reference	2015-03-19 17:28:32 +01:00
Gael Guennebaud	61c45d7cfd	Fix comparison warning	2015-03-19 17:13:22 +01:00
Gael Guennebaud	f329d0908a	Improve random number generation for integer and add unit test	2015-03-19 15:10:36 +01:00
Benoit Jacob	dc04f12967	use unsigned short instead of uint16_t which doesn't exist in c++98	2015-03-17 10:31:45 -04:00
Benoit Jacob	364cfd529d	Similar to cset `3589a9c115` , also in 2px4 kernel: actual_panel_rows computation should always be resilient to parameters not consistent with the known L1 cache size, see comment	2015-03-16 16:28:44 -04:00
Benoit Jacob	eb6929cb19	fix bug in maxsize calculation, which would cause products of size > 2048 to address the lookup table out of bounds	2015-03-16 16:15:47 -04:00
Benoit Jacob	35c3a8bb84	Update Nexus 5 lookup table from combining now 2 runs of the benchmark, using the analyze-blocking-sizes partition tool. Gives better worst-case performance.	2015-03-16 11:05:51 -04:00
Benoit Jacob	e274607d7f	fix compilation with GCC 4.8	2015-03-16 10:48:27 -04:00
Benoit Jacob	151b8b95c6	Fix bug in case where EIGEN_TEST_SPECIFIC_BLOCKING_SIZE is defined but false	2015-03-15 19:10:51 -04:00
Benoit Jacob	02babb9c0f	Provide an empirical lookup table for blocking sizes measured on a Nexus 5. Only for float, only for Android on ARM 32bit for now.	2015-03-15 18:13:12 -04:00
Benoit Jacob	3589a9c115	actual_panel_rows computation should always be resilient to parameters not consistent with the known L1 cache size, see comment	2015-03-15 18:12:18 -04:00
Benoit Jacob	1dd3d89818	Fix an unused-var warning	2015-03-15 18:07:19 -04:00
Benoit Jacob	e56aabf205	Refactor computeProductBlockingSizes to make room for the possibility of using lookup tables	2015-03-15 18:05:12 -04:00
Benoit Jacob	488c15615a	organize a little our default cache sizes, and use a saner default L1 outside of x86 (10% faster on Nexus 5)	2015-03-13 14:51:26 -07:00
Gael Guennebaud	1330f8bbd1	bug #973 , improve AVX support by enabling vectorization of Vector4i-like types, and enforcing alignment of Vector4f/Vector2d-like types to preserve compatibility with SSE and future Eigen versions that will vectorize them with AVX enabled.	2015-03-13 21:15:50 +01:00
Gael Guennebaud	d99ab35f9e	Fix internal::random(x,y) for integer types. The previous implementation could return y+1. The new implementation uses rejection sampling to get unbiased behavior.	2015-03-13 21:12:46 +01:00
Gael Guennebaud	0ee391863e	Avoid underflow when blocking sizes are tuned manually.	2015-03-06 21:51:09 +01:00
Gael Guennebaud	14a5f135a3	bug #969 : workaround ambiguous calls to Ref using enable_if.	2015-03-06 17:51:31 +01:00
Gael Guennebaud	87681e508f	bug #978 : early return for vanishing products	2015-03-06 16:11:22 +01:00
Gael Guennebaud	cd3bbffa73	Improve blocking heuristic: if the lhs fits within L1, then block on the rhs in L1 (this keeps the packed rhs in L1)	2015-03-06 14:31:39 +01:00
Gael Guennebaud	58740ce4c6	Improve product kernel: replace the previous dynamic loop-swapping strategy with a more general one: it increases the actual number of rows of the lhs's micro horizontal panel for small depths so that the L1 cache is fully exploited.	2015-03-06 10:30:35 +01:00
Gael Guennebaud	7550107028	Product optimization: implement a dynamic loop-swapping strategy to improve memory accesses to the destination matrix in the case of K-rank-update like products, i.e., for products of the kind: "large x small" * "small x large"	2015-03-05 10:03:46 +01:00
Benoit Jacob	2aa09e6b4e	Fix asm comments in 1px1 kernel	2015-03-03 13:44:00 -05:00
Benoit Jacob	eae8e27b7d	Add a benchmark-default-sizes action to benchmark-blocking-sizes.cpp	2015-03-03 11:41:21 -05:00
Marc Glisse	37a93c4263	New scoring functor to select the pivot. This can be useful for non-floating point scalars, where choosing the biggest element is generally not the best choice.	2015-03-03 17:08:28 +01:00
Benoit Jacob	ccc1277a42	must also disable complex<double> when disabling double vectorization	2015-03-03 10:17:05 -05:00
Benoit Jacob	f839099512	Work around an ICE in Clang 3.5 in the iOS toolchain with double NEON intrinsics.	2015-03-03 09:35:22 -05:00
Benoit Jacob	1ec0f4fadf	HalfPacket also needed to be disabled for double, on ARMv8.	2015-03-02 16:08:54 -05:00
Gael Guennebaud	9aee1e300a	Increase unit-test L1 cache size to ensure we are doing at least 2 peeled loops within product kernel.	2015-02-27 22:55:12 +01:00
Gael Guennebaud	b10cd3afd2	Re-enable detection of min/max parentheses protection, and re-enable mpreal_support unit test.	2015-02-27 22:38:00 +01:00
Benoit Jacob	6466fa63be	Reimplement the selection between rotating and non-rotating kernels using templates instead of macros and if()'s. That was needed to fix the build of unit tests on ARM, which I had broken. My bad for not testing earlier.	2015-02-27 15:30:10 -05:00
Benoit Jacob	2fc3b484d7	remove trailing comma	2015-02-27 11:37:45 -05:00
Benoit Jacob	33669348c4	Disable Packet2f/2i halfpacket support in NEON. I believe that it was erroneously turned on, since Packet2f/2i intrinsics are unimplemented, and code trying to use halfpackets just fails to compile on NEON, as it tries to use the default implementation of pload/pstore and the types don't match.	2015-02-27 11:35:37 -05:00
Benoit Jacob	b7fc8746e0	Replace a static assert by a runtime one, fixes the build of unit tests on ARM. Also safely assert in the non-implemented path that should never be taken in practice, and would return wrong results.	2015-02-27 10:01:59 -05:00
Gael Guennebaud	bcf9bb5c1f	Avoid packing rhs multiple times when blocking on the lhs only.	2015-02-26 17:01:33 +01:00
Gael Guennebaud	4ec3f04b3a	Make sure that the block size computation is tested by our unit test.	2015-02-26 17:00:36 +01:00
Gael Guennebaud	a8ad8887bf	Implement a more generic blocking-size selection algorithm. See the inline explanations. It performs extremely well on Haswell. The main issue is to reliably and quickly find the actual cache size to be used for our 2nd level of blocking, that is: max(l2,l3/nb_core_sharing_l3)	2015-02-26 16:04:35 +01:00
Gael Guennebaud	400becc591	Fix typos in block-size testing code, and set peeling on k to 8.	2015-02-26 15:57:06 +01:00
Benoit Jacob	692136350b	So I extensively measured the impact of the offset in this prefetch. I tried offset values from 0 to 128 (on this float* pointer, so implicitly times 4 bytes). On x86, I tested a Sandy Bridge with AVX with 12M cache and a Haswell with AVX+FMA with 6M cache on MatrixXf sizes up to 2400. I could not see any significant impact of this offset. On Nexus 5, the offset has a slight effect: values around 32 (times sizeof float) are worst. Anything else is the same: the current 64 (8*pk), or... 0. So let's just go with 0! Note that we needed a fix anyway for not accounting for the value of RhsProgress. 0 nicely avoids the issue altogether!	2015-02-25 12:37:14 -05:00


2431 Commits