Benoit Steiner
|
283e33dea4
|
ptranspose is not a template.
|
2016-05-23 19:55:55 -07:00 |
|
Benoit Steiner
|
7d980d74e5
|
Started to vectorize the processing of 16-bit floats on CPU.
|
2016-05-23 15:21:40 -07:00 |
|
Benoit Steiner
|
fae0493f98
|
Fixed a couple of bugs related to the Pascal family of GPUs
|
2016-05-11 23:02:26 -07:00 |
|
Benoit Steiner
|
b6a517c47d
|
Added the ability to load fp16 using the texture path.
Improved the performance of some reductions on fp16
|
2016-05-11 21:26:48 -07:00 |
|
Benoit Steiner
|
56a1757d74
|
Made predux_min and predux_max on fp16 less noisy
|
2016-05-11 17:37:34 -07:00 |
|
Benoit Steiner
|
9091351dbe
|
__ldg is only available with CUDA architectures >= 3.5
|
2016-05-11 15:22:13 -07:00 |
|
Benoit Steiner
|
02f76dae2d
|
Fixed a typo
|
2016-05-11 15:08:38 -07:00 |
|
Benoit Steiner
|
0b9e3dcd06
|
Added packet primitives to compute exp, log, sqrt and rsqrt on fp16. This improves the performance by 10 to 30%.
|
2016-05-10 11:05:33 -07:00 |
|
Benoit Steiner
|
8adf5cc70f
|
Added support for packet processing of fp16 on Kepler and Maxwell GPUs
|
2016-05-06 19:16:43 -07:00 |
|
Benoit Steiner
|
995f202cea
|
Disabled the use of half2 on CUDA devices of compute capability < 5.3
|
2016-04-08 14:43:36 -07:00 |
|
Benoit Steiner
|
3394379319
|
Fixed the packet_traits for half floats.
|
2016-04-08 13:33:59 -07:00 |
|
Benoit Steiner
|
14ea7c7ec7
|
Fixed packet_traits<half>
|
2016-04-06 19:30:21 -07:00 |
|
Benoit Steiner
|
048c4d6efd
|
Made half floats usable on hardware that doesn't support them natively.
|
2016-03-11 17:21:42 -08:00 |
|
Benoit Steiner
|
456e038a4e
|
Fixed the +=, -=, *= and /= operators to return a reference
|
2016-03-10 15:17:44 -08:00 |
|
Benoit Steiner
|
1032441c6f
|
Enable partial support for half floats on Kepler GPUs.
|
2016-03-03 10:34:20 -08:00 |
|
Benoit Steiner
|
6270d851e3
|
Declare the half float type as arithmetic.
|
2016-02-22 13:59:33 -08:00 |
|
Benoit Steiner
|
584832cb3c
|
Implemented the ptranspose function on half floats
|
2016-02-21 12:44:53 -08:00 |
|
Benoit Steiner
|
95fceb6452
|
Added the ability to compute the absolute value of a half float
|
2016-02-21 20:24:11 +00:00 |
|
Benoit Steiner
|
9ff269a1d3
|
Moved some of the fp16 operators outside the Eigen namespace to work around some nvcc limitations.
|
2016-02-20 07:47:23 +00:00 |
|
Benoit Steiner
|
5c4901b83a
|
Implemented the scalar division of two half floats
|
2016-02-19 10:03:19 -08:00 |
|
Benoit Steiner
|
f7cb755299
|
Added support for operators +=, -=, *= and /= on CUDA half floats
|
2016-02-19 15:57:26 +00:00 |
|
Benoit Steiner
|
ac5d706a94
|
Added support for simple coefficient-wise tensor expressions using half floats on CUDA devices
|
2016-02-19 08:19:12 +00:00 |
|