<!--
Thanks for contributing a merge request!
We recommend that first-time contributors read our [contribution guidelines](https://eigen.tuxfamily.org/index.php?title=Contributing_to_Eigen).
Before submitting the MR, please complete the following checks:
- Create one PR per feature or bugfix,
- Run the test suite to verify your changes.
See our [test guidelines](https://eigen.tuxfamily.org/index.php?title=Tests).
- Add tests to cover the bug addressed or any new feature.
- Document new features. If it is a substantial change, add it to the [Changelog](https://gitlab.com/libeigen/eigen/-/blob/master/CHANGELOG.md).
- Leave the following box checked when submitting: `Allow commits from members who can merge to the target branch`.
This allows us to rebase and merge your change.
Note that we are a team of volunteers; we appreciate your patience during the review process.
-->
### Description
<!--Please explain your changes.-->
Remove deprecated CUDA device properties.
### Reference issue
<!--
You can link to a specific issue using the gitlab syntax #<issue number>.
If the MR fixes an issue, write "Fixes #<issue number>" to have the issue automatically closed on merge.
-->
Fixes#3000.
Closes#3000
See merge request libeigen/eigen!2060
Duplicating the namespace `tuple_impl` caused a conflict with the
`arch/GPU/Tuple.h` definitions for the `tuple_test`. We can't
just use `Eigen::internal` either, since there exists a different
`Eigen::internal::get`. Renaming the namespace to `test_detail`
fixes the issue.
This introduces new functions:
```
// returns kernel(args...) running on the CPU.
Eigen::run_on_cpu(Kernel kernel, Args&&... args);
// returns kernel(args...) running on the GPU.
Eigen::run_on_gpu(Kernel kernel, Args&&... args);
Eigen::run_on_gpu_with_hint(size_t buffer_capacity_hint, Kernel kernel, Args&&... args);
// returns kernel(args...) running on the GPU if using
// a GPU compiler, or CPU otherwise.
Eigen::run(Kernel kernel, Args&&... args);
Eigen::run_with_hint(size_t buffer_capacity_hint, Kernel kernel, Args&&... args);
```
Running on the GPU is accomplished by:
- Serializing the kernel inputs on the CPU
- Transferring the inputs to the GPU
- Passing the kernel and serialized inputs to a GPU kernel
- Deserializing the inputs on the GPU
- Running `kernel(inputs...)` on the GPU
- Serializing all output parameters and the return value
- Transferring the serialized outputs back to the CPU
- Deserializing the outputs and return value on the CPU
- Returning the deserialized return value
All inputs must be serializable (currently POD types, `Eigen::Matrix`
and `Eigen::Array`). The kernel must also be POD (though usually
contains no actual data).
Tested on CUDA 9.1, 10.2, 11.3, with g++-6, g++-8, g++-10 respectively.
This MR depends on !622, !623, !624.