doc/UsingAOCL.dox

/*
 Copyright (c) 2025, AMD Inc. All rights reserved.
 Redistribution and use in source and binary forms, with or without modification,
 are permitted provided that the following conditions are met:
 * Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.
 * Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.
 * Neither the name of AMD nor the names of its contributors may
   be used to endorse or promote products derived from this software without
   specific prior written permission.
 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
 ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
 ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
 ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 ********************************************************************************
 * Content : Documentation on the use of AMD AOCL through Eigen
 ********************************************************************************
*/

namespace Eigen {

/** \page TopicUsingAOCL Using AMD® AOCL from %Eigen

Starting with %Eigen 3.4, users can benefit from built-in AMD® Optimizing CPU Libraries (AOCL) optimizations, provided that AOCL 5.0 or later is installed.

<a href="https://www.amd.com/en/developer/aocl.html"> AMD AOCL </a> provides highly optimized, multi-threaded mathematical routines for x86-64 processors, with a focus on AMD "Zen"-based architectures. It is available on Linux and Windows.

\note
AMD® AOCL is freely available software, but users are responsible for downloading and installing it and for ensuring that their product's license allows linking to the AOCL libraries. AOCL is distributed under a permissive license that allows commercial use.

Using AMD AOCL through %Eigen is straightforward:
-# export \c AOCL_ROOT into your environment 
-# define one of the AOCL macros before including any %Eigen headers (see table below)
-# link your program to AOCL libraries (BLIS, FLAME, LibM)
-# ensure your system supports the target architecture optimizations

When doing so, a number of %Eigen's algorithms are silently substituted with calls to AMD AOCL routines.
These substitutions apply only for \b Dynamic \b or \b large \b enough objects with one of the following standard scalar types: \c float, \c double, \c complex<float>, and \c complex<double>.
Operations on other scalar types or mixing reals and complexes will continue to use the built-in algorithms.
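
As a rough illustration of the scalar-type rule above, the standalone sketch below encodes it as a compile-time trait. The names \c is_aocl_scalar and \c may_use_aocl are hypothetical and are not part of %Eigen or AOCL; the sketch only mirrors the list of supported scalar types and the remark about mixed real/complex operations.

```cpp
#include <complex>
#include <type_traits>

// Hypothetical trait (not an Eigen or AOCL API): true only for the four
// scalar types listed above.
template <typename T>
struct is_aocl_scalar : std::false_type {};
template <> struct is_aocl_scalar<float> : std::true_type {};
template <> struct is_aocl_scalar<double> : std::true_type {};
template <> struct is_aocl_scalar<std::complex<float>> : std::true_type {};
template <> struct is_aocl_scalar<std::complex<double>> : std::true_type {};

// Mixed real/complex operations keep the built-in path: both operand types
// must be the same supported scalar.
template <typename Lhs, typename Rhs>
constexpr bool may_use_aocl() {
  return std::is_same<Lhs, Rhs>::value && is_aocl_scalar<Lhs>::value;
}

static_assert(may_use_aocl<double, double>(), "double is supported");
static_assert(!may_use_aocl<int, int>(), "int keeps the built-in path");
static_assert(!may_use_aocl<double, std::complex<double>>(),
              "mixed real/complex keeps the built-in path");
```

The size condition (Dynamic or large enough objects) is a separate check and is not modelled here.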

The AOCL integration targets three core components:
- **BLIS**: High-performance BLAS implementation optimized for modern cache hierarchies
- **FLAME**: Dense linear algebra algorithms providing LAPACK functionality  
- **LibM**: Optimized standard math routines with vectorized implementations

\section TopicUsingAOCL_Macros Configuration Macros

You can choose which parts will be substituted by defining one or multiple of the following macros:

<table class="manual">
<tr><td>\c EIGEN_USE_BLAS </td><td>Enables the use of external BLAS level 2 and 3 routines (AOCL-BLIS)</td></tr>
<tr class="alt"><td>\c EIGEN_USE_LAPACKE </td><td>Enables the use of external LAPACK routines via the LAPACKE C interface (AOCL-FLAME)</td></tr>
<tr><td>\c EIGEN_USE_LAPACKE_STRICT </td><td>Same as \c EIGEN_USE_LAPACKE but algorithms of lower robustness are disabled. \n This currently concerns only JacobiSVD which would be replaced by \c gesvd.</td></tr>
<tr class="alt"><td>\c EIGEN_USE_AOCL_VML </td><td>Enables the use of AOCL LibM vector math operations for coefficient-wise functions</td></tr>
<tr><td>\c EIGEN_USE_AOCL_ALL </td><td>Defines \c EIGEN_USE_BLAS, \c EIGEN_USE_LAPACKE, and \c EIGEN_USE_AOCL_VML</td></tr>
<tr class="alt"><td>\c EIGEN_USE_AOCL_MT </td><td>Equivalent to \c EIGEN_USE_AOCL_ALL, but ensures multi-threaded BLIS (\c libblis-mt) is used. \n \b Recommended for most applications.</td></tr>
</table>

\note The AOCL integration automatically enables optimizations when the matrix/vector size exceeds \c EIGEN_AOCL_VML_THRESHOLD (default: 128 elements). For smaller operations, Eigen's built-in vectorization may be faster due to function call overhead.
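
The size-based switch described in the note can be pictured with the standalone sketch below. It does not use %Eigen or AOCL: \c kVmlThreshold, \c array_exp, and \c exp_dispatch are hypothetical names, and whether the real comparison is inclusive is an assumption made here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for EIGEN_AOCL_VML_THRESHOLD; 128 is the default
// quoted in the note above.
constexpr std::size_t kVmlThreshold = 128;

enum class Path { BuiltIn, VectorLibrary };

// Stand-in for an array-at-a-time library routine: one call handles n elements.
inline void array_exp(std::size_t n, const double* x, double* y) {
  for (std::size_t i = 0; i < n; ++i) y[i] = std::exp(x[i]);
}

// Computes y = exp(x) and reports which path the element count selected.
// The inclusive comparison is an assumption of this sketch.
inline Path exp_dispatch(const std::vector<double>& x, std::vector<double>& y) {
  y.resize(x.size());
  if (x.size() >= kVmlThreshold) {
    array_exp(x.size(), x.data(), y.data());
    return Path::VectorLibrary;
  }
  // Small inputs: a plain loop avoids the cost of the library call.
  for (std::size_t i = 0; i < x.size(); ++i) y[i] = std::exp(x[i]);
  return Path::BuiltIn;
}
```

Both paths produce the same values; only the cost profile differs, which is why a size cut-off is a reasonable way to choose between them.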

\section TopicUsingAOCL_Performance Performance Considerations

The \c EIGEN_USE_BLAS and \c EIGEN_USE_LAPACKE macros can be combined with AOCL-specific optimizations:

- **Multi-threading**: Use \c EIGEN_USE_AOCL_MT to automatically select the multi-threaded BLIS library
- **Architecture targeting**: AOCL libraries are optimized for AMD Zen architectures (Zen, Zen2, Zen3, Zen4, Zen5)
- **Vector Math Library**: AOCL LibM provides vectorized implementations that can operate on entire arrays simultaneously
- **Memory layout**: Eigen's column-major storage directly matches AOCL's expected data layout for zero-copy operation
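
The memory-layout point can be illustrated with a standalone sketch. BLAS-style routines receive a raw pointer and a leading dimension, and a column-major matrix stores element (i, j) at offset i + j * ld, so a column-major buffer can be handed over without copying. The naive \c gemm_colmajor below is a hypothetical stand-in with that calling shape; it is not the BLIS routine and makes no attempt at performance.

```cpp
#include <cstddef>
#include <vector>

// Naive C = A * B on column-major buffers. A is m x k, B is k x n, C is m x n.
// lda, ldb, ldc are leading dimensions: the distance between consecutive columns.
inline void gemm_colmajor(std::size_t m, std::size_t n, std::size_t k,
                          const double* A, std::size_t lda,
                          const double* B, std::size_t ldb,
                          double* C, std::size_t ldc) {
  for (std::size_t j = 0; j < n; ++j) {
    for (std::size_t i = 0; i < m; ++i) {
      double sum = 0.0;
      for (std::size_t p = 0; p < k; ++p) sum += A[i + p * lda] * B[p + j * ldb];
      C[i + j * ldc] = sum;
    }
  }
}

// Usage: multiply two 2 x 2 matrices stored column by column.
inline std::vector<double> demo_product() {
  const std::vector<double> A = {1, 3, 2, 4};  // [[1, 2], [3, 4]]
  const std::vector<double> B = {5, 7, 6, 8};  // [[5, 6], [7, 8]]
  std::vector<double> C(4, 0.0);
  // The buffers are passed as they are; no repacking is needed.
  gemm_colmajor(2, 2, 2, A.data(), 2, B.data(), 2, C.data(), 2);
  return C;  // column-major storage of [[19, 22], [43, 50]]
}
```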

\section TopicUsingAOCL_Types Supported Data Types and Sizes

AOCL acceleration is applied to:
- **Scalar types**: \c float, \c double, \c complex<float>, \c complex<double>
- **Matrix/Vector sizes**: Dynamic size or compile-time size ≥ \c EIGEN_AOCL_VML_THRESHOLD  
- **Storage order**: Both column-major (default) and row-major layouts
- **Memory alignment**: Eigen's data pointers are directly compatible with AOCL function signatures

The current AOCL Vector Math Library integration is specialized for \c double precision, with automatic fallback to scalar implementations for \c float.

\section TopicUsingAOCL_Functions Vector Math Functions

The following table summarizes coefficient-wise operations accelerated by \c EIGEN_USE_AOCL_VML:

<table class="manual">
<tr><th>Code example</th><th>AOCL routines</th></tr>
<tr><td>\code
v2 = v1.array().exp();
v2 = v1.array().sin();
v2 = v1.array().cos();
v2 = v1.array().tan();
v2 = v1.array().log();
v2 = v1.array().log10();
v2 = v1.array().log2();
v2 = v1.array().sqrt();
v2 = v1.array().pow(1.5);
v2 = v1.array() + v2.array();
\endcode</td><td>\code
amd_vrda_exp
amd_vrda_sin
amd_vrda_cos
amd_vrda_tan
amd_vrda_log
amd_vrda_log10
amd_vrda_log2
amd_vrda_sqrt
amd_vrda_pow
amd_vrda_add
\endcode</td></tr>
</table>

In the examples, \c v1 and \c v2 are dense vectors of type \c VectorXd with size ≥ \c EIGEN_AOCL_VML_THRESHOLD.

\section TopicUsingAOCL_Example Complete Example

\code
#define EIGEN_USE_AOCL_MT
#include <iostream>
#include <Eigen/Dense>

int main() {
    const int n = 2048;
    
    // Large matrices automatically use AOCL-BLIS for multiplication
    Eigen::MatrixXd A = Eigen::MatrixXd::Random(n, n);
    Eigen::MatrixXd B = Eigen::MatrixXd::Random(n, n);
    Eigen::MatrixXd C = A * B;  // Dispatched to dgemm
    
    // Large vectors automatically use AOCL LibM for math functions
    Eigen::VectorXd v = Eigen::VectorXd::LinSpaced(10000, 0, 10);
    Eigen::VectorXd result = v.array().sin();  // Dispatched to amd_vrda_sin
    
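    // LAPACK decompositions use AOCL-FLAME. Cholesky requires a symmetric
    // positive-definite matrix, which a random A is not, so build one from A.
    Eigen::MatrixXd S = A * A.transpose() + double(n) * Eigen::MatrixXd::Identity(n, n);
    Eigen::LLT<Eigen::MatrixXd> llt(S);  // Dispatched to dpotrf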
    
    std::cout << "Matrix norm: " << C.norm() << std::endl;
    std::cout << "Vector result norm: " << result.norm() << std::endl;
    
    return 0;
}
\endcode

\section TopicUsingAOCL_Building Building and Linking

To compile with AOCL support, set the \c AOCL_ROOT environment variable and link against the required libraries:

\code
export AOCL_ROOT=/path/to/aocl
clang++ -O3 -g -DEIGEN_USE_AOCL_ALL \
        -I./install/include -I${AOCL_ROOT}/include \
        -Wno-parentheses my_app.cpp \
        -L${AOCL_ROOT} -lamdlibm -lflame -lblis \
        -lpthread -lrt -lm -lomp \
        -o eigen_aocl_example
\endcode

For multi-threaded performance, use the multi-threaded BLIS library:
\code
clang++ -O3 -g -DEIGEN_USE_AOCL_MT \
        -I./install/include -I${AOCL_ROOT}/include \
        -Wno-parentheses my_app.cpp \
        -L${AOCL_ROOT} -lamdlibm -lflame -lblis-mt \
        -lpthread -lrt -lm -lomp \
        -o eigen_aocl_example
\endcode

Key compiler and linker flags:
- \c -DEIGEN_USE_AOCL_ALL: Enable all AOCL accelerations (BLAS, LAPACK, VML)
- \c -DEIGEN_USE_AOCL_MT: Enable multi-threaded version (uses \c -lblis-mt)
- \c -lblis: Single-threaded BLIS library
- \c -lblis-mt: Multi-threaded BLIS library (recommended for performance)
- \c -lflame: FLAME LAPACK implementation  
- \c -lamdlibm: AMD LibM vector math library
- \c -lomp: OpenMP runtime for multi-threading support
- \c -lpthread -lrt: System threading and real-time libraries
- \c -Wno-parentheses: Suppress common warnings when using AOCL headers

\subsection TopicUsingAOCL_EigenBuild Building Eigen with AOCL Support

To build %Eigen with AOCL support, use the following CMake configuration:

\code
cmake .. -DCMAKE_BUILD_TYPE=Release \
         -DCMAKE_C_COMPILER=clang \
         -DCMAKE_CXX_COMPILER=clang++ \
         -DCMAKE_INSTALL_PREFIX=$PWD/install \
         -DINCLUDE_INSTALL_DIR=$PWD/install/include \
      && make install -j$(nproc)
\endcode


**CMake Configuration Parameters:**

<table class="manual">
<tr><th>Parameter</th><th>Expected Values</th><th>Description</th></tr>
<tr><td>\c CMAKE_BUILD_TYPE</td><td>\c Release, \c Debug, \c RelWithDebInfo</td><td>Build configuration (\c Release recommended for benchmarks)</td></tr>
<tr class="alt"><td>\c CMAKE_C_COMPILER</td><td>\c clang, \c gcc</td><td>C compiler (clang recommended for AOCL)</td></tr>
<tr><td>\c CMAKE_CXX_COMPILER</td><td>\c clang++, \c g++</td><td>C++ compiler (clang++ recommended for AOCL)</td></tr>
<tr class="alt"><td>\c CMAKE_INSTALL_PREFIX</td><td>Installation path</td><td>Where to install Eigen headers</td></tr>
<tr><td>\c INCLUDE_INSTALL_DIR</td><td>Header path</td><td>Specific path for Eigen headers</td></tr>
</table>

**Architecture Selection Guide:**
- \c znver3: AMD Zen 3 (EPYC 7003, Ryzen 5000 series)
- \c znver4: AMD Zen 4 (EPYC 9004, Ryzen 7000 series)  
- \c znver5: AMD Zen 5 (EPYC 9005, Ryzen 9000 series)
- \c native: Auto-detect current CPU architecture
- \c generic: Generic x86-64 without specific optimizations

**Custom Compiler Flags Explanation:**
- \c -O3: Maximum optimization level
- \c -mavx512f: Enable AVX-512 instruction set (if supported)
- \c -fveclib=AMDLIBM: Use AMD LibM for vectorized math functions

\subsection TopicUsingAOCL_Benchmark Building the AOCL Benchmark

After configuring Eigen, build the AOCL benchmark executable:

\code
cmake --build . --target benchmark_aocl -j$(nproc)
\endcode

This creates the \c benchmark_aocl executable that demonstrates AOCL acceleration with various matrix sizes and operations.

**Running the Benchmark:**
\code
./benchmark_aocl
\endcode

The benchmark will automatically compare:
- Eigen's native performance vs AOCL-accelerated operations
- Matrix multiplication performance (BLIS vs Eigen)
- Vector math functions performance (LibM vs Eigen)
- Memory bandwidth utilization and cache efficiency

\section TopicUsingAOCL_CMake CMake Integration

When using CMake, you can use a FindAOCL module:

\code
find_package(AOCL REQUIRED)
target_compile_definitions(my_target PRIVATE EIGEN_USE_AOCL_MT)
target_link_libraries(my_target PRIVATE AOCL::BLIS_MT AOCL::FLAME AOCL::LIBM)
\endcode

\section TopicUsingAOCL_Troubleshooting Troubleshooting

Common issues and solutions:

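- **Link errors**: Ensure \c AOCL_ROOT is set, that the directory containing the AOCL libraries is passed with \c -L at link time, and that it is listed in \c LD_LIBRARY_PATH at run time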
- **Performance not improved**: Verify you're using matrices/vectors larger than the threshold
- **Thread contention**: Set \c OMP_NUM_THREADS to match your CPU core count
- **Architecture mismatch**: Use appropriate \c -march flag for your AMD processor
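
When working through the list above, it can help to print the relevant environment settings from inside the program. The standalone sketch below does that with the C++ standard library only. The helpers \c parse_thread_count and \c environment_report are hypothetical names, and the parser deliberately accepts only a single positive integer.

```cpp
#include <cstdlib>
#include <string>
#include <thread>

// Returns the thread count given as text, or 0 when the text is missing or is
// not a single positive integer (lists such as "4,2" count as invalid here).
inline unsigned parse_thread_count(const char* text) {
  if (text == nullptr || *text == '\0') return 0;
  char* end = nullptr;
  const long value = std::strtol(text, &end, 10);
  if (*end != '\0' || value <= 0) return 0;
  return static_cast<unsigned>(value);
}

// Builds a short report of the environment settings mentioned above.
inline std::string environment_report() {
  const char* root = std::getenv("AOCL_ROOT");
  const unsigned requested = parse_thread_count(std::getenv("OMP_NUM_THREADS"));
  // Counts logical hardware threads, not physical cores; may be 0 if unknown.
  const unsigned hw_threads = std::thread::hardware_concurrency();

  std::string report;
  report += std::string("AOCL_ROOT: ") + (root ? root : "(not set)") + "\n";
  report += "OMP_NUM_THREADS: " +
            (requested ? std::to_string(requested)
                       : std::string("(not set or invalid)")) + "\n";
  report += "hardware threads: " + std::to_string(hw_threads) + "\n";
  return report;
}
```

Printing \c environment_report() at start-up gives a quick way to confirm that the process sees the values you exported in the shell.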

\section TopicUsingAOCL_Links Links

- AMD AOCL can be downloaded for free <a href="https://www.amd.com/en/developer/aocl.html">here</a>
- AOCL User Guide and documentation available on the AMD Developer Portal
- AOCL is also available through package managers and containerized environments

*/

}