Mirror of https://gitlab.com/libeigen/eigen.git, synced 2026-04-10 11:34:33 +08:00
888d708dcdae93934c7519781fdc7a8410c6944c
On ARM64 (and LoongArch64), the GEBP kernel uses nr=8, so the RHS is packed in 8-column blocks. The half-packet and quarter-packet row processing loops were iterating columns 4 at a time starting from j2=0, misindexing into the 8-column packed RHS buffer. This produced completely wrong results for float GEMM when the number of rows was smaller than the SIMD packet size (e.g. 2x10 * 10x8 float).

Add the missing nr>=8 column iteration blocks to both loops, matching the pattern already present in the 3x, 2x, 1x, and scalar remainder sections.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
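The indexing mismatch described in this commit message can be illustrated with a small, heavily simplified sketch. It is not Eigen's GEBP kernel or packing code: the helpers `pack_rhs`, `multiply_packed`, `multiply_reference`, and `packed_product_matches_reference` are hypothetical names introduced only for this illustration, the layout is an assumed simplification (row-major inputs, column count divisible by the block width, no SIMD), and the fill values are arbitrary. It only shows why walking a buffer packed in 8-column blocks 4 columns at a time reads the wrong elements.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not Eigen's API): pack a row-major depth x cols RHS
// into consecutive blocks of nr columns. Inside a block, the nr values that
// belong to one depth index k are stored contiguously.
// Assumption: cols is a multiple of nr.
std::vector<float> pack_rhs(const std::vector<float>& b, std::size_t depth,
                            std::size_t cols, std::size_t nr) {
  std::vector<float> packed;
  packed.reserve(depth * cols);
  for (std::size_t j2 = 0; j2 < cols; j2 += nr)
    for (std::size_t k = 0; k < depth; ++k)
      for (std::size_t c = 0; c < nr; ++c)
        packed.push_back(b[k * cols + j2 + c]);
  return packed;
}

// Hypothetical helper: compute A * B from the packed RHS while walking the
// columns read_nr at a time. The result is only correct when read_nr equals
// the block width that was used for packing.
std::vector<float> multiply_packed(const std::vector<float>& a,
                                   const std::vector<float>& packed,
                                   std::size_t rows, std::size_t depth,
                                   std::size_t cols, std::size_t read_nr) {
  std::vector<float> result(rows * cols, 0.0f);
  for (std::size_t j2 = 0; j2 < cols; j2 += read_nr) {
    // Start of the block as the reader believes it to be laid out.
    const float* block = packed.data() + j2 * depth;
    for (std::size_t i = 0; i < rows; ++i)
      for (std::size_t k = 0; k < depth; ++k)
        for (std::size_t c = 0; c < read_nr; ++c)
          result[i * cols + j2 + c] += a[i * depth + k] * block[k * read_nr + c];
  }
  return result;
}

// Plain triple loop on the unpacked RHS, used as the reference result.
std::vector<float> multiply_reference(const std::vector<float>& a,
                                      const std::vector<float>& b,
                                      std::size_t rows, std::size_t depth,
                                      std::size_t cols) {
  std::vector<float> result(rows * cols, 0.0f);
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t k = 0; k < depth; ++k)
      for (std::size_t j = 0; j < cols; ++j)
        result[i * cols + j] += a[i * depth + k] * b[k * cols + j];
  return result;
}

// Runs the 2x10 * 10x8 shape from the commit message with the RHS packed in
// 8-column blocks, then reads it back read_nr columns at a time.
bool packed_product_matches_reference(std::size_t read_nr) {
  const std::size_t rows = 2, depth = 10, cols = 8, pack_nr = 8;
  std::vector<float> a(rows * depth), b(depth * cols);
  // Small integer values keep the float arithmetic exact.
  for (std::size_t i = 0; i < a.size(); ++i) a[i] = static_cast<float>(i % 7 + 1);
  for (std::size_t i = 0; i < b.size(); ++i) b[i] = static_cast<float>(i + 1);
  const std::vector<float> packed = pack_rhs(b, depth, cols, pack_nr);
  return multiply_packed(a, packed, rows, depth, cols, read_nr) ==
         multiply_reference(a, b, rows, depth, cols);
}
```

Under these assumptions, `packed_product_matches_reference(8)` returns true and `packed_product_matches_reference(4)` returns false: the 4-column reader stays inside the buffer but pairs each depth index with values from the wrong rows and columns of the RHS, which is the kind of silent misindexing the commit describes.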
Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.
For more information go to http://eigen.tuxfamily.org/ or https://libeigen.gitlab.io.
For pull requests, bug reports, and feature requests, go to https://gitlab.com/libeigen/eigen.
Languages
C++: 85.6%
Fortran: 8.9%
CMake: 2%
C: 1.6%
Cuda: 1.2%
Other: 0.6%