Rasmus Munk Larsen 888d708dcd Fix GEBP half/quarter-packet loops for nr>=8 RHS packing on ARM64
On ARM64 (and LoongArch64), the GEBP kernel uses nr=8, so the RHS is
packed in 8-column blocks. The half-packet and quarter-packet row
processing loops were iterating columns 4 at a time starting from j2=0,
misindexing into the 8-column packed RHS buffer. This produced
completely wrong results for float GEMM when the number of rows was
smaller than the SIMD packet size (e.g. 2x10 * 10x8 float).

Add the missing nr>=8 column iteration blocks to both loops, matching
the pattern already present in the 3x, 2x, 1x, and scalar remainder
sections.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-25 19:03:11 -08:00
2024-03-11 19:08:30 +00:00
2022-02-07 17:30:31 +00:00
2026-01-03 00:57:18 +00:00

Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.

For more information go to http://eigen.tuxfamily.org/ or https://libeigen.gitlab.io.

For pull request, bug reports, and feature requests, go to https://gitlab.com/libeigen/eigen.

Description
No description provided
Readme MPL-2.0 119 MiB
Languages
C++ 85.6%
Fortran 8.9%
CMake 2%
C 1.6%
Cuda 1.2%
Other 0.6%