eigen/Eigen/src
Rasmus Munk Larsen 888d708dcd Fix GEBP half/quarter-packet loops for nr>=8 RHS packing on ARM64
On ARM64 (and LoongArch64), the GEBP kernel uses nr=8, so the RHS is
packed in 8-column blocks. The half-packet and quarter-packet row
processing loops were iterating columns 4 at a time starting from j2=0,
misindexing into the 8-column packed RHS buffer. This produced
completely wrong results for float GEMM when the number of rows was
smaller than the SIMD packet size (e.g. 2x10 * 10x8 float).

Add the missing nr>=8 column iteration blocks to both loops, matching
the pattern already present in the 3x, 2x, 1x, and scalar remainder
sections.
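
The indexing mismatch described above can be illustrated with a small, self-contained model. This is only a sketch under simplifying assumptions, not Eigen's actual kernel: the helper names (`rhs_at`, `pack_rhs`, `row_times_packed`, `reference_row`) are hypothetical, the RHS is column-major, the column count is assumed to be a multiple of 4, and no SIMD is used. The packed layout stores each block of columns contiguously, with the values for one depth index adjacent to each other.

```cpp
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical helper: element (k, j) of a column-major depth x cols matrix.
inline float rhs_at(const std::vector<float>& b, std::size_t depth,
                    std::size_t k, std::size_t j) {
  return b[j * depth + k];
}

// Pack the RHS in blocks of `nr` columns, then leftover blocks of 4 columns.
// Inside a block of width w, the w values for one depth index are contiguous.
// Assumes cols % 4 == 0 to keep the sketch short.
std::vector<float> pack_rhs(const std::vector<float>& b, std::size_t depth,
                            std::size_t cols, std::size_t nr) {
  std::vector<float> packed;
  packed.reserve(depth * cols);
  std::size_t j2 = 0;
  for (std::size_t w : {nr, std::size_t{4}}) {
    for (; j2 + w <= cols; j2 += w)
      for (std::size_t k = 0; k < depth; ++k)
        for (std::size_t c = 0; c < w; ++c)
          packed.push_back(rhs_at(b, depth, k, j2 + c));
  }
  return packed;
}

// One LHS row times the packed RHS. `first_width` is the block width the
// loop assumes for the leading column blocks; leftovers use width 4.
// The result is only correct if `first_width` matches the `nr` used to pack.
std::vector<float> row_times_packed(const std::vector<float>& a_row,
                                    const std::vector<float>& packed,
                                    std::size_t depth, std::size_t cols,
                                    std::size_t first_width) {
  std::vector<float> out(cols, 0.0f);
  std::size_t j2 = 0;
  for (std::size_t w : {first_width, std::size_t{4}}) {
    for (; j2 + w <= cols; j2 += w) {
      const float* blk = packed.data() + j2 * depth;  // start of this block
      for (std::size_t k = 0; k < depth; ++k)
        for (std::size_t c = 0; c < w; ++c)
          out[j2 + c] += a_row[k] * blk[k * w + c];
    }
  }
  return out;
}

// Plain reference product of one LHS row with the unpacked RHS.
std::vector<float> reference_row(const std::vector<float>& a_row,
                                 const std::vector<float>& b,
                                 std::size_t depth, std::size_t cols) {
  std::vector<float> out(cols, 0.0f);
  for (std::size_t j = 0; j < cols; ++j)
    for (std::size_t k = 0; k < depth; ++k)
      out[j] += a_row[k] * rhs_at(b, depth, k, j);
  return out;
}
```

In this model, calling `row_times_packed` with `first_width = 4` on a buffer packed with `nr = 8` corresponds to the old half-packet and quarter-packet loops: every read stays inside the buffer, but the stride is wrong, so the products pair the wrong RHS values with each LHS value. Passing `first_width = 8` mirrors the added nr>=8 column iteration block and agrees with `reference_row`.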

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-25 19:03:11 -08:00