eigen/Eigen
Rasmus Munk Larsen 888d708dcd Fix GEBP half/quarter-packet loops for nr>=8 RHS packing on ARM64
On ARM64 (and LoongArch64), the GEBP kernel uses nr=8, so the RHS is
packed in 8-column blocks. The half-packet and quarter-packet row
processing loops were iterating over columns 4 at a time starting from j2=0,
misindexing into the 8-column packed RHS buffer. This produced
completely wrong results for float GEMM when the number of rows was
smaller than the SIMD packet size (e.g. 2x10 * 10x8 float).

Add the missing nr>=8 column iteration blocks to both loops, matching
the pattern already present in the 3x, 2x, 1x, and scalar remainder
sections.
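
The indexing problem described above can be illustrated with a small standalone sketch. This is not Eigen's kernel or packing code: the layout (8-column blocks, then 4-column blocks, then single columns), the row-major input, and the names `pack_rhs`, `accumulate_block`, `row_times_packed`, and the `visit_8_blocks` flag are all assumptions made for illustration. Passing `visit_8_blocks = false` mimics a loop that walks the packed buffer 4 columns at a time from j2=0 even though the buffer was packed in 8-column blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified layout (not Eigen's actual code): B is depth x cols,
// stored row-major in `b`. Columns are packed in blocks of 8, then 4, then 1.
// Inside a block of width w, element (k, c) sits at offset k * w + c.
std::vector<float> pack_rhs(const std::vector<float>& b, std::size_t depth,
                            std::size_t cols) {
  const std::size_t cols8 = (cols / 8) * 8;
  const std::size_t cols4 = (cols / 4) * 4;
  std::vector<float> packed;
  packed.reserve(depth * cols);
  auto pack_block = [&](std::size_t j2, std::size_t width) {
    for (std::size_t k = 0; k < depth; ++k)
      for (std::size_t c = 0; c < width; ++c)
        packed.push_back(b[k * cols + j2 + c]);
  };
  for (std::size_t j2 = 0; j2 < cols8; j2 += 8) pack_block(j2, 8);
  for (std::size_t j2 = cols8; j2 < cols4; j2 += 4) pack_block(j2, 4);
  for (std::size_t j2 = cols4; j2 < cols; ++j2) pack_block(j2, 1);
  assert(packed.size() == depth * cols);
  return packed;
}

// out[c] += a[k] * block[k * width + c] for one packed block.
void accumulate_block(const float* a, const float* block, std::size_t depth,
                      std::size_t width, float* out) {
  for (std::size_t k = 0; k < depth; ++k)
    for (std::size_t c = 0; c < width; ++c)
      out[c] += a[k] * block[k * width + c];
}

// Multiply a single row `a` (length depth) by the packed RHS.
// visit_8_blocks = true  : consume 8-column blocks first (matches the packing).
// visit_8_blocks = false : start the 4-column loop at j2 = 0 (the misindexing).
std::vector<float> row_times_packed(const std::vector<float>& a,
                                    const std::vector<float>& packed,
                                    std::size_t depth, std::size_t cols,
                                    bool visit_8_blocks) {
  std::vector<float> out(cols, 0.0f);
  const std::size_t cols8 = visit_8_blocks ? (cols / 8) * 8 : 0;
  const std::size_t cols4 = (cols / 4) * 4;
  std::size_t j2 = 0;
  for (; j2 < cols8; j2 += 8)
    accumulate_block(a.data(), packed.data() + j2 * depth, depth, 8, out.data() + j2);
  for (; j2 < cols4; j2 += 4)
    accumulate_block(a.data(), packed.data() + j2 * depth, depth, 4, out.data() + j2);
  for (; j2 < cols; ++j2)
    accumulate_block(a.data(), packed.data() + j2 * depth, depth, 1, out.data() + j2);
  return out;
}

// Unpacked reference: out[j] = sum_k a[k] * B(k, j).
std::vector<float> row_times_matrix(const std::vector<float>& a,
                                    const std::vector<float>& b,
                                    std::size_t depth, std::size_t cols) {
  std::vector<float> out(cols, 0.0f);
  for (std::size_t k = 0; k < depth; ++k)
    for (std::size_t j = 0; j < cols; ++j) out[j] += a[k] * b[k * cols + j];
  return out;
}
```

In this sketch, with depth 10 and 8 columns, `row_times_packed(..., true)` agrees with `row_times_matrix`, while `row_times_packed(..., false)` reads each 8-wide block with a stride of 4 and returns different values. Every read stays inside the packed buffer, so the result is silently wrong rather than a crash.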

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-25 19:03:11 -08:00