Files
ffmpeg/libavcodec/aarch64
Zhao Zhili 200914853d aarch64/sbrdsp: unroll sum64x5 to 16 floats/iter
The C version is faster than the previous asm with clang and gcc > 12 on
rpi5, since compiler basically does the same unroll.

sum64x5_neon:             before          after
  Cortex-A76 (gcc 12.4):  72.3 (3.63x)    47.4 (5.56x)
  Cortex-A76 (gcc 14.2):  72.3 (0.69x)    47.4 (1.05x)
  Apple M1 (clang 16):     0.2 (0.98x)     0.2 (0.99x)

Signed-off-by: Zhao Zhili <quinkblack@foxmail.com>
2026-06-03 10:40:20 +00:00
..