mirror of
https://git.ffmpeg.org/ffmpeg.git
synced 2026-05-08 20:13:23 +02:00
4ea59d5665
Port lfe_fir0_float and lfe_fir1_float to AArch64 NEON.

These polyphase FIR interpolation filters have an x86 SSE/AVX path but no AArch64 equivalent, falling back to scalar C.

The inner loop computes two dot products per output pair. Precomputing a reversed LFE sample vector before the inner loop avoids per-iteration shuffle overhead.

Benchmarks on AWS Graviton3 (Neoverse V1, c7g.xlarge):

lfe_fir0_float: C 5902.0 cycles -> NEON 2135.0 cycles (2.77x)
lfe_fir1_float: C 2836.3 cycles -> NEON 1527.8 cycles (1.86x)

Measured with: taskset -c 0 ./tests/checkasm/checkasm --test=dcadsp --bench, 3-run average, Ubuntu 22.04 (kernel 6.8.0-1052-aws), perf_event_paranoid=0.

Signed-off-by: Jeongkeun Kim <variety0724@gmail.com>
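
For readers unfamiliar with the structure described above, the following portable C sketch illustrates the idea of hoisting the sample reversal out of the inner loop. It is not the NEON assembly from this patch and not a copy of the FFmpeg C code. The function names, the SKETCH_MAX_TAPS bound, and the coefficient layout (the second dot product walks the table from its end) are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_TAPS 8 /* hypothetical upper bound on taps per phase */

/* Scalar reference for one output pair (hypothetical layout).
 * Both dot products read the same history samples hist[0], hist[-1], ...,
 * hist[-(ntaps - 1)], so each iteration indexes the samples backwards. */
static void lfe_pair_ref(float *out_a, float *out_b, const int32_t *hist,
                         const float *coeff, int ncoeff_total,
                         int ntaps, int j)
{
    float a = 0.0f, b = 0.0f;

    for (int k = 0; k < ntaps; k++) {
        a += coeff[j * ntaps + k]                    * (float)hist[-k];
        b += coeff[ncoeff_total - 1 - j * ntaps - k] * (float)hist[-k];
    }
    *out_a = a;
    *out_b = b;
}

/* Same computation for npairs output pairs, with the int-to-float
 * conversion and the reversal done once, before the inner loop.
 * Afterwards both dot products run over contiguous, forward-indexed
 * arrays, which is the shape a SIMD multiply-accumulate wants. */
static void lfe_block_sketch(float *out, const int32_t *hist,
                             const float *coeff, int ncoeff_total,
                             int ntaps, int npairs)
{
    float rev[SKETCH_MAX_TAPS]; /* rev[k] = hist[-k]              */
    float fwd[SKETCH_MAX_TAPS]; /* fwd[m] = hist[m - (ntaps - 1)] */

    assert(ntaps > 0 && ntaps <= SKETCH_MAX_TAPS);

    for (int k = 0; k < ntaps; k++) {
        rev[k]             = (float)hist[-k];
        fwd[ntaps - 1 - k] = rev[k];
    }

    for (int j = 0; j < npairs; j++) {
        const float *ca = coeff + j * ntaps;                  /* forward slice  */
        const float *cb = coeff + ncoeff_total - (j + 1) * ntaps; /* mirrored slice */
        float a = 0.0f, b = 0.0f;

        for (int k = 0; k < ntaps; k++) {
            a += ca[k] * rev[k];
            b += cb[k] * fwd[k];
        }
        out[j]          = a;
        out[npairs + j] = b;
    }
}
```

In this sketch the second dot product is accumulated in the opposite order from the reference, so results can differ in the last bits for general floating-point inputs; they are identical when every intermediate value is exactly representable.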