ffmpeg/libavcodec
Andreas Rheinhardt aeb138679a avcodec/x86/mpegvideoencdsp: Port add_8x8basis_ssse3() to ASM
Both GCC and Clang completely unroll the unlikely loop at -O3,
leading to code size bloat; their code is also suboptimal, as they
don't make use of pmulhrsw (even with -mssse3). This commit
therefore ports the whole function to external assembly. The new
function occupies 176 B here vs 1406 B for GCC.
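
For readers unfamiliar with pmulhrsw, the sketch below models one 16-bit
lane of the instruction in scalar C and shows why a rounding multiply
followed by a fixed right shift can be folded into it by pre-shifting the
scale. It is an illustration only, not the code from this commit: the
shift amount SKETCH_SHIFT (10), the 64-coefficient layout, and all
function names are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: rounding right shift applied by the scalar loop. */
#define SKETCH_SHIFT    10
/* pmulhrsw always shifts by 15, so the scale is pre-shifted by the difference. */
#define SKETCH_PRESHIFT (15 - SKETCH_SHIFT)

/* One 16-bit lane of pmulhrsw: (a * b + 0x4000) >> 15, kept to 16 bits.
 * Relies on >> being an arithmetic shift for negative values. */
static int16_t pmulhrsw_lane(int16_t a, int16_t b)
{
    /* -32768 * -32768 is the only input pair whose result overflows int16_t. */
    assert(!(a == INT16_MIN && b == INT16_MIN));
    return (int16_t)(((int32_t)a * b + 0x4000) >> 15);
}

/* Scalar reference: rem[i] += round(basis[i] * scale / 2^SKETCH_SHIFT). */
static void add_basis_scalar(int16_t rem[64], const int16_t basis[64], int scale)
{
    for (int i = 0; i < 64; i++)
        rem[i] += (basis[i] * scale + (1 << (SKETCH_SHIFT - 1))) >> SKETCH_SHIFT;
}

/* The same update written lane by lane the way pmulhrsw would compute it.
 * Only valid while the pre-shifted scale still fits in 16 bits. */
static void add_basis_pmulhrsw_model(int16_t rem[64], const int16_t basis[64],
                                     int scale)
{
    int16_t s = (int16_t)(scale * (1 << SKETCH_PRESHIFT));
    for (int i = 0; i < 64; i++)
        rem[i] += pmulhrsw_lane(basis[i], s);
}

/* Returns the number of lanes where the two versions disagree. */
static int count_mismatches(int scale)
{
    int16_t basis[64], a[64] = {0}, b[64] = {0};
    int bad = 0;

    for (int i = 0; i < 64; i++)
        basis[i] = (int16_t)(i * 900 - 28000);   /* arbitrary test pattern */
    add_basis_scalar(a, basis, scale);
    add_basis_pmulhrsw_model(b, basis, scale);
    for (int i = 0; i < 64; i++)
        bad += a[i] != b[i];
    return bad;
}
```

In this model the two versions agree whenever the pre-shifted scale fits
in a signed 16-bit lane (here, |scale| <= 1023). A scale outside that
range cannot use the single-multiply path, which is presumably the case
the "unlikely loop" handles.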

Benchmarks for a testcase with huge qscale (notice that the C version
is unrolled just like the unlikely loop in the SSSE3 version):
add_8x8basis_c:                                         43.4 ( 1.00x)
add_8x8basis_ssse3 (old):                               43.6 ( 1.00x)
add_8x8basis_ssse3 (new):                               11.9 ( 3.63x)

Reviewed-by: Kieran Kunhya <kieran@kunhya.com>
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-18 20:41:12 +01:00