Files
ffmpeg/libswscale
Shreesh Adiga 26f2f03e0d swscale/x86/rgb2rgb: optimize AVX2 version of uyvytoyuv422
Currently the AVX2 version of uyvytoyuv422 in the SIMD loop does the following:
4 vinsertq to have interleaving of the vector lanes during load from memory.
4 vperm2i128 inside 4 RSHIFT_COPY calls to achieve the desired layout.

This patch replaces the above 8 instructions with 2 vpermq and
2 vpermd with a vector register similar to AVX512ICL version.

Observed the following numbers on various microarchitectures:

On AMD Zen3 laptop:
Before:
uyvytoyuv422_c:                                      51979.7 ( 1.00x)
uyvytoyuv422_sse2:                                    5410.5 ( 9.61x)
uyvytoyuv422_avx:                                     4642.7 (11.20x)
uyvytoyuv422_avx2:                                    4249.0 (12.23x)

After:
uyvytoyuv422_c:                                      51659.8 ( 1.00x)
uyvytoyuv422_sse2:                                    5420.8 ( 9.53x)
uyvytoyuv422_avx:                                     4651.2 (11.11x)
uyvytoyuv422_avx2:                                    3953.8 (13.07x)

On Intel Macbook Pro 2019:
Before:
uyvytoyuv422_c:                                     185014.4 ( 1.00x)
uyvytoyuv422_sse2:                                   22800.4 ( 8.11x)
uyvytoyuv422_avx:                                    19796.9 ( 9.35x)
uyvytoyuv422_avx2:                                   13141.9 (14.08x)

After:
uyvytoyuv422_c:                                     185093.4 ( 1.00x)
uyvytoyuv422_sse2:                                   22795.4 ( 8.12x)
uyvytoyuv422_avx:                                    19791.9 ( 9.35x)
uyvytoyuv422_avx2:                                   12043.1 (15.37x)

On AMD Zen4 desktop:
Before:
uyvytoyuv422_c:                                      29105.0 ( 1.00x)
uyvytoyuv422_sse2:                                    3888.0 ( 7.49x)
uyvytoyuv422_avx:                                     3374.2 ( 8.63x)
uyvytoyuv422_avx2:                                    2649.8 (10.98x)
uyvytoyuv422_avx512icl:                               1615.0 (18.02x)

After:
uyvytoyuv422_c:                                      29093.4 ( 1.00x)
uyvytoyuv422_sse2:                                    3874.4 ( 7.51x)
uyvytoyuv422_avx:                                     3371.6 ( 8.63x)
uyvytoyuv422_avx2:                                    2174.6 (13.38x)
uyvytoyuv422_avx512icl:                               1625.1 (17.90x)

Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>
2025-03-23 15:25:48 +00:00
..
2025-03-17 11:40:05 +01:00
2025-03-19 09:34:05 -03:00