2811 Commits

Author SHA1 Message Date
Niklas Haas
4ede75b5f4 swscale/graph: fix double-free when legacy pass fails initializing
If this function returns an error after ff_sws_graph_add_pass() has been
called, and the pass->free callback is therefore already set up to free the
context, the graph will end up freed twice: once by the pass->free callback
(during ff_sws_graph_free()), and once before that by the failure path of the
caller (e.g. add_legacy_sws_pass(), or init_legacy_subpass() itself for
cascaded contexts.)

The solution is to redefine the ownership of SwsGraph to pass clearly from
the caller of add_legacy_sws_pass() to init_legacy_subpass(), which can then
deal with appropriately freeing the context conditional on whether or not the
pass was already registered in the pass list.
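
The ownership rule described above can be sketched as follows. This is a minimal illustration, not the actual libswscale code: Ctx, Pass, Graph, init_subpass(), add_pass() and the fail_before/fail_after switches are hypothetical stand-ins, and g_free_calls exists only to make the number of releases observable.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_PASSES 4

static int g_free_calls;            /* counts releases, for demonstration only */

typedef struct Ctx { int dummy; } Ctx;

typedef struct Pass {
    Ctx  *priv;
    void (*free_cb)(Ctx *);
} Pass;

typedef struct Graph {
    Pass passes[MAX_PASSES];
    int  nb_passes;
} Graph;

static void ctx_free(Ctx *c)
{
    if (c) {
        g_free_calls++;
        free(c);
    }
}

/* Releases every registered pass exactly once via its callback. */
static void graph_free(Graph *g)
{
    for (int i = 0; i < g->nb_passes; i++)
        g->passes[i].free_cb(g->passes[i].priv);
    g->nb_passes = 0;
}

/* Takes ownership of ctx in all cases, including failure. */
static int init_subpass(Graph *g, Ctx *ctx, int fail_before, int fail_after)
{
    assert(g && ctx);
    if (fail_before || g->nb_passes == MAX_PASSES) {
        ctx_free(ctx);              /* not registered yet: free it here */
        return -1;
    }
    g->passes[g->nb_passes++] = (Pass){ ctx, ctx_free };
    if (fail_after)
        return -1;                  /* registered: graph_free() releases it */
    return 0;
}

/* The caller hands ctx over and never frees it itself. */
static int add_pass(Graph *g, int fail_before, int fail_after)
{
    Ctx *ctx = calloc(1, sizeof(*ctx));
    if (!ctx)
        return -1;
    return init_subpass(g, ctx, fail_before, fail_after);
}
```

Because only one function decides who frees the context, a failure before registration and a failure after registration each lead to exactly one release.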

Reported-by: 김영민 <kunshim@naver.com>
Signed-off-by: Niklas Haas <git@haasn.dev>
2025-08-29 13:22:03 +00:00
Michael Niedermayer
ca20d42cd7 swscale/swscale_internal: Use more precise gamma
Avoids failure of xyz12 fate tests on mingw and linux x86-32

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2025-08-18 19:12:46 +00:00
Marton Balint
b61e510e75 swscale/swscale_unscaled: use 8 line alignment for planarCopyWrapper with dithering
Dithering relies on an 8-line dithering table and the code always uses it from
the beginning. So in order to make dithering independent of the height of the
slices used, we must enforce an 8-line alignment.
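
A minimal sketch of why the alignment matters, assuming the dither table is indexed from the first line of each slice. The helper names are hypothetical and are not taken from swscale_unscaled.c.

```c
#include <assert.h>

#define DITHER_ROWS 8

/* Table row used for line i of a slice when the table restarts per slice. */
static int dither_row_local(int line_in_slice)
{
    return line_in_slice & (DITHER_ROWS - 1);
}

/* Round a slice start down to a multiple of the table height. */
static int align_slice_start(int y)
{
    assert(y >= 0);
    return y & ~(DITHER_ROWS - 1);
}

/* Returns 1 if slice-local indexing picks the same table rows that a
 * single whole-frame pass would pick for the same lines. */
static int slice_matches_full_frame(int slice_y, int slice_h)
{
    for (int i = 0; i < slice_h; i++)
        if (dither_row_local(i) != ((slice_y + i) & (DITHER_ROWS - 1)))
            return 0;
    return 1;
}
```

A slice starting at line 16 reproduces the whole-frame pattern, while one starting at line 5 does not, which is the dependence on slice layout that the alignment removes.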

Fixes issue #20071.

Signed-off-by: Marton Balint <cus@passwd.hu>
2025-08-12 21:56:09 +00:00
Dash Santosh
ca2a88c1b3 swscale/output: Implement yuv2nv12cx neon assembly
yuv2nv12cX_2_512_accurate_c:                          3540.1 ( 1.00x)
yuv2nv12cX_2_512_accurate_neon:                        408.0 ( 8.68x)
yuv2nv12cX_2_512_approximate_c:                       3521.4 ( 1.00x)
yuv2nv12cX_2_512_approximate_neon:                     409.2 ( 8.61x)
yuv2nv12cX_4_512_accurate_c:                          4740.0 ( 1.00x)
yuv2nv12cX_4_512_accurate_neon:                        604.4 ( 7.84x)
yuv2nv12cX_4_512_approximate_c:                       4681.9 ( 1.00x)
yuv2nv12cX_4_512_approximate_neon:                     603.3 ( 7.76x)
yuv2nv12cX_8_512_accurate_c:                          7273.1 ( 1.00x)
yuv2nv12cX_8_512_accurate_neon:                       1012.2 ( 7.19x)
yuv2nv12cX_8_512_approximate_c:                       7223.0 ( 1.00x)
yuv2nv12cX_8_512_approximate_neon:                    1015.8 ( 7.11x)
yuv2nv12cX_16_512_accurate_c:                        13762.0 ( 1.00x)
yuv2nv12cX_16_512_accurate_neon:                      1761.4 ( 7.81x)
yuv2nv12cX_16_512_approximate_c:                     13884.0 ( 1.00x)
yuv2nv12cX_16_512_approximate_neon:                   1766.8 ( 7.86x)

Benchmarked on:
Snapdragon(R) X Elite - X1E80100 - Qualcomm(R) Oryon(TM) CPU
3417 MHz, 12 Core(s), 12 Logical Processor(s)
2025-08-12 09:05:00 +00:00
Logaprakash Ramajayam
49477972b7 swscale/aarch64/output: Implement neon assembly for yuv2planeX_10_c_template()
yuv2yuvX_8_2_0_512_accurate_c:                        2213.4 ( 1.00x)
yuv2yuvX_8_2_0_512_accurate_neon:                      147.5 (15.01x)
yuv2yuvX_8_2_0_512_approximate_c:                     2203.9 ( 1.00x)
yuv2yuvX_8_2_0_512_approximate_neon:                   154.1 (14.30x)
yuv2yuvX_8_2_16_512_accurate_c:                       2147.2 ( 1.00x)
yuv2yuvX_8_2_16_512_accurate_neon:                     150.8 (14.24x)
yuv2yuvX_8_2_16_512_approximate_c:                    2149.7 ( 1.00x)
yuv2yuvX_8_2_16_512_approximate_neon:                  146.8 (14.64x)
yuv2yuvX_8_2_32_512_accurate_c:                       2078.9 ( 1.00x)
yuv2yuvX_8_2_32_512_accurate_neon:                     139.0 (14.95x)
yuv2yuvX_8_2_32_512_approximate_c:                    2083.7 ( 1.00x)
yuv2yuvX_8_2_32_512_approximate_neon:                  140.5 (14.84x)
yuv2yuvX_8_2_48_512_accurate_c:                       2010.7 ( 1.00x)
yuv2yuvX_8_2_48_512_accurate_neon:                     138.2 (14.55x)
yuv2yuvX_8_2_48_512_approximate_c:                    2012.6 ( 1.00x)
yuv2yuvX_8_2_48_512_approximate_neon:                  141.2 (14.26x)
yuv2yuvX_10LE_16_0_512_accurate_c:                    7874.1 ( 1.00x)
yuv2yuvX_10LE_16_0_512_accurate_neon:                  831.6 ( 9.47x)
yuv2yuvX_10LE_16_0_512_approximate_c:                 7918.1 ( 1.00x)
yuv2yuvX_10LE_16_0_512_approximate_neon:               836.1 ( 9.47x)
yuv2yuvX_10LE_16_16_512_accurate_c:                   7630.9 ( 1.00x)
yuv2yuvX_10LE_16_16_512_accurate_neon:                 804.5 ( 9.49x)
yuv2yuvX_10LE_16_16_512_approximate_c:                7724.7 ( 1.00x)
yuv2yuvX_10LE_16_16_512_approximate_neon:              808.6 ( 9.55x)
yuv2yuvX_10LE_16_32_512_accurate_c:                   7436.4 ( 1.00x)
yuv2yuvX_10LE_16_32_512_accurate_neon:                 780.4 ( 9.53x)
yuv2yuvX_10LE_16_32_512_approximate_c:                7366.7 ( 1.00x)
yuv2yuvX_10LE_16_32_512_approximate_neon:              780.5 ( 9.44x)
yuv2yuvX_10LE_16_48_512_accurate_c:                   7099.9 ( 1.00x)
yuv2yuvX_10LE_16_48_512_accurate_neon:                 761.0 ( 9.33x)
yuv2yuvX_10LE_16_48_512_approximate_c:                7097.6 ( 1.00x)
yuv2yuvX_10LE_16_48_512_approximate_neon:              754.6 ( 9.41x)

Benchmarked on:
Snapdragon(R) X Elite - X1E80100 - Qualcomm(R) Oryon(TM) CPU
3417 MHz, 12 Core(s), 12 Logical Processor(s)
2025-08-12 09:05:00 +00:00
Kacper Michajłow
98c4b9dbbd swscale/input: don't generate unused functions
Fixes: input.c:1271:1: warning: unused function 'planar_rgb16_s12_to_a'
Fixes: input.c:1272:1: warning: unused function 'planar_rgb16_s10_to_a'

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2025-08-11 19:29:53 +00:00
Michael Niedermayer
638b521c7b Bump versions for master after release/8.0
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2025-08-09 18:03:05 +02:00
Michael Niedermayer
7eaa0f799a Bump versions for release/8.0
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2025-08-09 17:30:39 +02:00
Timo Rothenpieler
262d41c804 all: fix typos found by codespell
2025-08-03 13:48:47 +02:00
Michael Niedermayer
aca41d3d93 swscale/output: Fix all bilinear integer overflows
Ticket 11686 hinted at one of these overflows;
this fixes them all.

Issue in line 1325/1326 found by HAORAN FANG <xfanghaoran@gmail.com>

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2025-08-02 16:26:33 +00:00
Michael Niedermayer
c44d237d80 swscale/output: Fix integer overflow with lum/chr/alpha filter
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2025-08-02 16:26:33 +00:00
Niklas Haas
b7946098b1 swscale/alphablend: don't overread alpha plane on subsampled odd size
This function overreads the input plane for odd dimensions, because the
chroma plane is always rounded up, which means (xy << subsample) + 1 exceeds
the actual alpha plane size.
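
The overread can be illustrated with a small sketch, assuming horizontal subsampling by two and a simple two-sample average. The function names and the clamping strategy are assumptions made for illustration; the actual fix in alphablend may be structured differently.

```c
#include <assert.h>
#include <stdint.h>

/* Chroma width is rounded up, so an odd luma/alpha width of 3 gives 2. */
static int chroma_width(int w)
{
    return (w + 1) >> 1;
}

/* Average of the alpha samples covering chroma column cx. */
static int alpha_for_chroma_x(const uint8_t *alpha, int alpha_w, int cx)
{
    assert(alpha && alpha_w > 0);
    int x0 = cx << 1;
    int x1 = x0 + 1;
    assert(x0 < alpha_w);
    if (x1 >= alpha_w)
        x1 = alpha_w - 1;       /* odd width: reuse the last valid column */
    return (alpha[x0] + alpha[x1] + 1) >> 1;
}
```

Without the clamp, the last chroma column of an odd-width plane would read one sample past the end of the alpha row.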

To verify:
  valgrind ffmpeg -pix_fmt yuva420p -f lavfi -i color -vf \
  "scale=1x1,format=yuva420p,scale=alphablend=uniform_color,format=yuv420p" \
  -vframes 1 -f null -

Fixes: https://trac.ffmpeg.org/ticket/11692
2025-07-31 11:32:20 +00:00
Kacper Michajłow
22da57c444 swscale/lut3d: remove unused function
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2025-07-22 19:56:34 +02:00
James Almer
11032d819d swscale/swscale_unscaled: don't add offsets to more NULL pointers
Continuation of af9b43455a.

Signed-off-by: James Almer <jamrial@gmail.com>
2025-07-18 21:35:26 -03:00
James Almer
af9b43455a swscale/swscale_unscaled: don't add offsets to NULL pointers
Fixes: libswscale/swscale_unscaled.c:916:20: runtime error: applying zero offset to null pointer
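
For context, pointer arithmetic on a null pointer is undefined in C even when the offset is zero, which is what the sanitizer reports. A minimal sketch of the guard, with plane_at() as a hypothetical helper rather than the code in swscale_unscaled.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address of line y in a plane, or NULL if the plane is absent. */
static const uint8_t *plane_at(const uint8_t *base, ptrdiff_t stride, int y)
{
    assert(y >= 0);
    return base ? base + stride * y : NULL;
}
```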
Signed-off-by: James Almer <jamrial@gmail.com>
2025-07-18 14:23:10 -03:00
Timo Rothenpieler
02a7c85753 swscale: add support for new 10/12 bit MSB formats
2025-07-11 17:49:58 +02:00
Michael Niedermayer
38ead08815 swscale/output: Fix integer overflows in yuv2rgba64_1_c_template()
Fixes: signed integer overflow: -132524 * 16525 cannot be represented in type 'int'
Fixes: 414862270/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-4869083202125824
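
The reported product, -132524 * 16525 = -2189959100, lies below INT_MIN for a 32-bit int. The sketch below shows two common ways to avoid the undefined behaviour: widening the multiplication, or doing it in unsigned arithmetic, which wraps modulo 2^32. The helper names are hypothetical, and which approach the commit itself uses is not stated in the message.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Exact product, computed in a type wide enough to hold it. */
static int64_t mul_wide(int a, int b)
{
    return (int64_t)a * b;
}

/* Product modulo 2^32: unsigned arithmetic wraps instead of overflowing. */
static uint32_t mul_wrap(int a, int b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* Narrow back to int; the caller must already know the value fits. */
static int narrow_to_int(int64_t v)
{
    assert(v >= INT_MIN && v <= INT_MAX);
    return (int)v;
}
```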

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2025-07-06 19:24:07 +02:00
Andreas Rheinhardt
54c865fbec swscale/utils: Fix potential race when initializing xyz tables
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-05-27 13:49:26 +02:00
Ramiro Polla
d028cf03b8 swscale/swscale_unscaled: fix planarRgbToplanarRgbWrapper() for formats with bpc between 9-14 bits
Currently, planarRgbToplanarRgbWrapper() always sets the alpha value to 255,
without taking the bit depth into consideration.

This commit restricts the alpha value to the bit depth.
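
As an illustration, the opaque alpha value for a given bit depth is (1 << bpc) - 1. The helpers below are hypothetical and ignore byte order; they are a sketch of the idea rather than the code in the wrapper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest sample value representable at the given bit depth. */
static uint16_t opaque_alpha(int bpc)
{
    assert(bpc >= 1 && bpc <= 16);
    return (uint16_t)((1u << bpc) - 1);
}

/* Fill one row of a 16-bit alpha plane with the depth-correct value. */
static void fill_alpha_row(uint16_t *dst, size_t width, int bpc)
{
    const uint16_t a = opaque_alpha(bpc);
    for (size_t x = 0; x < width; x++)
        dst[x] = a;
}
```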
2025-05-23 00:07:56 +02:00
Ramiro Polla
748e960e04 swscale/swscale_unscaled: fix packed16togbra16() for formats with bpc between 9-14 bits
Currently, packed16togbra16() always sets the alpha value to 0xFFFF,
without taking the bit depth into consideration.

This causes a bug on x86, which can be reproduced with:
./libswscale/tests/swscale -unscaled 1 -src xyz12le -dst gbrap12be

The problem arises in ff_hscale14to15_4_ssse3(), in the conversion
from gbrap12be to yuva444p, which comes after the conversion from
xyz12le to gbrap12be.

It has something to do with pmaddwd not working on unsigned values.
There is some code to deal with 0xFFFF if the input has a bit depth of
16, but not for bit depths < 16.
We could fix ff_hscale14to15_4_ssse3() to also work correctly with
0xFFFF on bit depths < 16, or we could just not write 0xFFFF there in
the first place, which is what this commit does.
2025-05-23 00:01:04 +02:00
Ramiro Polla
0c1d87d1e6 swscale/swscale_unscaled: fix packed30togbra10() for formats with bpc between 9-14 bits
Currently, packed30togbra10() always sets the alpha value to 0xFFFF,
without taking the bit depth into consideration.

This commit restricts the alpha value to the bit depth.
2025-05-23 00:00:05 +02:00
Ramiro Polla
a16c053a33 swscale/swscale_unscaled: fix planarCopyWrapper() for yuv444p => yuva444p
Currently, planarCopyWrapper() assumes that src[3] must be NULL when
the source format has no alpha plane.

This commit updates the condition for filling the alpha plane based on
the number of components available in the source format as well.
2025-05-22 23:59:39 +02:00
Niklas Haas
6072e27e9a swscale/graph: prefer bools to ints
This is more consistent with the rest of the newly added code, which
universally switched to using bools for boolean values.
2025-05-18 15:00:45 +02:00
Niklas Haas
d95944786e swscale/graph: move vshift() and shift_img() to shared header
I need to reuse these inside `ops.c`.
2025-05-18 14:39:57 +02:00
Niklas Haas
bc9696bff8 swscale/graph: make noop loop more robust
The current loop only works if the input and output have the same number
of planes. However, with the new scaling logic, we can also optimize into a
noop the case where the input has extra unneeded planes.

For the memcpy fallback to work in these cases we have to instead check if
the *output* pointer is set, rather than the input pointer.
2025-05-18 14:37:33 +02:00
Niklas Haas
51e912466f swscale/graph: expose ff_sws_graph_add_pass
So we can move pass-adding business logic outside of graph.c.
2025-05-18 14:37:33 +02:00
Niklas Haas
f297ebf97a tests/swscale: improve colorization of speedup
The old limits were a bit too tightly clustered around 1.0. Make the
value range much more generous, and also introduce a new highlight
for speedups above 10.0 (order of magnitude improvement).
2025-05-18 14:37:33 +02:00
Michael Niedermayer
23592f942d swscale/output: fix integer overflow in yuv2rgba64_full_1_c_template()
Fixes: signed integer overflow: -293650 * 16525 cannot be represented in type 'int'
Fixes: 408304111/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-4762210299871232

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2025-05-15 03:03:58 +02:00
Andreas Rheinhardt
35fcdb2132 swscale/x86/rgb2rgb: Deduplicate ASM constants
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-04-13 22:49:21 +02:00
Michael Niedermayer
d16a058dbc swscale/swscale: Do not crash on floats
Fixes: shift exponent 32 is too large for 32-bit type 'unsigned int'
Fixes: division by zero
Fixes: 391981061/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-6691017763389440
Fixes: 392929028/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-5142088307507200

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2025-04-10 03:01:32 +02:00
Michael Niedermayer
ce538ef97a swscale/output: Fix integer overflow in yuv2gbrp_full_X_c()
Fixes: signed integer overflow: 1966895953 + 210305024 cannot be represented in type 'int'
Fixes: 391921975/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-5916798905548800

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2025-04-10 03:01:32 +02:00
Andreas Rheinhardt
435be31ef5 swscale/csputils: Remove unused ff_sws_matrix3x3_rmul()
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-04-03 06:04:57 +02:00
Andreas Rheinhardt
4da84d5c2b swscale/swscale_unscaled: Actually use X2->RGBA64 conversions
The conversion functions were added in e7382b4d01, yet they were never
really enabled. Found via -ffunction-sections and --gc-sections.

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-03-31 21:45:20 +02:00
Niklas Haas
3e32dc8b08 tests/swscale: allow setting log verbosity
Helpful for debugging the new swscale code, since it dumps the
operations list in verbose logging mode.
2025-03-31 12:19:26 +02:00
Niklas Haas
92a57f1cfd tests/swscale: constrain reference SSIM for low bit depth formats
Sometimes, the reference SSIM is significantly higher than the
SSIM level expected for the test. This is the case when the source format
has a much lower bit depth than the destination format. In this case, the fact
that legacy swscale does not accurately preserve the source dither pattern
gives it an unfair advantage in a direct comparison, leading to false
positives.

For example, a conversion like rgb4 -> rgb565 should be lossless, but swscale
low-passes / downscales the input chroma, throwing away massive amounts of
detail. This gives it a higher SSIM score since the lowpassed result removes
some of the dither noise that was present in the source.
2025-03-31 12:19:26 +02:00
Niklas Haas
8fc9808f18 tests/swscale: calculate theoretical expected SSIM
We can calculate with some confidence the theoretical expected SSIM
from an "ideal" conversion, by computing the reference SSIM level
for an image dithered with uniformly distributed quantization noise.

This gives us an additional safety net to check for regressions even in
the absence of a reference to compare against.
2025-03-31 12:19:26 +02:00
Niklas Haas
9549daa996 tests/swscale: remove stray whitespace in scanf format
2025-03-31 12:19:24 +02:00
Niklas Haas
a22faeb992 tests/swscale: check supported inputs for legacy swscale separately
The new code path supports more formats, so we can't test them all
against the legacy implementation.
2025-03-31 12:19:08 +02:00
Niklas Haas
e1736d0d0b tests/swscale: print performance stats on exit
2025-03-31 12:19:08 +02:00
Niklas Haas
6c12b1535a tests/swscale: switch from MSE to SSIM
And bias it towards Y. This is much better at ignoring errors due to differing
dither patterns, and rewards algorithms that lower luma noise at the cost of
higher chroma noise.

The (0.8, 0.1, 0.1) weights for YCbCr are taken from the paper:
  "Understanding SSIM" by Jim Nilsson and Tomas Akenine-Möller
  (https://arxiv.org/abs/2006.13846)
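
The weighting can be sketched as a plain weighted sum of per-plane SSIM values. weighted_ssim() is a hypothetical helper; how the test tool computes the per-plane scores is not shown here.

```c
#include <assert.h>

/* Combine per-plane SSIM (order: Y, Cb, Cr) with the (0.8, 0.1, 0.1) weights. */
static double weighted_ssim(const double ssim[3])
{
    assert(ssim);
    return 0.8 * ssim[0] + 0.1 * ssim[1] + 0.1 * ssim[2];
}
```

With these weights, a given loss in the luma plane lowers the score eight times as much as the same loss in one chroma plane.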
2025-03-31 12:19:07 +02:00
Niklas Haas
1707e81073 tests/swscale: use yuva444p as reference
Instead of the lossy yuva420p. This does change the results compared to the
status quo, but is more reflective of the actual strength of a conversion,
since it will faithfully measure the round-trip error from subsampling and
upsampling.
2025-03-31 12:18:35 +02:00
Niklas Haas
f438f3f8cd tests/swscale: print speedup numbers in color
2025-03-31 12:18:35 +02:00
Niklas Haas
995986e512 tests/swscale: allow testing only unscaled convertors
I need this to be able to test the new unscaled conversion code more quickly.
We reorder the flags to make 0 the first entry, so we don't set any
flags when performing unscaled tests.
2025-03-31 12:18:35 +02:00
Niklas Haas
d467ceaa9b tests/swscale: use hex format for flags values
2025-03-31 12:18:11 +02:00
Niklas Haas
0e2742a693 tests/swscale: allow choosing specific flags and dither mode
So I can quickly iterate on the new swscale code.
2025-03-31 12:16:10 +02:00
James Almer
b338d1b35b libs: bump major version for all libraries
Signed-off-by: James Almer <jamrial@gmail.com>
2025-03-28 14:44:34 -03:00
Shreesh Adiga
26f2f03e0d swscale/x86/rgb2rgb: optimize AVX2 version of uyvytoyuv422
Currently the AVX2 version of uyvytoyuv422 in the SIMD loop does the following:
4 vinsertq to have interleaving of the vector lanes during load from memory.
4 vperm2i128 inside 4 RSHIFT_COPY calls to achieve the desired layout.

This patch replaces the above 8 instructions with 2 vpermq and
2 vpermd with a vector register similar to AVX512ICL version.

Observed the following numbers on various microarchitectures:

On AMD Zen3 laptop:
Before:
uyvytoyuv422_c:                                      51979.7 ( 1.00x)
uyvytoyuv422_sse2:                                    5410.5 ( 9.61x)
uyvytoyuv422_avx:                                     4642.7 (11.20x)
uyvytoyuv422_avx2:                                    4249.0 (12.23x)

After:
uyvytoyuv422_c:                                      51659.8 ( 1.00x)
uyvytoyuv422_sse2:                                    5420.8 ( 9.53x)
uyvytoyuv422_avx:                                     4651.2 (11.11x)
uyvytoyuv422_avx2:                                    3953.8 (13.07x)

On Intel Macbook Pro 2019:
Before:
uyvytoyuv422_c:                                     185014.4 ( 1.00x)
uyvytoyuv422_sse2:                                   22800.4 ( 8.11x)
uyvytoyuv422_avx:                                    19796.9 ( 9.35x)
uyvytoyuv422_avx2:                                   13141.9 (14.08x)

After:
uyvytoyuv422_c:                                     185093.4 ( 1.00x)
uyvytoyuv422_sse2:                                   22795.4 ( 8.12x)
uyvytoyuv422_avx:                                    19791.9 ( 9.35x)
uyvytoyuv422_avx2:                                   12043.1 (15.37x)

On AMD Zen4 desktop:
Before:
uyvytoyuv422_c:                                      29105.0 ( 1.00x)
uyvytoyuv422_sse2:                                    3888.0 ( 7.49x)
uyvytoyuv422_avx:                                     3374.2 ( 8.63x)
uyvytoyuv422_avx2:                                    2649.8 (10.98x)
uyvytoyuv422_avx512icl:                               1615.0 (18.02x)

After:
uyvytoyuv422_c:                                      29093.4 ( 1.00x)
uyvytoyuv422_sse2:                                    3874.4 ( 7.51x)
uyvytoyuv422_avx:                                     3371.6 ( 8.63x)
uyvytoyuv422_avx2:                                    2174.6 (13.38x)
uyvytoyuv422_avx512icl:                               1625.1 (17.90x)

Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>
2025-03-23 15:25:48 +00:00
Andreas Rheinhardt
c94143350f avutil/libm: Only include intfloat.h when needed
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-03-22 03:35:28 +01:00
Andreas Rheinhardt
65154ba994 swscale/tests/swscale: Fix potential buffer overflow
The field width in a %s directive gives the amount of characters
to read from the input and not the size of the receiving buffer;
the latter must of course also have space for the trailing \0
which has been forgotten here. The commit adds it (and fixes a
-Wfortify-source warning from Clang).
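
A minimal sketch of the rule, using a hypothetical parse_name() helper and an arbitrary width of 8: a %8s directive may store eight characters plus the terminator, so the buffer needs nine bytes.

```c
#include <assert.h>
#include <stdio.h>

#define NAME_LEN 8   /* must match the field width in the format string */

/* Reads at most NAME_LEN characters; out needs room for the trailing '\0'. */
static int parse_name(const char *line, char out[NAME_LEN + 1])
{
    assert(line && out);
    return sscanf(line, "%8s", out) == 1;
}
```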

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-03-21 04:30:09 +01:00
Andreas Rheinhardt
dff498fddf avutil/csp: Improve enum range comparisons
The underlying integer type of an enumeration is
implementation-defined (see C11, 6.7.2.2 (4)); GCC defaults
to unsigned if there are no negative values like for all enums
from pixfmt.h except enum AVPixelFormat.

This means that tests like "if (csp >= AVCOL_SPC_NB)" for
invalid colorspaces need not work as expected (namely if
enum AVColorSpace is signed). It also means that testing
for such an enum variable to be >= 0 may be tautologically
true. Clang emits a -Wtautological-unsigned-enum-zero-compare
warning for this.

Fix both of these issues by casting to unsigned.
Also do the same in libswscale/format.c.
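
The cast can be sketched with a hypothetical enum; enum Color and the helpers below are illustrative and do not come from pixfmt.h. A single unsigned comparison rejects both negative and too-large values, regardless of whether the compiler picked a signed or unsigned underlying type.

```c
#include <assert.h>

enum Color { COLOR_A, COLOR_B, COLOR_NB };

/* Valid iff 0 <= c < COLOR_NB; a negative value becomes a huge unsigned one. */
static int color_is_valid(enum Color c)
{
    return (unsigned)c < COLOR_NB;
}

/* Table lookup guarded by the range check above. */
static const char *color_name(enum Color c)
{
    static const char *const names[COLOR_NB] = { "a", "b" };
    assert(color_is_valid(c));
    return names[c];
}
```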

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-03-21 04:30:09 +01:00