Commit Graph

12057 Commits

Author SHA1 Message Date
James Almer
3f58c9df14 avfilter/x86/vf_bwdif: use the correct preprocessor check
Signed-off-by: James Almer <jamrial@gmail.com>
2025-08-03 19:26:18 -03:00
Niklas Haas
7f00e24d70 vf_bwdif: add AVX512 implementation
I also tried replacing some of the instructions by more elaborate ones
using masks, but I found no performance gain significant enough to be worth
maintaining two code paths, so this implementation merely replaces the AVX2
implementation by drop-in AVX512 equivalents.

bwdif8_c:                                             6362.2 ( 1.00x)
bwdif8_sse2:                                          1004.9 ( 6.33x)
bwdif8_ssse3:                                          946.0 ( 6.73x)
bwdif8_avx2:                                           477.9 (13.31x)
bwdif8_avx512:                                         273.3 (23.28x)

bwdif10_c:                                            6341.5 ( 1.00x)
bwdif10_sse2:                                          872.4 ( 7.27x)
bwdif10_ssse3:                                         803.4 ( 7.89x)
bwdif10_avx2:                                          416.7 (15.22x)
bwdif10_avx512:                                        224.3 (28.27x)

Realtime test at 3840x2160 yuv420p:

avx2:   frame=20000 fps=3370 q=-0.0 Lsize=N/A time=00:06:40.00 bitrate=N/A speed=67.4x elapsed=0:00:05.93
avx512: frame=20000 fps=5077 q=-0.0 Lsize=N/A time=00:06:40.00 bitrate=N/A speed= 102x elapsed=0:00:03.93

The use of this function is gated behind avx512icl so that it doesn't
downclock on Skylake.
2025-08-03 22:13:51 +00:00
Timo Rothenpieler
262d41c804 all: fix typos found by codespell 2025-08-03 13:48:47 +02:00
Timo Rothenpieler
8d439b2483 all: fix whitespace/new-line issues 2025-08-03 13:48:47 +02:00
Timo Rothenpieler
c6b3aae0ee avfilter/vf_scale_d3d11: remove unused variable 2025-08-01 20:49:57 +02:00
Dash Santosh
bd18a6a9e0 avfilter/scale_d3d11: cleanup return path using fail label 2025-07-31 21:07:51 +00:00
Dash Santosh
96821211c2 avfilter: add scale_d3d11 filter
This commit introduces a new hardware-accelerated video filter, scale_d3d11,
which performs scaling and format conversion using Direct3D 11. The filter enables
efficient GPU-based scaling and pixel format conversion (p010 to nv12), reducing
CPU overhead and latency in video pipelines.
2025-07-31 21:07:51 +00:00
Niklas Haas
03b9180fe3 avfilter/avfiltergraph: add logging for filter formats
There is no convenient way, from the command line, to figure out which
formats a filter actually supports. This commit changes that by adding a
log output, at debug level, to simply print the list of formats each filter
advertises on its links, before any negotiation.

Furthermore, we can use the exact same helper function to also print out the
corresponding filter links when there is an error during format negotiation.

We need to use AV_BRINT_SIZE_UNLIMITED because the default format list for
filters like vf_scale is about 1700 characters long, significantly larger than
the the 1 kB default buffer.
2025-07-31 12:35:32 +00:00
Zhao Zhili
2a49d05d1a avfilter/vf_vibrance: Update default value of rlum/blum
Fix #9195

It looks like vf_vibrance.c is similar to
https://github.com/zachsaw/RenderScripts/blob/master/RenderScripts/ImageProcessingShaders/SweetFX/Vibrance.hlsl
and
https://github.com/kevinlekiller/kwin-effect-shaders_shaders/blob/main/Vibrance.frag
Originall written by Christian Cann Schuldt Jensen ~ CeeJay.dk.

They use same matrix coeff.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2025-07-31 19:51:23 +08:00
Marton Balint
0cc46f1f59 avfilter/af_afade: rework crossfade activate logic
The new logic should be easier to follow.

It also uses ff_inlink_consume_frame() for all simple passthrough operations
making custom get_audio_buffer callback unnecessary.

Fate changes are because the new logic does not repacketize input audio up
until the crossfade. Content is the same.

Signed-off-by: Marton Balint <cus@passwd.hu>
2025-07-29 22:10:05 +02:00
Marton Balint
84d831ec58 avfilter/af_afade: fix check_input for empty streams
Use ff_outlink_get_status directly to get pending EOF state.

Fixes assertion failure with:
ffmpeg -lavfi "sine=f=1000:d=2[a];sine=f=440:d=2,atrim=end=0[b];[a][b]acrossfade=d=1" -f framecrc -
ffmpeg -lavfi "sine=f=1000:d=2,atrim=end=0[a];sine=f=440:d=2[b];[a][b]acrossfade=d=1" -f framecrc -

Signed-off-by: Marton Balint <cus@passwd.hu>
2025-07-29 22:10:05 +02:00
Marton Balint
4be21b9399 avfilter/af_afade: factorize functions generating frames
No change in functionality.

Signed-off-by: Marton Balint <cus@passwd.hu>
2025-07-29 22:10:05 +02:00
Marton Balint
944329f8fd avfilter/trim: consume all available frames and avoid activate reschedule
There is no benefit in delaying processing all available frames.

Signed-off-by: Marton Balint <cus@passwd.hu>
2025-07-29 22:10:05 +02:00
Niklas Haas
dc8e753f32 avfilter/vf_libplacebo: composite multiple inputs in linear light
This gives vastly improved blending results than when blending directly in
the desired output colorspace. Overridable by the existing "disable_linear"
option.

This is functionally similar to combining multiple "libplacebo" filters,
but does not rely on the existence of a Vulkan filter link, so it can be used
without performance penalty in all circumstances. It's also enabled by
default, without requiring special action from the user.
2025-07-28 10:59:48 +02:00
Niklas Haas
603334a043 avfilter/vf_premultiply: use correct premultiplication formula
The previous formula was introduced without justification in 6e713841e8,
and the only thing Paul had to say about it over IRC was that it was copied
from an unspecified source on the internet.

I decided to do some testing and came to the conclusion that this term not
only produces "illegal" files, but also lowers PSNR score, over the naive
implementation without this extra term.

Here are the results of a round-trip test, using allrgb/allyuv (respectively)
as the input, and fade=alpha=yes:n=256 to cycle through every possible alpha
value, comparing the round-trip output against the input:

Before patch:
  PSNR r:26.677431 g:26.677431 b:26.677431 a:inf average:27.926818 min:6.012093 max:55.400791
  PSNR y:26.677431 u:21.101981 v:21.101981 a:inf average:23.548981 min:9.013835 max:53.182303 (full)
  PSNR y:27.348055 u:21.101981 v:21.101981 a:inf average:23.625238 min:9.554991 max:45.652221 (limited)

After patch:
  PSNR r:27.321996 g:27.321996 b:27.321996 a:inf average:28.571384 min:6.012093 max:52.424553
  PSNR y:27.321996 u:23.187879 v:23.187879 a:inf average:25.431773 min:9.013835 max:50.199232 (full)
  PSNR y:27.868544 u:23.187879 v:23.187879 a:inf average:25.515660 min:9.554991 max:45.078298 (limited)

It's worth pointing out that previous version sometimes artificially inflates
PSNR by producing values that are too high (i.e. RGB > A), such as for the
input pair (R = 255, A = 2) which should give R = 2, but actually gives R = 3
under the old logic.

As a second evaluation without this shortcoming, here is a comparison against
the reference value computed with a floating point format:

Before patch:
  PSNR r:53.600599 g:53.957833 b:53.540948 a:inf average:54.945316 min:50.508901 max:inf (premul only)
  PSNR r:30.734183 g:30.734183 b:30.734183 a:inf average:31.983570 min:12.058264 max:inf (round-trip)

After patch:
  PSNR r:61.751104 g:65.239091 b:61.339191 a:inf average:63.710714 min:55.441130 max:inf (premul only)
  PSNR r:32.611851 g:32.611851 b:32.611851 a:inf average:33.861238 min:12.058264 max:inf (round-trip)
2025-07-28 10:56:10 +02:00
James Almer
45810daf4d avfilter/af_channelmap: always set out_channel in the map
Fixes use-of-uninitialized-value.

Signed-off-by: James Almer <jamrial@gmail.com>
2025-07-25 00:18:07 -03:00
James Almer
da18c2a373 avfilter: use the getters for xGA font data arrays
Signed-off-by: James Almer <jamrial@gmail.com>
2025-07-22 09:47:18 -03:00
James Almer
b0159af6bc avfilter/f_metadata: use the return value of vsnprintf() to write the argument list
Should fix use-of-uninitialized-value under MSAN.

Signed-off-by: James Almer <jamrial@gmail.com>
2025-07-22 09:47:18 -03:00
James Almer
a01dc3aa27 avfilter/x86/vf_colordetect: add missing preprocessor checks
Signed-off-by: James Almer <jamrial@gmail.com>
2025-07-21 18:03:22 -03:00
James Almer
c62813a057 avfilter/x86/vf_colordetect: make the AVX512 functions run only on ICL targets or newer
For detect_range, the usage of vpbroadcast{b,w} requires the AVX512BW extension, and for
detect_alpha we don't want ZMM instructions downclocking old CPUs.

Signed-off-by: James Almer <jamrial@gmail.com>
2025-07-21 17:25:28 -03:00
James Almer
550ec9b7e6 avfilter/version: bump version after vf_colordetect addition
Signed-off-by: James Almer <jamrial@gmail.com>
2025-07-21 17:04:45 -03:00
James Almer
70fc4e5909 avfilter/x86/vf_colordetect_init: don't enable ASM functions on targets where it's known they will be slower
Signed-off-by: James Almer <jamrial@gmail.com>
2025-07-21 16:58:51 -03:00
James Almer
fdca209f1f avfilter/x86/vf_colordetect: don't use rax to return a 32bit integer
Fixes compilation on x86_32 targets

Signed-off-by: James Almer <jamrial@gmail.com>
2025-07-21 16:58:36 -03:00
James Almer
14f4478354 avfilter/x86/vf_colordetect: fix use of AVX512 instruction in AVX2 function on non Unix64 targets
Signed-off-by: James Almer <jamrial@gmail.com>
2025-07-21 16:52:46 -03:00
Niklas Haas
8b647b3f8a avfilter/vf_colordetect: add x86 SIMD implementation
alphadetect8_full_c:                                  5658.2 ( 1.00x)
alphadetect8_full_avx2:                                215.1 (26.31x)
alphadetect8_full_avx512:                              133.5 (42.40x)
alphadetect8_limited_c:                               7391.5 ( 1.00x)
alphadetect8_limited_avx2:                             649.3 (11.38x)
alphadetect8_limited_avx512:                           330.5 (22.36x)
alphadetect16_full_c:                                 3027.4 ( 1.00x)
alphadetect16_full_avx2:                               209.4 (14.46x)
alphadetect16_full_avx512:                             141.4 (21.41x)
alphadetect16_limited_c:                              3880.9 ( 1.00x)
alphadetect16_limited_avx2:                            734.9 ( 5.28x)
alphadetect16_limited_avx512:                          349.2 (11.11x)
rangedetect8_c:                                       5854.2 ( 1.00x)
rangedetect8_avx2:                                     138.9 (42.15x)
rangedetect8_avx512:                                   106.2 (55.12x)
rangedetect16_c:                                      4122.0 ( 1.00x)
rangedetect16_avx2:                                    138.6 (29.74x)
rangedetect16_avx512:                                  104.1 (39.60x)
2025-07-21 18:10:25 +02:00
Niklas Haas
545f721b44 avfilter/vf_colordetect: add new color range detection filter
This filter can detect various properties about the image, including
whether or not there are out-of-range values, or whether the input appears
to use straight or premultiplied alpha.

Of course, these can only be heuristics, with "undetermined" as the base
case. While we can definitely prove the existence of full range or
straight alpha colors, we can never infer the opposite.
2025-07-21 18:10:25 +02:00
James Almer
722a2170e8 avfilter/vf_curves: don't add offsets to NULL pointers
Signed-off-by: James Almer <jamrial@gmail.com>
2025-07-19 00:07:45 -03:00
Kacper Michajłow
6302ff1fd9 avfilter/vaf_spectrumsynth: don't use uninitialized variable as scale
scale was never initialized. av_tx_init() will use default scale if we
pass NULL.

Fixes: b3117f376d
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2025-07-19 00:36:25 +02:00
James Almer
85f2911891 avfilter/x86/vf_blackdetect: add missing preprocessor check
Signed-off-by: James Almer <jamrial@gmail.com>
2025-07-18 15:17:02 -03:00
James Almer
ee4ff3f706 avfilter/x86/vf_blackdetect_init: don't enable the ASM functions on targets where it's known they will be slower
Signed-off-by: James Almer <jamrial@gmail.com>
2025-07-18 13:05:44 -03:00
James Almer
f263192f0e avfilter/x86/vf_blackdetect: don't use rax to return a 32bit integer
Fixes compilation on x86_32.

Signed-off-by: James Almer <jamrial@gmail.com>
2025-07-18 13:05:44 -03:00
Zhao Zhili
a218cafe4d avfilter/vf_blackdetect: Fix header guard
Fix fate-source failure.
2025-07-18 13:44:51 +02:00
Niklas Haas
75cd42c48a avfilter/vf_blackdetect: add AVX2 SIMD version
Requested by a user. Even with autovectorization enabled, the compiler
performs a quite poor job of optimizing this function, due to not being
able to take advantage of the pmaxub + pcmpeqb trick for counting the number
of pixels less than or equal-to a threshold.

blackdetect8_c:                                       4625.0 ( 1.00x)
blackdetect8_avx2:                                     155.1 (29.83x)
blackdetect16_c:                                      2529.4 ( 1.00x)
blackdetect16_avx2:                                    163.6 (15.46x)
2025-07-18 10:47:31 +02:00
Niklas Haas
bc8d06d541 avfilter/vf_thumbnail: unroll and use multiple histograms
This naive hist[p[x]]++ loop suffers badly when there are large regions of
identical values in the image, because of store-to-load forwarding delay.

Splitting up the histogram into four "parallel" histograms and processing
them one at a time speeds things up significantly, about 40% on my end.
2025-07-17 12:33:59 +02:00
Niklas Haas
e44a1aaeec avfilter/x86/scene_sad: add high bit depth AVX2/AVX512 version
Since psadbw only exists for 8-bits, we have to emulate it for 16-bit
inputs. The simplest sequence is to use a normal subtraction, which is safe
as long as the inputs do not exceed 32767 - so limit this implementation
to 15-bit inputs and below.

For 16-bit inputs, we could in theory instead use a pminw / pmaxw to ensure
the resulting difference does not overflow, but this is slower, and also
breaks the subsequent use of pmaddwd, so I opted to skip 16-bit SIMD for
now.

scene_sad10_c:                                      114175.6 ( 1.00x)
scene_sad10_avx2:                                     9617.7 (11.87x)
scene_sad10_avx512:                                   5208.8 (21.92x)
scene_sad12_c:                                      114537.8 ( 1.00x)
scene_sad12_avx2:                                     9614.0 (11.91x)
scene_sad12_avx512:                                   5186.3 (22.08x)
scene_sad14_c:                                      114113.9 ( 1.00x)
scene_sad14_avx2:                                     9612.9 (11.87x)
scene_sad14_avx512:                                   5186.0 (22.00x)
scene_sad15_c:                                      114108.9 ( 1.00x)
scene_sad15_avx2:                                     9612.3 (11.87x)
scene_sad15_avx512:                                   5186.4 (22.00x)
scene_sad16_c:                                      114136.0 ( 1.00x)
2025-07-17 12:26:06 +02:00
Niklas Haas
91f2d146d4 avfilter/x86/scene_sad: add AVX512 implementation
Trivial to add, but a lot faster (on my machine).

scene_sad8_c:                                       114476.4 ( 1.00x)
scene_sad8_sse2:                                      8644.3 (13.24x)
scene_sad8_avx2:                                      4520.1 (25.33x)
scene_sad8_avx512:                                    3153.0 (36.31x)
2025-07-17 12:26:06 +02:00
Niklas Haas
dc61b74c1d avfilter/scene_sad: pass true depth to ff_scene_sad_get_fn()
I need to be able to distinguish between 10/12/14 and 16 bit depths, for
overflow reasons.
2025-07-17 12:26:05 +02:00
Marton Balint
b24155cae1 avfilter/avfilter: add AVFilterGraph->max_buffered_frames to limit buffered frames
Signed-off-by: Marton Balint <cus@passwd.hu>
2025-07-14 22:05:10 +02:00
Marton Balint
71468e85ae avfilter/framequeue: add support for limiting and tracking buffered frames in the queues
Signed-off-by: Marton Balint <cus@passwd.hu>
2025-07-14 22:03:36 +02:00
Niklas Haas
0a5ae743ef avfilter/vf_thumbnail: switch to query_func2
Instead of enumerating a static list of planar formats to support, walk
through the format list and enable all supported formats.

As of writing, this generates the following format list:
- gbrap
- gbrap10le
- gbrap12le
- gbrap14le
- gbrap16le
- gbrp
- gbrp10le
- gbrp12le
- gbrp14le
- gbrp16le
- gbrp9le
- gray
- gray10le
- gray12le
- gray14le
- gray16le
- gray9le
- ya16le
- ya8
- yuv410p
- yuv411p
- yuv420p
- yuv420p10le
- yuv420p12le
- yuv420p14le
- yuv420p16le
- yuv420p9le
- yuv422p
- yuv422p10le
- yuv422p12le
- yuv422p14le
- yuv422p16le
- yuv422p9le
- yuv440p
- yuv440p10le
- yuv440p12le
- yuv444p
- yuv444p10le
- yuv444p12le
- yuv444p14le
- yuv444p16le
- yuv444p9le
- yuva420p
- yuva420p10le
- yuva420p16le
- yuva420p9le
- yuva422p
- yuva422p10le
- yuva422p12le
- yuva422p16le
- yuva422p9le
- yuva444p
- yuva444p10le
- yuva444p12le
- yuva444p16le
- yuva444p9le
- yuvj411p
- yuvj420p
- yuvj422p
- yuvj440p
- yuvj444p
2025-07-12 12:52:33 +02:00
Niklas Haas
cf18b280f0 avfilter/vf_thumbnail: support more planar formats
This adds support for high bit depth formats, as well as formats with fewer
than 3 planes. The implementation for HBD is the same as for 8 bit formats,
just right shifted to 8 bits.

It's worth pointing out that this also works for HDR formats (and even DV),
because the underlying implementation is just trying to minimize the histogram
difference. If anything, using a HDR format will result in a *more* accurate
detection, because HDR formats tend to be more perceptually uniform.
2025-07-12 12:52:33 +02:00
Jorge Estrada
cd91469114 avfilter/overlay_cuda: add timeline editing support
Enables timeline editing options for overlay_cuda similar to what overlay allows

Example overlaying an image on a video between 30 to 60 seconds:

ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i sample-video.mp4 -i sample-image.jpg
-filter_complex "[1:v]hwupload_cuda[image],[0:v]scale_npp=format=yuv420p[video],[video][image]overlay_cuda=enable='between(t,30,60)'"
-c:v h264_nvenc -c:a copy -y overlay-output-gpu.mp4

Signed-off-by: Jorge Estrada <jestrada.list@gmail.com>
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2025-07-11 17:49:58 +02:00
Lidong Yan
a4a71b5e9d avfilter/asrc_sinc: fix leak in config_input()
In config_input(), fir_to_phase() allocates memory in h[longer], which
would leak if av_calloc() to s->coeffs failed. lpf() allocates memory
in h[0] and h[1], which would leak if fir_to_phase() failed. To fix
this leak, add av_free(h[longer]) in as cleanup code, and replace
return AVERROR* with goto cleanup to prevent from leaks.

Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2025-07-05 21:10:36 +02:00
Jorge Estrada
ad0a44028d avfilter: add pad_cuda filter
This patch adds the pad_cuda video filter. A filter similar to the existing pad filter but accelerated by CUDA.

The filter shares the same options as the software pad filter.

Example usage:
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf "pad_cuda=w=iw+100:h=ih+100:x=-1:y=-1:color=red" out.mp4

Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2025-07-04 01:32:27 +02:00
Marton Balint
af189e424b avfilter/f_select: port to activate
Multi-input or multi-output filters should use activate now.

Signed-off-by: Marton Balint <cus@passwd.hu>
2025-07-03 21:41:54 +02:00
Marton Balint
223c2b03da avfilter/buffersink: keep requesting frames if one activation of the graph does not provide one
A frame graph activation might not produce a frame in the requested sink, so
keep on requesting a frame there unless we encounter a filter activation with
buffersrc empty error.

This makes av_buffersink_get_frame(_flags) work according to its documentation
which claims that EAGAIN is only returned if additional frames must be inserted
into the graph.

Fate changes are because audio frames will have different sizes at segment
boundaries, but content is the same.

Signed-off-by: Marton Balint <cus@passwd.hu>
2025-07-03 21:41:54 +02:00
Marton Balint
d41bac1333 avfilter: signal an empty buffersrc with an explicit activate error code
No change in functionality.

Signed-off-by: Marton Balint <cus@passwd.hu>
2025-07-03 21:41:54 +02:00
Marton Balint
44546751db avfilter/avfilter: make filter_activate_default request frames on behalf of sinks
Sinks without an activate callback have no means to request frames in their
input, therefore the default activate callback should do it for them.

Fixes ticket #11624.
Fixes ticket #10988.
Fixes ticket #10990.

Signed-off-by: Marton Balint <cus@passwd.hu>
2025-07-03 21:41:54 +02:00
Marton Balint
4440e499ba avfilter/avfilter: always forward request frame in filter_activate_default
Even if all inputs are blocked an activate callback should request a frame on
some if its inputs if a frame is requested on any of its outputs.

Signed-off-by: Marton Balint <cus@passwd.hu>
2025-07-03 21:41:54 +02:00
Marton Balint
a736ac72bb avfilter/avfilter: fix forwarding EOF for simple API filters in filter_activate_default
EOF only need to be forwarded back if all outputs have reached EOF.

Fixes infinte loop with ffprobe -f lavfi -i "smptebars=d=1,select=n=2:e=1[out0][out1]"
Regression since d9e41ead82.

Fixes ticket #10959.
Fixes ticket #11366.

Signed-off-by: Marton Balint <cus@passwd.hu>
2025-07-03 21:41:53 +02:00