Commit Graph

359 Commits

Author SHA1 Message Date
Niklas Haas
9b8b78a815 avfilter/vf_colordetect: detect fully opaque alpha planes
It can be useful to know if the alpha plane consists of fully opaque
pixels or not, in which case it can e.g. safely be stripped.

This only requires a very minor modification to the AVX2 routines, adding
an extra AND on the read alpha value with the reference alpha value, and a
single extra cheap test per line.

detect_alpha_8_full_c:                                2849.1 ( 1.00x)
detect_alpha_8_full_avx2:                              260.3 (10.95x)
detect_alpha_8_full_avx512icl:                         130.2 (21.87x)
detect_alpha_8_limited_c:                             8349.2 ( 1.00x)
detect_alpha_8_limited_avx2:                           756.6 (11.04x)
detect_alpha_8_limited_avx512icl:                      364.2 (22.93x)
detect_alpha_16_full_c:                               1652.8 ( 1.00x)
detect_alpha_16_full_avx2:                             236.5 ( 6.99x)
detect_alpha_16_full_avx512icl:                        134.6 (12.28x)
detect_alpha_16_limited_c:                            5263.1 ( 1.00x)
detect_alpha_16_limited_avx2:                          797.4 ( 6.60x)
detect_alpha_16_limited_avx512icl:                     400.3 (13.15x)
2025-08-18 18:50:00 +00:00
Niklas Haas
c96ccd78fc avfilter/vf_colordetect: rename p, q, k variables for clarity
Purely cosmetic.

Motivated in part because I want to depend on the assumption that P
represents the maximum alpha channel value.
2025-08-18 18:50:00 +00:00
James Almer
3f58c9df14 avfilter/x86/vf_bwdif: use the correct preprocessor check
Signed-off-by: James Almer <jamrial@gmail.com>
2025-08-03 19:26:18 -03:00
Niklas Haas
7f00e24d70 vf_bwdif: add AVX512 implementation
I also tried replacing some of the instructions by more elaborate ones
using masks, but I found no performance gain significant enough to be worth
maintaining two code paths, so this implementation merely replaces the AVX2
implementation by drop-in AVX512 equivalents.

bwdif8_c:                                             6362.2 ( 1.00x)
bwdif8_sse2:                                          1004.9 ( 6.33x)
bwdif8_ssse3:                                          946.0 ( 6.73x)
bwdif8_avx2:                                           477.9 (13.31x)
bwdif8_avx512:                                         273.3 (23.28x)

bwdif10_c:                                            6341.5 ( 1.00x)
bwdif10_sse2:                                          872.4 ( 7.27x)
bwdif10_ssse3:                                         803.4 ( 7.89x)
bwdif10_avx2:                                          416.7 (15.22x)
bwdif10_avx512:                                        224.3 (28.27x)

Realtime test at 3840x2160 yuv420p:

avx2:   frame=20000 fps=3370 q=-0.0 Lsize=N/A time=00:06:40.00 bitrate=N/A speed=67.4x elapsed=0:00:05.93
avx512: frame=20000 fps=5077 q=-0.0 Lsize=N/A time=00:06:40.00 bitrate=N/A speed= 102x elapsed=0:00:03.93

The use of this function is gated behind avx512icl so that it doesn't
downclock on Skylake.
2025-08-03 22:13:51 +00:00
Timo Rothenpieler
262d41c804 all: fix typos found by codespell 2025-08-03 13:48:47 +02:00
James Almer
a01dc3aa27 avfilter/x86/vf_colordetect: add missing preprocessor checks
Signed-off-by: James Almer <jamrial@gmail.com>
2025-07-21 18:03:22 -03:00
James Almer
c62813a057 avfilter/x86/vf_colordetect: make the AVX512 functions run only on ICL targets or newer
For detect_range, the usage of vpbroadcast{b,w} requires the AVX512BW extension, and for
detect_alpha we don't want ZMM instructions downclocking old CPUs.

Signed-off-by: James Almer <jamrial@gmail.com>
2025-07-21 17:25:28 -03:00
James Almer
70fc4e5909 avfilter/x86/vf_colordetect_init: don't enable ASM functions on targets where it's known they will be slower
Signed-off-by: James Almer <jamrial@gmail.com>
2025-07-21 16:58:51 -03:00
James Almer
fdca209f1f avfilter/x86/vf_colordetect: don't use rax to return a 32bit integer
Fixes compilation on x86_32 targets

Signed-off-by: James Almer <jamrial@gmail.com>
2025-07-21 16:58:36 -03:00
James Almer
14f4478354 avfilter/x86/vf_colordetect: fix use of AVX512 instruction in AVX2 function on non Unix64 targets
Signed-off-by: James Almer <jamrial@gmail.com>
2025-07-21 16:52:46 -03:00
Niklas Haas
8b647b3f8a avfilter/vf_colordetect: add x86 SIMD implementation
alphadetect8_full_c:                                  5658.2 ( 1.00x)
alphadetect8_full_avx2:                                215.1 (26.31x)
alphadetect8_full_avx512:                              133.5 (42.40x)
alphadetect8_limited_c:                               7391.5 ( 1.00x)
alphadetect8_limited_avx2:                             649.3 (11.38x)
alphadetect8_limited_avx512:                           330.5 (22.36x)
alphadetect16_full_c:                                 3027.4 ( 1.00x)
alphadetect16_full_avx2:                               209.4 (14.46x)
alphadetect16_full_avx512:                             141.4 (21.41x)
alphadetect16_limited_c:                              3880.9 ( 1.00x)
alphadetect16_limited_avx2:                            734.9 ( 5.28x)
alphadetect16_limited_avx512:                          349.2 (11.11x)
rangedetect8_c:                                       5854.2 ( 1.00x)
rangedetect8_avx2:                                     138.9 (42.15x)
rangedetect8_avx512:                                   106.2 (55.12x)
rangedetect16_c:                                      4122.0 ( 1.00x)
rangedetect16_avx2:                                    138.6 (29.74x)
rangedetect16_avx512:                                  104.1 (39.60x)
2025-07-21 18:10:25 +02:00
James Almer
85f2911891 avfilter/x86/vf_blackdetect: add missing preprocessor check
Signed-off-by: James Almer <jamrial@gmail.com>
2025-07-18 15:17:02 -03:00
James Almer
ee4ff3f706 avfilter/x86/vf_blackdetect_init: don't enable the ASM functions on targets where it's known they will be slower
Signed-off-by: James Almer <jamrial@gmail.com>
2025-07-18 13:05:44 -03:00
James Almer
f263192f0e avfilter/x86/vf_blackdetect: don't use rax to return a 32bit integer
Fixes compilation on x86_32.

Signed-off-by: James Almer <jamrial@gmail.com>
2025-07-18 13:05:44 -03:00
Niklas Haas
75cd42c48a avfilter/vf_blackdetect: add AVX2 SIMD version
Requested by a user. Even with autovectorization enabled, the compiler
performs a quite poor job of optimizing this function, due to not being
able to take advantage of the pmaxub + pcmpeqb trick for counting the number
of pixels less than or equal-to a threshold.

blackdetect8_c:                                       4625.0 ( 1.00x)
blackdetect8_avx2:                                     155.1 (29.83x)
blackdetect16_c:                                      2529.4 ( 1.00x)
blackdetect16_avx2:                                    163.6 (15.46x)
2025-07-18 10:47:31 +02:00
Niklas Haas
e44a1aaeec avfilter/x86/scene_sad: add high bit depth AVX2/AVX512 version
Since psadbw only exists for 8-bits, we have to emulate it for 16-bit
inputs. The simplest sequence is to use a normal subtraction, which is safe
as long as the inputs do not exceed 32767 - so limit this implementation
to 15-bit inputs and below.

For 16-bit inputs, we could in theory instead use a pminw / pmaxw to ensure
the resulting difference does not overflow, but this is slower, and also
breaks the subsequent use of pmaddwd, so I opted to skip 16-bit SIMD for
now.

scene_sad10_c:                                      114175.6 ( 1.00x)
scene_sad10_avx2:                                     9617.7 (11.87x)
scene_sad10_avx512:                                   5208.8 (21.92x)
scene_sad12_c:                                      114537.8 ( 1.00x)
scene_sad12_avx2:                                     9614.0 (11.91x)
scene_sad12_avx512:                                   5186.3 (22.08x)
scene_sad14_c:                                      114113.9 ( 1.00x)
scene_sad14_avx2:                                     9612.9 (11.87x)
scene_sad14_avx512:                                   5186.0 (22.00x)
scene_sad15_c:                                      114108.9 ( 1.00x)
scene_sad15_avx2:                                     9612.3 (11.87x)
scene_sad15_avx512:                                   5186.4 (22.00x)
scene_sad16_c:                                      114136.0 ( 1.00x)
2025-07-17 12:26:06 +02:00
Niklas Haas
91f2d146d4 avfilter/x86/scene_sad: add AVX512 implementation
Trivial to add, but a lot faster (on my machine).

scene_sad8_c:                                       114476.4 ( 1.00x)
scene_sad8_sse2:                                      8644.3 (13.24x)
scene_sad8_avx2:                                      4520.1 (25.33x)
scene_sad8_avx512:                                    3153.0 (36.31x)
2025-07-17 12:26:06 +02:00
Niklas Haas
dc61b74c1d avfilter/scene_sad: pass true depth to ff_scene_sad_get_fn()
I need to be able to distinguish between 10/12/14 and 16 bit depths, for
overflow reasons.
2025-07-17 12:26:05 +02:00
James Almer
dbe94e1110 avfilter/x86/f_ebur128: replace AVX2 instruction with AVX equivalent
Using vpbroadcastq in an AVX function will result in SIGILL errors on pre
Haswell/Zen processors.

Signed-off-by: James Almer <jamrial@gmail.com>
2025-06-22 09:31:44 -03:00
Niklas Haas
daef348574 avfilter/x86/f_ebur128: implement AVX peak calculation
Stereo only, for simplicity. Slightly faster than the C code.
2025-06-21 17:28:39 +02:00
Niklas Haas
53e03ec8af avfilter/x86/f_ebur128: add x86 AVX implementation
Processes two channels in parallel, using 128-bit XMM registers.

In theory, we could go up to YMM registers to process 4 channels, but this is
not a gain except for relatively high channel counts (e.g. 7.1), and also
complicates the sample load/store operations considerably.

I decided to only add an AVX variant, since the C code is not substantially
slower enough to justify a separate function just for ancient CPUs.
2025-06-21 17:21:36 +02:00
Andreas Rheinhardt
0435cd5a62 avfilter/x86/vf_spp: Remove permutation-specific code
The MMX requantize functions have the MMX permutation
(i.e. FF_IDCT_PERM_SIMPLE) hardcoded and therefore
check for the used permutation (namely via a CRC).
Yet this is very ugly and could even lead to misdetection;
furthermore, since d7246ea9f2
the permutation used here is de-facto and since
bfb28b5ce8 definitely
impossible on x64, making this code dead on x64.
So remove it.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-05-31 01:31:09 +02:00
James Almer
362586fcad avfilter/vf_xpsnr: remove duplicated DSP infranstructure
Fully reuse the existing one from vf_psnr, instead of halfways.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-10-07 09:33:52 -03:00
Christian Helmrich
865cd3c056 avfilter: add XPSNR filter
Add XPSNR video filter
Register new filter xpsnr.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-09-08 17:51:37 +02:00
Marton Balint
a69a0b689c avfilter/blend: put slice parameters to a single struct
This should make future extensibility easier.

Signed-off-by: Marton Balint <cus@passwd.hu>
2024-05-14 21:07:37 +02:00
Andreas Rheinhardt
9ec928e627 avfilter/x86/Makefile: Fix standalone build of haldclut filter
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-07 23:53:26 +02:00
Andreas Rheinhardt
c11d7ca2f0 avfilter/x86/Makefile: Add missing dependencies for sobel filter
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-07 23:53:26 +02:00
Andreas Rheinhardt
790f793844 avutil/common: Don't auto-include mem.h
There are lots of files that don't need it: The number of object
files that actually need it went down from 2011 to 884 here.

Keep it for external users in order to not cause breakages.

Also improve the other headers a bit while just at it.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-31 00:08:43 +01:00
Henrik Gramner
782c4df28d x86: Avoid using 'd' as an argument name
x86inc.asm adds defines for <argument_name>{b,w,d,q} which clashes with
the nasm d{b,w,d,q} pseudo-instructions for writing initialized data.
2024-03-24 14:53:57 +01:00
Andreas Rheinhardt
fa06f48371 avfilter/bwdifdsp: Constify
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-09-28 00:17:47 +02:00
Andreas Rheinhardt
80afcc8539 avfilter/bwdif: Add proper BWDIFDSPContext
This already avoids unnecessary indirectly included headers
in the arch-specific vf_bwdif_init.c files; it is also in
preparation for splitting the actual functions out of vf_bwdif.c.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-09-28 00:17:47 +02:00
Paul B Mahol
c5effe7d3d avfilter/x86/af_afir: add FMA3 SIMD 2023-09-17 11:11:24 +02:00
Evgeny Pavlov
cb1479faca avfilter/vf_ssim: Fix x86 assembly code for SSIM calculation
This commit fixes bug #10495

The code had several bugs related to post-loop compensation code:
- test assembly instruction performs bitwise AND operation and
generate flags used by jz branch instruction. Wrong test condition
leads to incorrect branching
- Incorrect compensation code for some branches

Signed-off-by: Evgeny Pavlov <lucenticus@gmail.com>
2023-08-21 17:04:51 +02:00
James Almer
aca8ceb870 x86/vf_bwdif_init: limit AVX2 functions using 256bit vectors to cpus known to be fast with it
Signed-off-by: James Almer <jamrial@gmail.com>
2023-03-25 13:27:20 -03:00
James Darnley
073ec3b9da avfilter/bwdif: add avx2 filter_line function
8-bit:
2.24x faster (1925±1.3 vs. 859±2.2 decicycles) compared with ssse3
10-bit:
2.00x faster (1703±1.7 vs. 853±2.0 decicycles) compared with ssse3
2023-03-25 02:38:17 +01:00
James Darnley
b503b5a0cf avfilter/bwdif: move filter_line init to a dedicated function 2023-03-25 02:38:17 +01:00
Lynne
bbe95f7353 x86: replace explicit REP_RETs with RETs
From x86inc:
> On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
> a branch or a branch target. So switch to a 2-byte form of ret in that case.
> We can automatically detect "follows a branch", but not a branch target.
> (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)

x86inc can automatically determine whether to use REP_RET rather than
REP in most of these cases, so impact is minimal. Additionally, a few
REP_RETs were used unnecessary, despite the return being nowhere near a
branch.

The only CPUs affected were AMD K10s, made between 2007 and 2011, 16
years ago and 12 years ago, respectively.

In the future, everyone involved with x86inc should consider dropping
REP_RETs altogether.
2023-02-01 04:23:55 +01:00
Wang, Bin
459527108a libavfilter/x86/vf_convolution: fix sobel swap issue on WIN64
Reviewed by: James Almer <jamrial@gmail.com>
Signed-off-by: Wang, Bin <bin.wang@intel.com>
2022-11-21 12:28:25 +08:00
bwang30
3ab11dc5bb libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI
This commit enabled assembly code with intel AVX512 VNNI and added unit test for sobel filter

sobel_c: 4537
sobel_avx512icl 2136

Signed-off-by: bwang30 <bin.wang@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2022-11-14 10:04:16 +08:00
Paul B Mahol
00b03331a0 avfilter/vf_threshold: fix handling of zero threshold 2022-10-27 10:23:24 +02:00
Andreas Rheinhardt
ed42a51930 avfilter/x86/vf_bwdif: Remove obsolete MMXEXT functions
The only system which benefit from these are truely ancient
32bit x86s as all other systems use at least the SSE2 versions
(this includes all x64 cpus (which is why this code is restricted
to x86-32)).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-22 13:38:14 +02:00
Andreas Rheinhardt
7c3c1d938f avfilter/x86/vf_idet: Remove obsolete MMX(EXT) functions
The only system which benefit from these are truely ancient
32bit x86s as all other systems use at least the SSE2 versions
(this includes all x64 cpus (which is why this code is restricted
to x86-32)).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-22 13:38:01 +02:00
Andreas Rheinhardt
4d7128be9a avfilter/x86/vf_yadif: Remove obsolete MMXEXT functions
The only system which benefit from these are truely ancient
32bit x86s as all other systems use at least the SSE2 versions
(this includes all x64 cpus (which is why this code is restricted
to x86-32)).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-22 13:37:48 +02:00
Andreas Rheinhardt
77b2a422a0 avfilter/x86/vf_eq_init: Remove obsolete MMXEXT function
x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2) for x64. So given that the only systems that
benefit from process_mmxext are truely ancient 32bit x86s
it is removed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-22 13:36:31 +02:00
Andreas Rheinhardt
c5dd2fdc09 avfilter/x86/vf_noise: Remove obsolete MMX function
x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2) for x64. So given that the only systems that
benefit from line_noise_mmx are truely ancient 32bit x86s
it is removed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-22 13:32:08 +02:00
Andreas Rheinhardt
0df18f29ae avfilter/af_afir: Only keep DSP stuff in header
Only the AudioFIRDSPContext and the functions for its initialization
are needed outside of lavfi/af_afir.c.
Also rename the header to af_afirdsp.h to reflect the change.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-05-06 05:19:49 +02:00
Paul B Mahol
28d011516b avfilter/x86/vf_limiter: use movu, dst may not be always aligned
Happens with pad filter after limiter.
2022-03-24 09:44:09 +01:00
Marton Balint
5b3732227e avfilter/x86/vf_blend: use unaligned movs for output
Fixes crashes with:

ffmpeg -f lavfi -i allyuv=d=1 -vf tblend=difference128,pad=5000:ih:1 -f null x

Signed-off-by: Marton Balint <cus@passwd.hu>
2022-03-21 00:50:44 +01:00
Paul B Mahol
dae95b3ffd avfilter/vf_maskedmerge: fix rounding when masking 2022-03-03 09:57:53 +01:00
Paul B Mahol
047c362d3c avfilter/vf_nlmeans: add x86 SIMD 2021-11-11 21:54:46 +01:00