ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2026-01-06 14:15:29 +01:00

Author	SHA1	Message	Date
Andreas Rheinhardt	4fc05c28f4	avfilter/x86/vf_gradfun: Remove MMXEXT func overridden by SSSE3 SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD), so that the overwhelming majority of our users (particularly those that actually update their FFmpeg) will be using the SSSE3 version of filter_line. This commit therefore removes the overridden MMXEXT version (which didn't abide by the ABI) which allows us to remove an emms_c() from vf_gradfun.c, so that users with SSSE3 no longer pay a price for the mere existence of an MMXEXT version. Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-09-26 06:21:35 +02:00
Niklas Haas	843920d5d6	avfilter/x86/vf_idetdsp: add AVX2 and AVX512 implementations The only thing that changes slightly is the horizontal sum at the end.	2025-09-21 11:02:41 +00:00
Niklas Haas	4c067d0778	avfilter/x86/vf_idetdsp: generalize 8-bit macro This is mostly compatible with AVX as well, so turn it into a macro.	2025-09-21 11:02:41 +00:00
Niklas Haas	326abf359f	avfilter/vf_idetdsp: use consistent uint8_t pointer type Even for 16-bit DSP functions. Instead, cast the pointer inside the function.	2025-09-21 11:02:41 +00:00
Niklas Haas	60dbcc5321	avfilter/vf_idetdsp: pass actual bit depth More informative and IMO cleaner; some implementations may want to differentiate by exact bit depth or support 32 bit down the line.	2025-09-21 11:02:41 +00:00
Niklas Haas	5830743363	avfilter/vf_idet: separate DSP parts To avoid pulling in the entire libavfilter when using the DSP functions from checkasm. The rest of the struct is not needed outside vf_idet.c and was moved there.	2025-09-21 11:02:41 +00:00
Andreas Rheinhardt	a35c91dc14	avfilter/vf_colordetect: Rename header to vf_colordetectdsp.h It is more in line with our naming conventions. Reviewed-by: Martin Storsjö <martin@martin.st> Reviewed-by: Niklas Haas <ffmpeg@haasn.dev> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-09-16 18:22:24 +02:00
Niklas Haas	ba8aa0e7b3	avfilter/x86/vf_overlay: simplify function signature No reason to pass all the variables again, if we're already passing the context.	2025-09-02 17:06:25 +02:00
Niklas Haas	6d6bbdaab0	avfilter/vf_overlay: rename variables for clarity `is_straight`, `alpha_mode` etc. are more consistently named to refer to either the main image, or the overlay.	2025-09-02 17:06:25 +02:00
Niklas Haas	6f3eddbedd	avfilter/vf_overlay: configure alpha mode on the link And use the link-tagged value instead of the hard-coded parameter.	2025-09-02 17:06:25 +02:00
Niklas Haas	f07c12d806	avfilter/x86/vf_colordetect: fix alpha detect tail handling This wrapping logic still considered any nonzero return from the ASM function to be the overall result, but this is not true since the addition of FF_ALPHA_TRANSPARENT. Fix it by only early returning if FF_ALPHA_STRAIGHT is detected. Fixes: `9b8b78a815` See-Also: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20301#issuecomment-4802	2025-09-01 15:33:43 +00:00
Niklas Haas	9b8b78a815	avfilter/vf_colordetect: detect fully opaque alpha planes It can be useful to know if the alpha plane consists of fully opaque pixels or not, in which case it can e.g. safely be stripped. This only requires a very minor modification to the AVX2 routines, adding an extra AND on the read alpha value with the reference alpha value, and a single extra cheap test per line. detect_alpha_8_full_c: 2849.1 ( 1.00x) detect_alpha_8_full_avx2: 260.3 (10.95x) detect_alpha_8_full_avx512icl: 130.2 (21.87x) detect_alpha_8_limited_c: 8349.2 ( 1.00x) detect_alpha_8_limited_avx2: 756.6 (11.04x) detect_alpha_8_limited_avx512icl: 364.2 (22.93x) detect_alpha_16_full_c: 1652.8 ( 1.00x) detect_alpha_16_full_avx2: 236.5 ( 6.99x) detect_alpha_16_full_avx512icl: 134.6 (12.28x) detect_alpha_16_limited_c: 5263.1 ( 1.00x) detect_alpha_16_limited_avx2: 797.4 ( 6.60x) detect_alpha_16_limited_avx512icl: 400.3 (13.15x)	2025-08-18 18:50:00 +00:00
Niklas Haas	c96ccd78fc	avfilter/vf_colordetect: rename p, q, k variables for clarity Purely cosmetic. Motivated in part because I want to depend on the assumption that P represents the maximum alpha channel value.	2025-08-18 18:50:00 +00:00
James Almer	3f58c9df14	avfilter/x86/vf_bwdif: use the correct preprocessor check Signed-off-by: James Almer <jamrial@gmail.com>	2025-08-03 19:26:18 -03:00
Niklas Haas	7f00e24d70	vf_bwdif: add AVX512 implementation I also tried replacing some of the instructions by more elaborate ones using masks, but I found no performance gain significant enough to be worth maintaining two code paths, so this implementation merely replaces the AVX2 implementation by drop-in AVX512 equivalents. bwdif8_c: 6362.2 ( 1.00x) bwdif8_sse2: 1004.9 ( 6.33x) bwdif8_ssse3: 946.0 ( 6.73x) bwdif8_avx2: 477.9 (13.31x) bwdif8_avx512: 273.3 (23.28x) bwdif10_c: 6341.5 ( 1.00x) bwdif10_sse2: 872.4 ( 7.27x) bwdif10_ssse3: 803.4 ( 7.89x) bwdif10_avx2: 416.7 (15.22x) bwdif10_avx512: 224.3 (28.27x) Realtime test at 3840x2160 yuv420p: avx2: frame=20000 fps=3370 q=-0.0 Lsize=N/A time=00:06:40.00 bitrate=N/A speed=67.4x elapsed=0:00:05.93 avx512: frame=20000 fps=5077 q=-0.0 Lsize=N/A time=00:06:40.00 bitrate=N/A speed= 102x elapsed=0:00:03.93 The use of this function is gated behind avx512icl so that it doesn't downclock on Skylake.	2025-08-03 22:13:51 +00:00
Timo Rothenpieler	262d41c804	all: fix typos found by codespell	2025-08-03 13:48:47 +02:00
James Almer	a01dc3aa27	avfilter/x86/vf_colordetect: add missing preprocessor checks Signed-off-by: James Almer <jamrial@gmail.com>	2025-07-21 18:03:22 -03:00
James Almer	c62813a057	avfilter/x86/vf_colordetect: make the AVX512 functions run only on ICL targets or newer For detect_range, the usage of vpbroadcast{b,w} requires the AVX512BW extension, and for detect_alpha we don't want ZMM instructions downclocking old CPUs. Signed-off-by: James Almer <jamrial@gmail.com>	2025-07-21 17:25:28 -03:00
James Almer	70fc4e5909	avfilter/x86/vf_colordetect_init: don't enable ASM functions on targets where it's known they will be slower Signed-off-by: James Almer <jamrial@gmail.com>	2025-07-21 16:58:51 -03:00
James Almer	fdca209f1f	avfilter/x86/vf_colordetect: don't use rax to return a 32bit integer Fixes compilation on x86_32 targets Signed-off-by: James Almer <jamrial@gmail.com>	2025-07-21 16:58:36 -03:00
James Almer	14f4478354	avfilter/x86/vf_colordetect: fix use of AVX512 instruction in AVX2 function on non Unix64 targets Signed-off-by: James Almer <jamrial@gmail.com>	2025-07-21 16:52:46 -03:00
Niklas Haas	8b647b3f8a	avfilter/vf_colordetect: add x86 SIMD implementation alphadetect8_full_c: 5658.2 ( 1.00x) alphadetect8_full_avx2: 215.1 (26.31x) alphadetect8_full_avx512: 133.5 (42.40x) alphadetect8_limited_c: 7391.5 ( 1.00x) alphadetect8_limited_avx2: 649.3 (11.38x) alphadetect8_limited_avx512: 330.5 (22.36x) alphadetect16_full_c: 3027.4 ( 1.00x) alphadetect16_full_avx2: 209.4 (14.46x) alphadetect16_full_avx512: 141.4 (21.41x) alphadetect16_limited_c: 3880.9 ( 1.00x) alphadetect16_limited_avx2: 734.9 ( 5.28x) alphadetect16_limited_avx512: 349.2 (11.11x) rangedetect8_c: 5854.2 ( 1.00x) rangedetect8_avx2: 138.9 (42.15x) rangedetect8_avx512: 106.2 (55.12x) rangedetect16_c: 4122.0 ( 1.00x) rangedetect16_avx2: 138.6 (29.74x) rangedetect16_avx512: 104.1 (39.60x)	2025-07-21 18:10:25 +02:00
James Almer	85f2911891	avfilter/x86/vf_blackdetect: add missing preprocessor check Signed-off-by: James Almer <jamrial@gmail.com>	2025-07-18 15:17:02 -03:00
James Almer	ee4ff3f706	avfilter/x86/vf_blackdetect_init: don't enable the ASM functions on targets where it's known they will be slower Signed-off-by: James Almer <jamrial@gmail.com>	2025-07-18 13:05:44 -03:00
James Almer	f263192f0e	avfilter/x86/vf_blackdetect: don't use rax to return a 32bit integer Fixes compilation on x86_32. Signed-off-by: James Almer <jamrial@gmail.com>	2025-07-18 13:05:44 -03:00
Niklas Haas	75cd42c48a	avfilter/vf_blackdetect: add AVX2 SIMD version Requested by a user. Even with autovectorization enabled, the compiler performs a quite poor job of optimizing this function, due to not being able to take advantage of the pmaxub + pcmpeqb trick for counting the number of pixels less than or equal-to a threshold. blackdetect8_c: 4625.0 ( 1.00x) blackdetect8_avx2: 155.1 (29.83x) blackdetect16_c: 2529.4 ( 1.00x) blackdetect16_avx2: 163.6 (15.46x)	2025-07-18 10:47:31 +02:00
Niklas Haas	e44a1aaeec	avfilter/x86/scene_sad: add high bit depth AVX2/AVX512 version Since psadbw only exists for 8-bits, we have to emulate it for 16-bit inputs. The simplest sequence is to use a normal subtraction, which is safe as long as the inputs do not exceed 32767 - so limit this implementation to 15-bit inputs and below. For 16-bit inputs, we could in theory instead use a pminw / pmaxw to ensure the resulting difference does not overflow, but this is slower, and also breaks the subsequent use of pmaddwd, so I opted to skip 16-bit SIMD for now. scene_sad10_c: 114175.6 ( 1.00x) scene_sad10_avx2: 9617.7 (11.87x) scene_sad10_avx512: 5208.8 (21.92x) scene_sad12_c: 114537.8 ( 1.00x) scene_sad12_avx2: 9614.0 (11.91x) scene_sad12_avx512: 5186.3 (22.08x) scene_sad14_c: 114113.9 ( 1.00x) scene_sad14_avx2: 9612.9 (11.87x) scene_sad14_avx512: 5186.0 (22.00x) scene_sad15_c: 114108.9 ( 1.00x) scene_sad15_avx2: 9612.3 (11.87x) scene_sad15_avx512: 5186.4 (22.00x) scene_sad16_c: 114136.0 ( 1.00x)	2025-07-17 12:26:06 +02:00
Niklas Haas	91f2d146d4	avfilter/x86/scene_sad: add AVX512 implementation Trivial to add, but a lot faster (on my machine). scene_sad8_c: 114476.4 ( 1.00x) scene_sad8_sse2: 8644.3 (13.24x) scene_sad8_avx2: 4520.1 (25.33x) scene_sad8_avx512: 3153.0 (36.31x)	2025-07-17 12:26:06 +02:00
Niklas Haas	dc61b74c1d	avfilter/scene_sad: pass true depth to ff_scene_sad_get_fn() I need to be able to distinguish between 10/12/14 and 16 bit depths, for overflow reasons.	2025-07-17 12:26:05 +02:00
James Almer	dbe94e1110	avfilter/x86/f_ebur128: replace AVX2 instruction with AVX equivalent Using vpbroadcastq in an AVX function will result in SIGILL errors on pre Haswell/Zen processors. Signed-off-by: James Almer <jamrial@gmail.com>	2025-06-22 09:31:44 -03:00
Niklas Haas	daef348574	avfilter/x86/f_ebur128: implement AVX peak calculation Stereo only, for simplicity. Slightly faster than the C code.	2025-06-21 17:28:39 +02:00
Niklas Haas	53e03ec8af	avfilter/x86/f_ebur128: add x86 AVX implementation Processes two channels in parallel, using 128-bit XMM registers. In theory, we could go up to YMM registers to process 4 channels, but this is not a gain except for relatively high channel counts (e.g. 7.1), and also complicates the sample load/store operations considerably. I decided to only add an AVX variant, since the C code is not substantially slower enough to justify a separate function just for ancient CPUs.	2025-06-21 17:21:36 +02:00
Andreas Rheinhardt	0435cd5a62	avfilter/x86/vf_spp: Remove permutation-specific code The MMX requantize functions have the MMX permutation (i.e. FF_IDCT_PERM_SIMPLE) hardcoded and therefore check for the used permutation (namely via a CRC). Yet this is very ugly and could even lead to misdetection; furthermore, since `d7246ea9f2` the permutation used here is de-facto and since `bfb28b5ce8` definitely impossible on x64, making this code dead on x64. So remove it. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-05-31 01:31:09 +02:00
James Almer	362586fcad	avfilter/vf_xpsnr: remove duplicated DSP infranstructure Fully reuse the existing one from vf_psnr, instead of halfways. Signed-off-by: James Almer <jamrial@gmail.com>	2024-10-07 09:33:52 -03:00
Christian Helmrich	865cd3c056	avfilter: add XPSNR filter Add XPSNR video filter Register new filter xpsnr. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-09-08 17:51:37 +02:00
Marton Balint	a69a0b689c	avfilter/blend: put slice parameters to a single struct This should make future extensibility easier. Signed-off-by: Marton Balint <cus@passwd.hu>	2024-05-14 21:07:37 +02:00
Andreas Rheinhardt	9ec928e627	avfilter/x86/Makefile: Fix standalone build of haldclut filter Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-05-07 23:53:26 +02:00
Andreas Rheinhardt	c11d7ca2f0	avfilter/x86/Makefile: Add missing dependencies for sobel filter Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-05-07 23:53:26 +02:00
Andreas Rheinhardt	790f793844	avutil/common: Don't auto-include mem.h There are lots of files that don't need it: The number of object files that actually need it went down from 2011 to 884 here. Keep it for external users in order to not cause breakages. Also improve the other headers a bit while just at it. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-03-31 00:08:43 +01:00
Henrik Gramner	782c4df28d	x86: Avoid using 'd' as an argument name x86inc.asm adds defines for <argument_name>{b,w,d,q} which clashes with the nasm d{b,w,d,q} pseudo-instructions for writing initialized data.	2024-03-24 14:53:57 +01:00
Andreas Rheinhardt	fa06f48371	avfilter/bwdifdsp: Constify Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-09-28 00:17:47 +02:00
Andreas Rheinhardt	80afcc8539	avfilter/bwdif: Add proper BWDIFDSPContext This already avoids unnecessary indirectly included headers in the arch-specific vf_bwdif_init.c files; it is also in preparation for splitting the actual functions out of vf_bwdif.c. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-09-28 00:17:47 +02:00
Paul B Mahol	c5effe7d3d	avfilter/x86/af_afir: add FMA3 SIMD	2023-09-17 11:11:24 +02:00
Evgeny Pavlov	cb1479faca	avfilter/vf_ssim: Fix x86 assembly code for SSIM calculation This commit fixes bug #10495 The code had several bugs related to post-loop compensation code: - test assembly instruction performs bitwise AND operation and generate flags used by jz branch instruction. Wrong test condition leads to incorrect branching - Incorrect compensation code for some branches Signed-off-by: Evgeny Pavlov <lucenticus@gmail.com>	2023-08-21 17:04:51 +02:00
James Almer	aca8ceb870	x86/vf_bwdif_init: limit AVX2 functions using 256bit vectors to cpus known to be fast with it Signed-off-by: James Almer <jamrial@gmail.com>	2023-03-25 13:27:20 -03:00
James Darnley	073ec3b9da	avfilter/bwdif: add avx2 filter_line function 8-bit: 2.24x faster (1925±1.3 vs. 859±2.2 decicycles) compared with ssse3 10-bit: 2.00x faster (1703±1.7 vs. 853±2.0 decicycles) compared with ssse3	2023-03-25 02:38:17 +01:00
James Darnley	b503b5a0cf	avfilter/bwdif: move filter_line init to a dedicated function	2023-03-25 02:38:17 +01:00
Lynne	bbe95f7353	x86: replace explicit REP_RETs with RETs From x86inc: > On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either > a branch or a branch target. So switch to a 2-byte form of ret in that case. > We can automatically detect "follows a branch", but not a branch target. > (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.) x86inc can automatically determine whether to use REP_RET rather than REP in most of these cases, so impact is minimal. Additionally, a few REP_RETs were used unnecessary, despite the return being nowhere near a branch. The only CPUs affected were AMD K10s, made between 2007 and 2011, 16 years ago and 12 years ago, respectively. In the future, everyone involved with x86inc should consider dropping REP_RETs altogether.	2023-02-01 04:23:55 +01:00
Wang, Bin	459527108a	libavfilter/x86/vf_convolution: fix sobel swap issue on WIN64 Reviewed by: James Almer <jamrial@gmail.com> Signed-off-by: Wang, Bin <bin.wang@intel.com>	2022-11-21 12:28:25 +08:00
bwang30	3ab11dc5bb	libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI This commit enabled assembly code with intel AVX512 VNNI and added unit test for sobel filter sobel_c: 4537 sobel_avx512icl 2136 Signed-off-by: bwang30 <bin.wang@intel.com> Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>	2022-11-14 10:04:16 +08:00

1 2 3 4 5 ...

370 Commits