It can be useful to know whether the alpha plane consists entirely of
fully opaque pixels, in which case it can e.g. be safely stripped.
This only requires a very minor modification to the AVX2 routines: an
extra AND of the loaded alpha values with the reference alpha value, and
a single extra cheap test per line.
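A minimal scalar C sketch of the idea (the actual code is SIMD; the
function name, signature and reference-value handling here are
illustrative, not the filter's real API):

    #include <stdint.h>
    #include <stddef.h>

    /* AND every alpha sample into an accumulator; if the accumulator still
     * equals the reference "fully opaque" value at the end of a line, that
     * line is entirely opaque. */
    static int alpha_is_opaque_8(const uint8_t *alpha, ptrdiff_t stride,
                                 int w, int h, uint8_t ref)
    {
        for (int y = 0; y < h; y++) {
            uint8_t acc = ref;
            for (int x = 0; x < w; x++)
                acc &= alpha[x];
            /* the single extra cheap test per line mentioned above */
            if (acc != ref)
                return 0;
            alpha += stride;
        }
        return 1;
    }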
detect_alpha_8_full_c: 2849.1 ( 1.00x)
detect_alpha_8_full_avx2: 260.3 (10.95x)
detect_alpha_8_full_avx512icl: 130.2 (21.87x)
detect_alpha_8_limited_c: 8349.2 ( 1.00x)
detect_alpha_8_limited_avx2: 756.6 (11.04x)
detect_alpha_8_limited_avx512icl: 364.2 (22.93x)
detect_alpha_16_full_c: 1652.8 ( 1.00x)
detect_alpha_16_full_avx2: 236.5 ( 6.99x)
detect_alpha_16_full_avx512icl: 134.6 (12.28x)
detect_alpha_16_limited_c: 5263.1 ( 1.00x)
detect_alpha_16_limited_avx2: 797.4 ( 6.60x)
detect_alpha_16_limited_avx512icl: 400.3 (13.15x)
I also tried replacing some of the instructions with more elaborate
mask-based ones, but found no performance gain significant enough to be
worth maintaining two code paths, so this implementation merely replaces
the AVX2 implementation with drop-in AVX512 equivalents.
bwdif8_c: 6362.2 ( 1.00x)
bwdif8_sse2: 1004.9 ( 6.33x)
bwdif8_ssse3: 946.0 ( 6.73x)
bwdif8_avx2: 477.9 (13.31x)
bwdif8_avx512: 273.3 (23.28x)
bwdif10_c: 6341.5 ( 1.00x)
bwdif10_sse2: 872.4 ( 7.27x)
bwdif10_ssse3: 803.4 ( 7.89x)
bwdif10_avx2: 416.7 (15.22x)
bwdif10_avx512: 224.3 (28.27x)
Realtime test at 3840x2160 yuv420p:
avx2: frame=20000 fps=3370 q=-0.0 Lsize=N/A time=00:06:40.00 bitrate=N/A speed=67.4x elapsed=0:00:05.93
avx512: frame=20000 fps=5077 q=-0.0 Lsize=N/A time=00:06:40.00 bitrate=N/A speed= 102x elapsed=0:00:03.93
The use of this function is gated behind avx512icl so that it doesn't
downclock on Skylake.
For detect_range, the usage of vpbroadcast{b,w} requires the AVX512BW extension, and for
detect_alpha we don't want ZMM instructions downclocking old CPUs.
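As an illustration of that gating, a sketch of what such x86 init code
might look like (the dsp context and function names are hypothetical;
av_get_cpu_flags() and the EXTERNAL_AVX2/EXTERNAL_AVX512ICL macros are
the real libavutil ones):

    #include <stdint.h>
    #include <stddef.h>
    #include "libavutil/attributes.h"
    #include "libavutil/x86/cpu.h"

    /* Hypothetical dsp context and asm entry points, for illustration only. */
    typedef struct ExampleAlphaDSP {
        int (*detect_alpha)(const uint8_t *src, ptrdiff_t stride, int w, int h);
    } ExampleAlphaDSP;

    int ff_example_detect_alpha_avx2(const uint8_t *src, ptrdiff_t stride,
                                     int w, int h);
    int ff_example_detect_alpha_avx512icl(const uint8_t *src, ptrdiff_t stride,
                                          int w, int h);

    av_cold void ff_example_detect_alpha_init_x86(ExampleAlphaDSP *dsp)
    {
        int cpu_flags = av_get_cpu_flags();

        if (EXTERNAL_AVX2(cpu_flags))
            dsp->detect_alpha = ff_example_detect_alpha_avx2;
        /* gate the ZMM code on AVX512ICL (Ice Lake and newer) rather than
         * plain AVX512, so Skylake-era CPUs never hit heavy-instruction
         * downclocking */
        if (EXTERNAL_AVX512ICL(cpu_flags))
            dsp->detect_alpha = ff_example_detect_alpha_avx512icl;
    }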
Signed-off-by: James Almer <jamrial@gmail.com>
Requested by a user. Even with autovectorization enabled, the compiler
does a rather poor job of optimizing this function, since it cannot take
advantage of the pmaxub + pcmpeqb trick for counting the number of
pixels less than or equal to a threshold.
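The trick relies on max(p, t) == t being equivalent to p <= t for
unsigned bytes. An SSE2 intrinsics sketch (illustrative only; the actual
filter code is written in assembly):

    #include <emmintrin.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Count the pixels <= thresh in a row of 8-bit samples. */
    static size_t count_le_thresh_sse2(const uint8_t *src, size_t n,
                                       uint8_t thresh)
    {
        const __m128i t    = _mm_set1_epi8((char)thresh);
        const __m128i zero = _mm_setzero_si128();
        __m128i acc = _mm_setzero_si128();
        uint64_t tmp[2];
        size_t i, count;

        for (i = 0; i + 16 <= n; i += 16) {
            __m128i p  = _mm_loadu_si128((const __m128i *)(src + i));
            /* pmaxub + pcmpeqb: 0xFF where p <= thresh, 0x00 elsewhere */
            __m128i le = _mm_cmpeq_epi8(_mm_max_epu8(p, t), t);
            /* psadbw against zero sums the mask bytes; each hit adds 255 */
            acc = _mm_add_epi64(acc, _mm_sad_epu8(le, zero));
        }
        _mm_storeu_si128((__m128i *)tmp, acc);
        count = (size_t)((tmp[0] + tmp[1]) / 255);
        for (; i < n; i++)
            count += src[i] <= thresh;
        return count;
    }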
blackdetect8_c: 4625.0 ( 1.00x)
blackdetect8_avx2: 155.1 (29.83x)
blackdetect16_c: 2529.4 ( 1.00x)
blackdetect16_avx2: 163.6 (15.46x)
Since psadbw only exists for 8-bits, we have to emulate it for 16-bit
inputs. The simplest sequence is to use a normal subtraction, which is safe
as long as the inputs do not exceed 32767 - so limit this implementation
to 15-bit inputs and below.
For 16-bit inputs, we could in theory instead use a pminw / pmaxw to ensure
the resulting difference does not overflow, but this is slower, and also
breaks the subsequent use of pmaddwd, so I opted to skip 16-bit SIMD for
now.
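An AVX2 intrinsics sketch of the 15-bit emulation described above (the
actual filter code is in assembly; this assumes samples <= 32767 and a
per-row call, so the 32-bit partial sums cannot overflow):

    #include <immintrin.h>
    #include <stddef.h>
    #include <stdint.h>

    static uint64_t sad_15bit_avx2(const uint16_t *a, const uint16_t *b,
                                   size_t n)
    {
        const __m256i ones = _mm256_set1_epi16(1);
        __m256i acc = _mm256_setzero_si256();
        uint64_t sad = 0;
        size_t i;

        for (i = 0; i + 16 <= n; i += 16) {
            __m256i va   = _mm256_loadu_si256((const __m256i *)(a + i));
            __m256i vb   = _mm256_loadu_si256((const __m256i *)(b + i));
            /* psubw cannot overflow because both inputs are <= 32767 */
            __m256i diff = _mm256_abs_epi16(_mm256_sub_epi16(va, vb));
            /* pmaddwd against ones widens |a-b| to 32 bits and sums pairs */
            acc = _mm256_add_epi32(acc, _mm256_madd_epi16(diff, ones));
        }
        {   /* horizontal reduction of the eight 32-bit partial sums */
            __m128i s = _mm_add_epi32(_mm256_castsi256_si128(acc),
                                      _mm256_extracti128_si256(acc, 1));
            s = _mm_add_epi32(s, _mm_srli_si128(s, 8));
            s = _mm_add_epi32(s, _mm_srli_si128(s, 4));
            sad = (uint32_t)_mm_cvtsi128_si32(s);
        }
        for (; i < n; i++)
            sad += a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
        return sad;
    }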
scene_sad10_c: 114175.6 ( 1.00x)
scene_sad10_avx2: 9617.7 (11.87x)
scene_sad10_avx512: 5208.8 (21.92x)
scene_sad12_c: 114537.8 ( 1.00x)
scene_sad12_avx2: 9614.0 (11.91x)
scene_sad12_avx512: 5186.3 (22.08x)
scene_sad14_c: 114113.9 ( 1.00x)
scene_sad14_avx2: 9612.9 (11.87x)
scene_sad14_avx512: 5186.0 (22.00x)
scene_sad15_c: 114108.9 ( 1.00x)
scene_sad15_avx2: 9612.3 (11.87x)
scene_sad15_avx512: 5186.4 (22.00x)
scene_sad16_c: 114136.0 ( 1.00x)
Trivial to add, but a lot faster (on my machine).
scene_sad8_c: 114476.4 ( 1.00x)
scene_sad8_sse2: 8644.3 (13.24x)
scene_sad8_avx2: 4520.1 (25.33x)
scene_sad8_avx512: 3153.0 (36.31x)
Processes two channels in parallel, using 128-bit XMM registers.
In theory, we could go up to YMM registers to process 4 channels, but this is
not a gain except for relatively high channel counts (e.g. 7.1), and also
complicates the sample load/store operations considerably.
I decided to only add an AVX variant, since the C code is not enough
slower to justify a separate function just for ancient CPUs.
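A minimal sketch of the register layout described above, assuming
interleaved stereo double samples and an arbitrary per-sample operation
(a plain gain here, purely for illustration; this is not the filter's
actual code):

    #include <emmintrin.h>
    #include <stddef.h>

    /* One 128-bit load/store covers one frame of two double-precision
     * channels, so both channels are processed in lockstep. A YMM version
     * would cover 4 channels per register, but the load/store handling for
     * other channel counts gets considerably messier. */
    static void gain_stereo_sse2(double *samples /* L R L R ... */,
                                 size_t nb_frames, double gain)
    {
        const __m128d g = _mm_set1_pd(gain);
        for (size_t i = 0; i < nb_frames; i++) {
            __m128d s = _mm_loadu_pd(samples + 2 * i);
            _mm_storeu_pd(samples + 2 * i, _mm_mul_pd(s, g));
        }
    }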
The MMX requantize functions have the MMX permutation
(i.e. FF_IDCT_PERM_SIMPLE) hardcoded and therefore
check for the used permutation (namely via a CRC).
Yet this is very ugly and could even lead to misdetection; furthermore,
since d7246ea9f2 the permutation used here has been de facto impossible
on x64, and since bfb28b5ce8 it is definitely impossible, making this
code dead on x64. So remove it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
There are lots of files that don't need it: The number of object
files that actually need it went down from 2011 to 884 here.
Keep it for external users in order to not cause breakages.
Also improve the other headers a bit while at it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This already avoids unnecessary indirectly included headers
in the arch-specific vf_bwdif_init.c files; it is also in
preparation for splitting the actual functions out of vf_bwdif.c.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This commit fixes bug #10495.
The code had several bugs in the post-loop compensation code:
- The test assembly instruction performs a bitwise AND and generates the
flags used by the jz branch instruction; a wrong test condition led to
incorrect branching.
- Incorrect compensation code for some branches.
Signed-off-by: Evgeny Pavlov <lucenticus@gmail.com>
From x86inc:
> On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
> a branch or a branch target. So switch to a 2-byte form of ret in that case.
> We can automatically detect "follows a branch", but not a branch target.
> (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)
x86inc can automatically determine whether to use REP_RET rather than
RET in most of these cases, so the impact is minimal. Additionally, a few
REP_RETs were used unnecessarily, despite the return being nowhere near a
branch.
The only CPUs affected were AMD K10s, made between 2007 and 2011, 16
years ago and 12 years ago, respectively.
In the future, everyone involved with x86inc should consider dropping
REP_RETs altogether.
This commit enables assembly code with Intel AVX512 VNNI and adds a unit
test for the sobel filter.
sobel_c: 4537
sobel_avx512icl: 2136
Signed-off-by: bwang30 <bin.wang@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
The only systems which benefit from these are truly ancient 32-bit x86s,
as all other systems use at least the SSE2 versions (this includes all
x64 CPUs, which is why this code is restricted to x86-32).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The only systems which benefit from these are truly ancient 32-bit x86s,
as all other systems use at least the SSE2 versions (this includes all
x64 CPUs, which is why this code is restricted to x86-32).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The only systems which benefit from these are truly ancient 32-bit x86s,
as all other systems use at least the SSE2 versions (this includes all
x64 CPUs, which is why this code is restricted to x86-32).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
x64 always has MMX, MMXEXT, SSE and SSE2, which means that some
functions for MMX, MMXEXT and 3dnow are always overridden by other
functions on x64 (unless one e.g. explicitly disables SSE2). So given
that the only systems that benefit from process_mmxext are truly ancient
32-bit x86s, it is removed.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
x64 always has MMX, MMXEXT, SSE and SSE2, which means that some
functions for MMX, MMXEXT and 3dnow are always overridden by other
functions on x64 (unless one e.g. explicitly disables SSE2). So given
that the only systems that benefit from line_noise_mmx are truly ancient
32-bit x86s, it is removed.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Only the AudioFIRDSPContext and the functions for its initialization
are needed outside of lavfi/af_afir.c.
Also rename the header to af_afirdsp.h to reflect the change.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>