ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2026-02-14 10:25:41 +01:00

Author	SHA1	Message	Date
Georgii Zagoruiko	cdb14bc74d	configure: add detection of assembler support for SME All changes are made during development/testing of SVE/SME for ffmpeg (vvc). Tested on Apple M4	2025-12-09 21:38:38 +00:00
Andreas Rheinhardt	4baa5e638b	tests/checkasm/checkasm: Don't test 3dnow The last 3dnow functions have been removed in commit `5ef613bcb0`, so don't test it in checkasm. (This will affect only one test, namely scalarproduct_and_madd_int16 from lossless_audiodsp: It does not use an SSSE3 function when the 3dnow flag is set. So for old AMDs (which advertise support for 3dnow), said SSSE3 function is never tested. Now it will.) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-09 03:03:07 +00:00
Andreas Rheinhardt	7bc35b8426	tests/checkasm/vp9dsp: Allow to run only a subset of tests Make it possible to run only a subset of the VP9 tests in addition to all of them (via the vp9dsp test). This reduces noise and speeds up testing. FATE continues to use vp9dsp. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-08 19:27:56 +01:00
Kacper Michajłow	6a14a93af5	checkasm/sw_xyz2rgb: fix function type Signed-off-by: Kacper Michajłow <kasper93@gmail.com>	2025-12-05 21:55:03 +00:00
Arpad Panyik	a13871ae19	checkasm: Add xyz12Torgb48le test Add checkasm coverage for the XYZ12LE to RGB48LE path via the ctx->xyz12Torgb48 hook. Integrate the test into the build and runner, exercise a variety of widths/heights, compare against the C reference, and benchmark when width is multiple of 4. This improves test coverage for the new function pointer in preparation for architecture-specific implementations in subsequent commits. Signed-off-by: Arpad Panyik <Arpad.Panyik@arm.com>	2025-12-05 10:28:18 +00:00
Andreas Rheinhardt	4b6e40a298	avcodec/vp8dsp: Don't compile unused functions The width 16 epel functions never use four taps in any direction, so don't build said functions. Saves 4352B of .text and 89B of .text.unlikely here. : mx and my in vp8_mc_luma() are always even. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-04 15:17:37 +01:00
Andreas Rheinhardt	e3e3265034	tests/checkasm/mpegvideo_unquantize: Add missing const Fixes this test under UBSan: runtime error: call to function dct_unquantize_mpeg1_intra_c through pointer to incorrect function type 'void (*)(struct MpegEncContext *, short *, int, int)' I don't know how I could forget this. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-03 14:17:58 +01:00
Andreas Rheinhardt	c22c2c5e03	avcodec/mpegvideo: Port dct_unquantize_mpeg2_intra_mmx to SSE2 Benefits from wider registers. Benchmarks: dct_unquantize_mpeg2_intra_c: 228.2 ( 1.00x) dct_unquantize_mpeg2_intra_mmx: 28.2 ( 8.10x) dct_unquantize_mpeg2_intra_sse2: 18.4 (12.37x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-03 10:23:43 +01:00
Andreas Rheinhardt	581050a175	tests/checkasm: Add mpegvideo unquantize test This adds a test for the mpegvideo unquantize functions. It has been written in order to be able to easily bench these functions. It should be noted that the random input fed to the tested functions is not necessarily representative of the stuff actually occurring in the wild. So benchmarks should be taken with a grain of salt; but comparisons between two functions that do not depend on branch predictions are valid (the use case for this is to port the x86 mmx functions to use xmm registers). During testing I have found a bug in the arm/aarch64 neon optimizations when using the LIBMPEG2 permutation (used by FF_IDCT_INT): The code seems to be based on the presumption that the remainder of the number of coefficients to process is always <= 4 mod 16. The test therefore sometimes fails for these arches. Hint: I am not certain that 16 bits are enough for the intermediate values of all the computations involved; e.g. both FLV and MPEG-4 escape values can go beyond that after the corresponding multiplications. The input in this test is nevertheless designed to fit into 16 bits. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-12-03 10:23:39 +01:00
Kacper Michajłow	17456c553e	tests/checkasm: fix check for 32-bit Windows build With --disable-asm, ARCH_X86_32 is set to 0, but we still build the checkasm binary. Update the check so it is config.h agnostic. Signed-off-by: Kacper Michajłow <kasper93@gmail.com>	2025-11-30 22:07:39 +00:00
Andreas Rheinhardt	ada0a81577	avcodec/x86/h264_idct: Don't use MMX registers in ff_h264_luma_dc_dequant_idct_sse2 It is ABI compliant and gives a tiny speedup here (and is 16B smaller). Old benchmarks: h264_luma_dc_dequant_idct_8_c: 33.2 ( 1.00x) h264_luma_dc_dequant_idct_8_sse2: 16.0 ( 2.07x) New benchmarks: h264_luma_dc_dequant_idct_8_c: 33.0 ( 1.00x) h264_luma_dc_dequant_idct_8_sse2: 15.0 ( 2.20x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-30 00:15:43 +01:00
Andreas Rheinhardt	aabaab10d2	tests/checkasm: Test VP6DSP Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-27 12:10:34 +01:00
James Almer	191f7e4869	tests/checkasm/sw_ops: fix signed integer related UB when shifting values Fixes: src/tests/checkasm/sw_ops.c:441:34: runtime error: shift exponent 32 is too large for 32-bit type 'int' src/tests/checkasm/sw_ops.c:591:37: runtime error: shift exponent 32 is too large for 32-bit type 'int' Signed-off-by: James Almer <jamrial@gmail.com>	2025-11-21 18:40:58 +00:00
Andreas Rheinhardt	0d3a88e55f	tests/checkasm/mpegvideoencdsp: Test denoise_dct Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-18 20:41:12 +01:00
Andreas Rheinhardt	06b0dae51b	avfilter/vf_fsppdsp: Constify Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 12:18:12 +01:00
Andreas Rheinhardt	ddd74276f8	avfilter/x86/vf_fspp: Port ff_column_fidct_mmx() to SSE2 It gains a lot because it has to operate on eight words; it also saves 608B of .text here. Old benchmarks: column_fidct_c: 3365.7 ( 1.00x) column_fidct_mmx: 1784.6 ( 1.89x) New benchmarks: column_fidct_c: 3361.5 ( 1.00x) column_fidct_sse2: 801.1 ( 4.20x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 12:18:11 +01:00
Andreas Rheinhardt	68b11cde82	tests/checkasm/vf_fspp: Add test for column_fidct Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 12:18:11 +01:00
Andreas Rheinhardt	570f8fc6c9	tests/checkasm/vf_fspp: Test store_slice Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 11:28:04 +01:00
Andreas Rheinhardt	52ba2ac7bd	avfilter/x86/vf_fspp: Port mul_thrmat to SSE2 This fixes an ABI violation, as mul_thrmat did not issue emms. It seems that this ABI violation could reach the user, namely if ff_get_video_buffer() fails. Notice that ff_get_video_buffer() itself could fail because of this, namely if the allocator uses floating point registers. On x64 (where GCC already used SSE2 in the C version) mul_thrmat_c: 4.4 ( 1.00x) mul_thrmat_mmx: 8.6 ( 0.52x) mul_thrmat_sse2: 4.4 ( 1.00x) On 32bit (where SSE2 is not known to be available): mul_thrmat_c: 56.0 ( 1.00x) mul_thrmat_sse2: 6.0 ( 9.40x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 11:28:04 +01:00
Andreas Rheinhardt	70eb8a76a9	tests/checkasm: Add vf_fspp mul_thrmat test Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 11:28:04 +01:00
Andreas Rheinhardt	c6efe1abda	avcodec/h264chroma: Move mc1 function to mpegvideo_dec.c It is only used by mpegvideo decoders (for lowres). It is also only used for bitdepth == 8, so don't build the bitdepth == 16 function at all any more. Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-01 13:31:57 +01:00
Andreas Rheinhardt	0f105b96a3	avcodec/x86/hevc/idct: Port ff_hevc_idct_4x4_dc_{8,10,12}_mmxext to SSE2 Practically no change in benchmarks (and in codesize). hevc_idct_4x4_dc_8_c: 7.8 ( 1.00x) hevc_idct_4x4_dc_8_mmxext: 6.9 ( 1.14x) hevc_idct_4x4_dc_8_sse2: 6.8 ( 1.15x) hevc_idct_4x4_dc_10_c: 7.9 ( 1.00x) hevc_idct_4x4_dc_10_mmxext: 6.9 ( 1.16x) hevc_idct_4x4_dc_10_sse2: 6.8 ( 1.16x) hevc_idct_4x4_dc_12_c: 7.8 ( 1.00x) hevc_idct_4x4_dc_12_mmxext: 7.0 ( 1.13x) hevc_idct_4x4_dc_12_sse2: 6.8 ( 1.15x) Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-30 08:56:45 +01:00
Andreas Rheinhardt	f4a87d8ca4	avcodec/x86/mpegvideoencdsp_init: Use xmm registers in SSSE3 functions Improves performance and no longer breaks the ABI (by forgetting to call emms). Old benchmarks: add_8x8basis_c: 43.6 ( 1.00x) add_8x8basis_ssse3: 12.3 ( 3.55x) New benchmarks: add_8x8basis_c: 43.0 ( 1.00x) add_8x8basis_ssse3: 6.3 ( 6.79x) Notice that the output of try_8x8basis_ssse3 changes a bit: Before this commit, it computes certain values and adds the values for i,i+1,i+4 and i+5 before right shifting them; now it adds the values for i,i+1,i+8,i+9. The second pair in these lists could be avoided (by shifting xmm0 and xmm1 before adding both together instead of only shifting xmm0 after adding them), but the former i,i+1 is inherent in using pmaddwd. This is the reason that this function is not bitexact. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-15 08:55:13 +02:00
Andreas Rheinhardt	ce499ebf96	tests/checkasm/mpegvideoencdsp: Add test for add_8x8basis Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-15 08:55:13 +02:00
Andreas Rheinhardt	31f0749cd4	avcodec/vp3: Optimize alignment check away when possible Check only on arches that need said check. (Btw: I do not see how h_loop_filter benefits from alignment at all and why h_loop_filter_unaligned exists.) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-13 18:59:49 +02:00
Andreas Rheinhardt	5823ab347a	avcodec/vp3dsp: Remove unused flags parameter from ff_vp3dsp_init() No longer necessary now that the x86 loop filter functions are bitexact. Reviewed-by: Sean McGovern <gseanmcg@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-13 18:59:24 +02:00
Andreas Rheinhardt	e3ca57ae8f	avcodec/x86/vp3dsp: Port loop filters to SSE2 The old code operated on bytes and did lots of tricks due to their limited range; it did not completely succeed, which is why the old versions were not used when bitexact output was requested. In contrast, the new version is much simpler: It operates on signed 16 bit words whose range is more than sufficient. This means that these functions don't need a check for bitexactness (and can be used in FATE). Old benchmarks (for this, the AV_CODEC_FLAG_BITEXACT check has been removed from checkasm): h_loop_filter_c: 29.8 ( 1.00x) h_loop_filter_mmxext: 32.2 ( 0.93x) h_loop_filter_unaligned_c: 29.9 ( 1.00x) h_loop_filter_unaligned_mmxext: 31.4 ( 0.95x) v_loop_filter_c: 39.3 ( 1.00x) v_loop_filter_mmxext: 14.2 ( 2.78x) v_loop_filter_unaligned_c: 38.9 ( 1.00x) v_loop_filter_unaligned_mmxext: 14.3 ( 2.72x) New benchmarks: h_loop_filter_c: 29.2 ( 1.00x) h_loop_filter_sse2: 28.6 ( 1.02x) h_loop_filter_unaligned_c: 29.0 ( 1.00x) h_loop_filter_unaligned_sse2: 26.9 ( 1.08x) v_loop_filter_c: 38.3 ( 1.00x) v_loop_filter_sse2: 11.0 ( 3.47x) v_loop_filter_unaligned_c: 35.5 ( 1.00x) v_loop_filter_unaligned_sse2: 11.2 ( 3.18x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-13 18:58:50 +02:00
Andreas Rheinhardt	5d9a392bce	tests/checkasm: Add VP3 loop filter test Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-13 18:58:50 +02:00
Andreas Rheinhardt	54598238e4	tests/checkasm: Add CAVS qpel test This test already uncovered a bug in the vertical qpel motion compensation code. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-08 20:40:08 +02:00
Andreas Rheinhardt	3e2d9b73c1	avcodec/h264qpel: Move Snow-only code to snow.c Blocksize 2 is Snow-only, so move all the code pertaining to it to snow.c. Also make the put array in H264QpelContext smaller -- it only needs three sets of 16 function pointers. This continues `6eb8bc4217` and `b0c91c2fba`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-07 18:06:40 +02:00
James Almer	95850f339e	tests/checkasm: add a test for dcadsp Signed-off-by: James Almer <jamrial@gmail.com>	2025-10-05 10:09:04 -03:00
Andreas Rheinhardt	6eb8bc4217	avcodec/h264qpel: Don't build unused 2x2 size funcs for bitdepths > 8 The 2x2 put functions are only used by Snow and Snow uses only the eight bit versions. The rest is dead code. Disabling it saved 41277B here. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:33 +02:00
Andreas Rheinhardt	8820e2205c	tests/checkasm/hpeldsp: Use instruction-set independent height Otherwise the benchmark numbers are incomparable. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:32 +02:00
Andreas Rheinhardt	9a0581fca0	tests/checkasm: Add qpeldsp checkasm Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:32 +02:00
Andreas Rheinhardt	ab7d1c64c9	avcodec/x86/h263_loopfilter: Port loop filter to SSE2 Old benchmarks: h263dsp.h_loop_filter_c: 41.2 ( 1.00x) h263dsp.h_loop_filter_mmx: 39.5 ( 1.04x) h263dsp.v_loop_filter_c: 43.5 ( 1.00x) h263dsp.v_loop_filter_mmx: 16.9 ( 2.57x) New benchmarks: h263dsp.h_loop_filter_c: 41.6 ( 1.00x) h263dsp.h_loop_filter_sse2: 28.2 ( 1.48x) h263dsp.v_loop_filter_c: 42.4 ( 1.00x) h263dsp.v_loop_filter_sse2: 15.1 ( 2.81x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-03 17:05:46 +00:00
Andreas Rheinhardt	a8a16c15c8	tests/checkasm/llviddsp: Use the same width for each cpuflag Otherwise the benchmark numbers would be incomparable nonsense. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-03 17:05:46 +00:00
Kacper Michajłow	d6cb0d2c2b	ALL: move av_unused to conform with standard requirement This is required placement by standard [[maybe_unused]] attribute, works the same for __attribute__((unused)). Signed-off-by: Kacper Michajłow <kasper93@gmail.com>	2025-09-26 16:15:46 +00:00
Andreas Rheinhardt	4e2ef29cba	tests/checkasm: Add hpeldsp checkasm Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-09-26 06:21:02 +02:00
Niklas Haas	00e05bcd68	tests/checkasm: add vf_idet checkasm	2025-09-21 11:02:41 +00:00
Andreas Rheinhardt	a35c91dc14	avfilter/vf_colordetect: Rename header to vf_colordetectdsp.h It is more in line with our naming conventions. Reviewed-by: Martin Storsjö <martin@martin.st> Reviewed-by: Niklas Haas <ffmpeg@haasn.dev> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-09-16 18:22:24 +02:00
Timo Rothenpieler	0362cb3806	build: link with CXX when -lstdc++ on linker commandline	2025-09-14 11:45:11 +00:00
Andreas Rheinhardt	bc545bae3b	tests/checkasm/sw_ops: Avoid 1 << 32 It is UB. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-09-13 21:27:27 +02:00
Martin Storsjö	5a893c1806	checkasm: sw_ops: Avoid division by zero If we're invoked with range == UINT_MAX, we end up doing "rnd() % (UINT_MAX + 1)", which is equal to "rnd() % 0". On arm (on all platforms) and on MSVC i386, this ends up crashing at runtime. This fixes the crash.	2025-09-02 14:28:56 +03:00
Niklas Haas	5e6ffa0376	tests/checkasm: add checkasm tests for swscale ops Because of the lack of an external ABI on low-level kernels, we cannot directly test internal functions. Instead, we construct a minimal op chain consisting of a read, the op to be tested, and a write. The bigger complication arises from the fact that the backend may generate arbitrary internal state that needs to be passed back to the implementation, which means we cannot directly call `func_ref` on the generated chain. To get around this, always compile the op chain twice - once using the backend to be tested, and once using the reference C backend. The actual entry point may also just be a shared wrapper, so we need to be very careful to run checkasm_check_func() on a pseudo-pointer that will actually be unique for each combination of backend and active CPU flags.	2025-09-01 19:28:36 +02:00
Niklas Haas	8406c56b0c	tests/checkasm: generalize DEF_CHECKASM_CHECK_FUNC to floats We split the standard macro into its body (implementation) and declaration, and use a macro argument in place of the raw `memcmp` call, with the major difference that we now take the number of pixels to compare instead of the number of bytes (to match the signature of float_near_ulp_array).	2025-09-01 19:27:53 +02:00
Niklas Haas	faf62cbdf5	tests/checkasm: increase number of runs in between measurements Sometimes, when measuring very small functions, rdtsc is not accurate enough to get a reliable measurement. This increases the number of runs inside the inner loop from 4 to 32, which should help a lot. Less important when using the more precise linux-perf API, but still useful. There should be no user-visible change since the number of runs is adjusted to keep the total time spent measuring the same.	2025-09-01 19:27:53 +02:00
Zhao Zhili	6450e01446	checkasm/vf_colordetect: test non-aligned width	2025-09-01 15:35:16 +00:00
Henrik Gramner	10a061ba99	vp9: Add AVX-512ICL asm for 8bpc subpel mc	2025-08-28 12:45:52 +00:00
Niklas Haas	9b8b78a815	avfilter/vf_colordetect: detect fully opaque alpha planes It can be useful to know if the alpha plane consists of fully opaque pixels or not, in which case it can e.g. safely be stripped. This only requires a very minor modification to the AVX2 routines, adding an extra AND on the read alpha value with the reference alpha value, and a single extra cheap test per line. detect_alpha_8_full_c: 2849.1 ( 1.00x) detect_alpha_8_full_avx2: 260.3 (10.95x) detect_alpha_8_full_avx512icl: 130.2 (21.87x) detect_alpha_8_limited_c: 8349.2 ( 1.00x) detect_alpha_8_limited_avx2: 756.6 (11.04x) detect_alpha_8_limited_avx512icl: 364.2 (22.93x) detect_alpha_16_full_c: 1652.8 ( 1.00x) detect_alpha_16_full_avx2: 236.5 ( 6.99x) detect_alpha_16_full_avx512icl: 134.6 (12.28x) detect_alpha_16_limited_c: 5263.1 ( 1.00x) detect_alpha_16_limited_avx2: 797.4 ( 6.60x) detect_alpha_16_limited_avx512icl: 400.3 (13.15x)	2025-08-18 18:50:00 +00:00
Niklas Haas	ae3c5ac2c1	avfilter/vf_colordetect: remove extra safety margin on premul check This safety margin was motivated by the fact that vf_premultiply sometimes produces such illegally high values, but this has since been fixed by `603334a043`, so there's no more reason to have this safety margin, at least for our own code. (Of course, other sources may also produce such broken files, but we shouldn't work around that - garbage in, garbage out.) See-Also: `603334a043`	2025-08-18 18:50:00 +00:00
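
Three of the sw_ops entries above (191f7e4869, bc545bae3b and 5a893c1806) describe the same family of undefined behavior: shifting a 32-bit int by 32, and computing "rnd() % (UINT_MAX + 1)", which wraps to "rnd() % 0". The following is a minimal C sketch of how both patterns can be avoided. The helper names low_bits_mask and rnd_in_range are hypothetical and do not come from sw_ops.c; the actual commits may solve the problem differently.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: mask with the low `bits` bits set, for bits in 0..32. */
static uint32_t low_bits_mask(unsigned bits)
{
    assert(bits <= 32);
    /* Shift in a 64-bit type: (1 << 32) on a 32-bit int is undefined. */
    return (uint32_t)((UINT64_C(1) << bits) - 1);
}

/* Hypothetical helper: map a random value into the inclusive range [0, range]. */
static unsigned rnd_in_range(unsigned value, unsigned range)
{
    /* range + 1 wraps to 0 when range == UINT_MAX, and value % 0 is undefined;
     * in that case every unsigned value is already in range. */
    if (range == UINT_MAX)
        return value;
    return value % (range + 1);
}
```

Widening before the shift keeps the bits == 32 case well defined, and the explicit UINT_MAX branch avoids the wrapped divisor without changing results for any other range.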

670 Commits