We have no use for 14-bit pixel formats for now, so remove support for gray14,
which was broken due to the LSB padding issue.
Similarly, YUVA at 10/12 bit was broken for the same reason.
Add a shader-based Apple ProRes decoder.
It supports all codec features for profiles up to
the 4444 XQ profile, i.e.:
- 4:2:2 and 4:4:4 chroma subsampling
- 10- and 12-bit component depth
- Interlacing
- Alpha
The implementation consists of two shaders: the
VLD kernel does entropy decoding for color/alpha,
and the IDCT kernel performs the inverse transform
on color components.
Benchmarks for a 4k yuv422p10 sample:
- AMD Radeon 6700XT: 178 fps
- Intel i7 Tiger Lake: 37 fps
- NVidia Orin Nano: 70 fps
In preparation for the Vulkan hwaccel.
The existing hwaccel code was designed around
videotoolbox, which ingests the whole frame
bitstream including picture headers.
This adapts the code to accommodate lower-level,
slice-based hwaccels.
Suppresses:
warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)
Also drop the L suffix, as the shift will never exceed 31.
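For reference, a minimal illustration of what MSVC's C4334 flags (purely
illustrative, not the code changed here):

    #include <stdint.h>
    uint64_t bit32(unsigned n) { return 1 << n; }           /* 32-bit shift widened to 64 bits: C4334 */
    uint64_t bit64(unsigned n) { return (uint64_t)1 << n; } /* shift already done in 64 bits: no warning */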
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
This is how images encoded with a specific transfer function should be
viewed. Image viewers that don't support named trc metadata will fall
back to the plain gAMA value, and both cases should produce the same
image appearance for the viewer.
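For illustration (a PNG-spec fact, not a description of this change): the
gAMA chunk stores the encoding gamma times 100000, so a 1/2.2-like sRGB
transfer corresponds to gAMA = 45455, and a viewer that ignores the named
trc metadata and honors only gAMA should then render such an image the
same way.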
Fixes: https://github.com/mpv-player/mpv/issues/13438
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
The mismatch between the neon and C functions can be reproduced
using the following bitstream and commands on aarch64 devices:
wget https://streams.videolan.org/ffmpeg/incoming/replay_intra_pred_16x16.h264
./ffmpeg -cpuflags 0 -threads 1 -i replay_intra_pred_16x16.h264 -f framemd5 -y md5_ref
./ffmpeg -threads 1 -i replay_intra_pred_16x16.h264 -f framemd5 -y md5_neon
Signed-off-by: Bin Peng <pengbin@visionular.com>
Previously, the LC3 encoder only accepted planar float (AV_SAMPLE_FMT_FLTP).
This change extends support to packed float (AV_SAMPLE_FMT_FLT) by properly
handling channel layout and sample stride.
The PCM data pointer and stride are now calculated based on the sample
format: for planar, use frame->data[ch]; for packed, use frame->data[0]
with a per-channel offset. The stride is set to 1 for the planar layout
and to the number of channels for the packed layout.
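A minimal sketch of that selection (illustrative variable names, assuming
the usual AVFrame float layouts; not the literal code):

    const float *pcm;
    int stride;
    if (frame->format == AV_SAMPLE_FMT_FLTP) {
        pcm    = (const float *)frame->data[ch];      /* one plane per channel */
        stride = 1;                                   /* samples are contiguous */
    } else { /* AV_SAMPLE_FMT_FLT (packed) */
        pcm    = (const float *)frame->data[0] + ch;  /* offset into interleaved buffer */
        stride = avctx->ch_layout.nb_channels;        /* step over the other channels */
    }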
This enables encoding from common packed audio sources without requiring
a prior planar conversion, improving usability and efficiency.
Signed-off-by: cenzhanquan1 <cenzhanquan1@xiaomi.com>
1. Adds support for respecting the requested sample format. Previously,
the decoder always used AV_SAMPLE_FMT_FLTP. Now it checks if the
caller requested a specific format via avctx->request_sample_fmt and
honors that request when supported.
2. Improves planar/interleaved audio buffer handling. The decoding
logic now properly handles both planar and interleaved sample
formats by calculating the correct stride and buffer pointers based
on the actual sample format.
The changes include:
- Added format mapping between AVSampleFormat and lc3_pcm_format (see
  the sketch below).
- Implemented format selection logic in initialization.
- Updated buffer pointer calculation for planar/interleaved data.
- Maintained backward compatibility with existing behavior.
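A rough sketch of the selection logic (illustrative, not the literal code;
LC3_PCM_FORMAT_FLOAT is the assumed liblc3 name for its float PCM format):

    enum AVSampleFormat out = AV_SAMPLE_FMT_FLTP;        /* previous default */
    if (avctx->request_sample_fmt == AV_SAMPLE_FMT_FLT)  /* interleaved requested */
        out = AV_SAMPLE_FMT_FLT;
    avctx->sample_fmt = out;
    /* Both float variants feed liblc3 as float PCM; only the buffer
       pointers and stride differ, as in the encoder change above. */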
Signed-off-by: cenzhanquan1 <cenzhanquan1@xiaomi.com>
When calculating the max size of an output PNG packet, we should
include the size of a possible eXIf chunk that we may write.
This fixes a regression since d3190a64c3
as well as a pre-existing bug in the apng encoder dating back to commit
4a580975d4.
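A rough sketch of the kind of bound adjustment involved (exif_size and
max_packet_size are illustrative names, not the actual variables):

    /* a PNG chunk is 4 bytes length + 4 bytes type + payload + 4 bytes CRC */
    if (exif_size > 0)
        max_packet_size += exif_size + 12;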
Signed-off-by: Leo Izen <leo.izen@gmail.com>
When splitting a 5-line image into 2 slices, one slice will be 3 lines and thus needs more space.
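Worked out for this example (assuming the usual split where slice j covers
lines (H*j)/n to (H*(j+1))/n):
H = 5, n = 2: slice 0 gets lines 0..1 (2 lines), slice 1 gets lines 2..4
(3 lines), so per-slice buffers must be sized for ceil(H/n) = 3 lines
rather than H/n = 2.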
Fixes: Assertion sc->slice_coding_mode == 0 failed at libavcodec/ffv1enc.c:1668
Fixes: 422811239/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FFV1_fuzzer-4933405139861504
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
We do not support larger tiles, as we use signed ints.
Alternatively, this could be checked in apv_decode_tile_component() or
init_get_bits*(), or support for bitstreams above 2 GB in length could be added.
Fixes: init_get_bits() failure later
Fixes: 421817631/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_APV_fuzzer-4957386534354944
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This has the advantage of not violating the ABI by using
MMX registers without issuing emms; e.g. it allows removing
an emms_c call from bink.c.
This commit uses GP registers on Unix64 (there are not
enough volatile registers to do likewise on Win64), which
reduces codesize and is faster on some CPUs.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Snow calls some of the me_cmp_funcs with insufficient alignment
for the first pointer (see get_block_rd() in snowenc.c);
therefore, SSE2 functions that really need this alignment
don't get set for Snow, and 542765ce3e
consequently didn't remove the MMXEXT functions which are overridden
by these SSE2 functions for normal codecs.
For reference, here is a command line which would segfault
if one simply used the ordinary SSE2 functions for Snow:
./ffmpeg -i mm-short.mpg -an -vcodec snow -t 0.2 -pix_fmt yuv444p \
-vstrict -2 -qscale 2 -flags +qpel -motion_est iter 444iter.avi
This commit adds unaligned SSE2 versions of these functions
and removes the MMXEXT ones. In particular, this implies that
sad 16x16 now never uses MMX, which allows removing an emms_c
call from ac3enc.c.
Benchmarks (u means unaligned version):
sad_0_c: 8.2 ( 1.00x)
sad_0_mmxext: 10.8 ( 0.76x)
sad_0_sse2: 6.2 ( 1.33x)
sad_0_sse2u: 6.7 ( 1.23x)
vsad_0_c: 44.7 ( 1.00x)
vsad_0_mmxext (approx): 12.2 ( 3.68x)
vsad_0_sse2 (approx): 7.8 ( 5.75x)
vsad_4_c: 88.4 ( 1.00x)
vsad_4_mmxext: 7.1 (12.46x)
vsad_4_sse2: 4.2 (21.15x)
vsad_4_sse2u: 5.5 (15.96x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The SSE2 functions overriding them are currently only set
if the SSE2SLOW flag is not set and if the codec is not Snow.
The former affects only outdated processors (AMDs from
before Barcelona (i.e. before 2007)) and is therefore irrelevant.
Snow does not use the pix_abs function pointers at all,
so this is also no obstacle.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The new functions are faster than the existing exact
functions, yet get beaten by the nonexact functions
(they can avoid unpacking to words and back).
The exact (slow) MMX functions have therefore been
removed, which was actually beneficial size-wise
(416B of new functions, 619B of functions removed).
pix_abs_0_3_c: 216.8 ( 1.00x)
pix_abs_0_3_mmx: 71.8 ( 3.02x)
pix_abs_0_3_mmxext (approximative): 17.6 (12.34x)
pix_abs_0_3_sse2: 23.5 ( 9.23x)
pix_abs_0_3_sse2 (approximative): 9.9 (21.94x)
pix_abs_1_3_c: 98.4 ( 1.00x)
pix_abs_1_3_mmx: 36.9 ( 2.66x)
pix_abs_1_3_mmxext (approximative): 9.2 (10.73x)
pix_abs_1_3_sse2: 14.8 ( 6.63x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Improves performance and no longer breaks the ABI (the old code
forgot to call emms).
Old benchmarks:
add_8x8basis_c: 43.6 ( 1.00x)
add_8x8basis_ssse3: 12.3 ( 3.55x)
New benchmarks:
add_8x8basis_c: 43.0 ( 1.00x)
add_8x8basis_ssse3: 6.3 ( 6.79x)
Notice that the output of try_8x8basis_ssse3 changes a bit:
Before this commit, it computed certain values and added the values
for i, i+1, i+4 and i+5 before right-shifting them; now it adds
the values for i, i+1, i+8 and i+9. The second pair in these lists
could be avoided (by shifting xmm0 and xmm1 before adding both together
instead of only shifting xmm0 after adding them), but the i, i+1
pair is inherent in using pmaddwd. This is why this
function is not bitexact.
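In formula form (v[] denoting the intermediate products and s the shift,
purely illustrative):
old: (v[i] + v[i+1] + v[i+4] + v[i+5]) >> s
new: (v[i] + v[i+1] + v[i+8] + v[i+9]) >> s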
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The only requirement of this code (and essentially the pmulhrsw
instruction) is that the scaled scale fits into an int16_t.
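For reference, pmulhrsw multiplies signed 16-bit lanes and computes
(a * b + (1 << 14)) >> 15 per lane, so its only precondition on the data
is that each operand, including the scaled scale, is representable as an
int16_t.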
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This loosens the coupling between CBS and the decoder by no longer using
CodedBitstreamH266Context (containing the most recently parsed PSs & PH)
to retrieve the PSs & PH in the decoder. Doing so is beneficial in two
ways:
1. It improves robustness to the case in which an AVPacket doesn't
contain precisely one PU.
2. It allows the decoder parameter set manager to properly handle the
case in which a single PU (erroneously) contains conflicting
parameter sets.
Signed-off-by: Frank Plowman <post@frankplowman.com>
Check only on arches that need said check.
(Btw: I do not see how h_loop_filter benefits from alignment
at all and why h_loop_filter_unaligned exists.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The old code operated on bytes and did lots of tricks
due to their limited range; it did not completely succeed,
which is why the old versions were not used when bitexact
output was requested.
In contrast, the new version is much simpler: it operates
on signed 16-bit words, whose range is more than sufficient.
This means that these functions don't need a check for bitexactness
(and can be used in FATE).
Old benchmarks (for this, the AV_CODEC_FLAG_BITEXACT check has been
removed from checkasm):
h_loop_filter_c: 29.8 ( 1.00x)
h_loop_filter_mmxext: 32.2 ( 0.93x)
h_loop_filter_unaligned_c: 29.9 ( 1.00x)
h_loop_filter_unaligned_mmxext: 31.4 ( 0.95x)
v_loop_filter_c: 39.3 ( 1.00x)
v_loop_filter_mmxext: 14.2 ( 2.78x)
v_loop_filter_unaligned_c: 38.9 ( 1.00x)
v_loop_filter_unaligned_mmxext: 14.3 ( 2.72x)
New benchmarks:
h_loop_filter_c: 29.2 ( 1.00x)
h_loop_filter_sse2: 28.6 ( 1.02x)
h_loop_filter_unaligned_c: 29.0 ( 1.00x)
h_loop_filter_unaligned_sse2: 26.9 ( 1.08x)
v_loop_filter_c: 38.3 ( 1.00x)
v_loop_filter_sse2: 11.0 ( 3.47x)
v_loop_filter_unaligned_c: 35.5 ( 1.00x)
v_loop_filter_unaligned_sse2: 11.2 ( 3.18x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This SSSE3 function uses MMX registers (of course without emms
at the end) and processes eight bytes of input by unpacking
it into two MMX registers. This is very suboptimal given
that one can just use XMM registers to process eight words.
This commit switches them to using XMM registers.
Old benchmarks:
avg_pixels_tab[1][3]_c: 114.5 ( 1.00x)
avg_pixels_tab[1][3]_ssse3: 43.6 ( 2.62x)
put_pixels_tab[1][3]_c: 83.6 ( 1.00x)
put_pixels_tab[1][3]_ssse3: 34.0 ( 2.46x)
New benchmarks:
avg_pixels_tab[1][3]_c: 115.3 ( 1.00x)
avg_pixels_tab[1][3]_ssse3: 24.6 ( 4.69x)
put_pixels_tab[1][3]_c: 83.8 ( 1.00x)
put_pixels_tab[1][3]_ssse3: 19.7 ( 4.24x)
Reviewed-by: Kieran Kunhya <kieran@kunhya.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Given that one has to deal with 16-byte intermediates, it is
unsurprising that SSE2 wins against MMX; the MMX version has
therefore been removed (as well as the now unused inline_asm.h).
The new function is even 32B smaller than the old MMX one.
Old benchmarks:
put_no_rnd_pixels_tab[1][3]_c: 84.1 ( 1.00x)
put_no_rnd_pixels_tab[1][3]_mmx: 41.1 ( 2.05x)
New benchmarks:
put_no_rnd_pixels_tab[1][3]_c: 84.0 ( 1.00x)
put_no_rnd_pixels_tab[1][3]_ssse3: 22.1 ( 3.80x)
Reviewed-by: Kieran Kunhya <kieran@kunhya.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Also remove the now superseded MMX versions (the new functions have the
exact same codesize as the removed ones).
Old benchmarks:
avg_no_rnd_pixels_tab[0][3]_c: 233.7 ( 1.00x)
avg_no_rnd_pixels_tab[0][3]_mmx: 121.5 ( 1.92x)
put_no_rnd_pixels_tab[0][3]_c: 171.4 ( 1.00x)
put_no_rnd_pixels_tab[0][3]_mmx: 82.6 ( 2.08x)
New benchmarks:
avg_no_rnd_pixels_tab[0][3]_c: 233.3 ( 1.00x)
avg_no_rnd_pixels_tab[0][3]_sse2: 45.0 ( 5.18x)
put_no_rnd_pixels_tab[0][3]_c: 172.1 ( 1.00x)
put_no_rnd_pixels_tab[0][3]_sse2: 40.9 ( 4.21x)
Reviewed-by: Kieran Kunhya <kieran@kunhya.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Hint: The parts of this patch in decode_block_progressive()
and decode_block_refinement() rely on the fact that GET_VLC
returns -1 on error, so that it enters the codepaths for
actually coded block coefficients.
Reviewed-by: Ramiro Polla <ramiro.polla@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>