Forgotten in 70a7df049c.
Using the wrong scantable matters for codecs for which both scantables
can differ, namely the MPEG-4 decoder and the WMV1/2 codecs.
For WMV1 it can lead to wrong output in case the IDCT permutation
is FF_IDCT_PERM_PARTTRANS, because in this case the entries of
of the intra scantable's raster end are not always <= the corresponding
entries of the inter scantable's raster end when the former is
initialized via ff_wmv1_scantable[1] and the latter via ff_wmv1_scantable[0].
FF_IDCT_PERM_PARTTRANS is used iff the Neon IDCT is used (for both arm
and aarch64).* Said IDCT is not used during FATE, so that this issue
went unnoticed.
WMV2 uses the same scantables, but uses a custom IDCT
which always uses FF_IDCT_PERM_NONE for which the inter_scantable,
so that the output is always correct for it.
The scantable for MPEG-4 can change mid-stream (for the decoder),
but since c41818dc5d only the intra
scantable is updated, so that both scantables can get out of sync.
In such a case the unquantize intra functions could unquantize
an incorrect number of coefficients.
Using raster_end of the wrong scantable can also lead to an
unnecessarily large amount of coefficients unquantized.
*: FF_IDCT_PERM_SIMPLE and FF_IDCT_PERM_TRANSPOSE would also not work,
but they are not used at all by arm and aarch64.
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This patch adds support for DeckLink SDK 14.3 and newer by using
the legacy interfaces in the header <DeckLinkAPI_v14_2_1.h>.
The missing QueryInterface implementations are also provided.
With --disable-asm, ARCH_X86_32 is set to 0, but we still build the
checkasm binary. Update the check so it is config.h agnostic.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Reading the spec for what this flag means, it copies the data verbatim, including any swizzling/tiling, this has two issues
1. the format may not be what ffmpeg expects elsewhere, as it is expecing normal pitch linear host memeory in `swf`
2. the size of the copied data may not match the size of buffer provided, causing heap buffer overflow
It seems like addition of this flag is an oversight as it seems to be for caching/backups of image data, just to be used with copying back to the GPU with the MEMCPY flag, which is *not* how its used in ffmpeg.
Additionally, set memoryRowLength as if it isn't set, it assumes pitch = width_in_bytes, which I don't think is necessarily the case
To do so, simply add these init files to X86ASM-OBJS instead of OBJS
in the Makefile. The former is already used for the actual assembly
files, but using them for the C init files just works, because the build
system uses file extensions to derive whether it is a C or a NASM file.
This avoids compiling unused function stubs and also reduces our
reliance on DCE: We don't add %if checks to the asm files except
for AVX, AVX2, FMA3, FMA4, XOP and AVX512, so all the MMX-SSE4
functions will be available. It also allows to remove HAVE_X86ASM checks
in these init files.
Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
To do so, simply add these init files to X86ASM-OBJS instead of OBJS
in the Makefile. The former is already used for the actual assembly
files, but using them for the C init files just works, because the build
system uses file extensions to derive whether it is a C or a NASM file.
This avoids compiling unused function stubs and also reduces our
reliance on DCE: We don't add %if checks to the asm files except
for AVX, AVX2, FMA3, FMA4, XOP and AVX512, so all the MMX-SSE4
functions will be available. It also allows to remove HAVE_X86ASM checks
in these init files.
(x86/ops.c has already been put in X86ASM-OBJS.)
Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
To do so, simply add these init files to X86ASM-OBJS instead of OBJS
in the Makefile. The former is already used for the actual assembly
files, but using them for the C init files just works, because the build
system uses file extensions to derive whether it is a C or a NASM file.
This avoids compiling unused function stubs and also reduces our
reliance on DCE: We don't add %if checks to the asm files except
for AVX, AVX2, FMA3, FMA4, XOP and AVX512, so all the MMX-SSE4
functions will be available. It also allows to remove HAVE_X86ASM checks
in these init files.
Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
To do so, simply add these init files to X86ASM-OBJS instead of OBJS
in the Makefile. The former is already used for the actual assembly
files, but using them for the C init files just works, because the build
system uses file extensions to derive whether it is a C or a NASM file.
This avoids compiling unused function stubs and also reduces our
reliance on DCE: We don't add %if checks to the asm files except
for AVX, AVX2, FMA3, FMA4, XOP and AVX512, so all the MMX-SSE4
functions will be available. It also allows to remove HAVE_X86ASM checks
in these init files.
Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This stores a small buffer in shared memory per decode thread (16 bytes),
which helps reduce the number of memory accesses.
The bitstream buffer is first aligned to a 4 byte boundary, so that the
buffer can be filled with a single memory request.
It's not guaranteed that the conversion filter name string will be
deduplicated to the same memory location. While this is common
optimization to do, we cannot rely on it always happening.
Fixes regression since 8b375b2ffd.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
ff_h264_luma_dc_dequant_idct_sse2() does not pass checkasm for certain
seeds, because the input to packssdw no longer fits into an int16_t,
leading to saturation, where the C code just truncates. I don't know
whether the spec contains provisions that ensure that valid input
must not exceed 16 bit or whether the such inputs (even if invalid)
can be triggered by the actual code and not only the test.
This commit adapts the behavior of the function to the C reference code
to fix the test. packssdw is avoided, instead the lower words are
directly transfered to GPRs to be written out. This has unfortunately
led to a slight performance regression here (14.5 vs 15.1 cycles).
Fixes issue #20835.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
pw_1 is currently loaded in both codepaths. Generate it earlier instead.
Gives tiny speedups (15 vs 14.5 cycles) and reduces codesize.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is ABI compliant and gives a tiny speedup here (and is 16B smaller).
Old benchmarks:
h264_luma_dc_dequant_idct_8_c: 33.2 ( 1.00x)
h264_luma_dc_dequant_idct_8_sse2: 16.0 ( 2.07x)
New benchmarks:
h264_luma_dc_dequant_idct_8_c: 33.0 ( 1.00x)
h264_luma_dc_dequant_idct_8_sse2: 15.0 ( 2.20x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Only exported (i.e. cglobal) functions need it; stride is already
sign-extended when it reaches any of the internal functions used here,
so don't sign-extend again.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This allows increased internal precision.
In addition, we can introduce an offset to the DC coefficient
during the second IDCT step, to remove a per-element addition
in the output codepath.
Finally, by processing columns first we can remove the barrier
after loading coefficients.
Signed-off-by: averne <averne381@gmail.com>
When combining rotation with a FIT_ mode other than FIT_FILL, the fitting
logic was operating on the un-rotated rects, when it should have been
operating on the rotated (output) rects.
If the SRC_PATH variable contains certain characters (like a `:`, which may
happen when FATE is executed on Windows), the value for the `file` option is
broken, so `make fate-filter-drawvg-video` always fails.
The solution in this commit is to copy the `drawvg.lines` to the `tests/data`
directory (which already has temp files), so the value for `file` is a fixed
string with no problematic characters.
Signed-off-by: Ayose <ayosec@gmail.com>
Commit ba4b73c977 caused a regression in
the usage of avg_frame_rate to detect the frame rate of raw h264/hevc
bitstreams: after the commit, avg_frame_rate is always the value of the
-framerate option (which is set to 25 by default) instead of the actual
frame rate derived from the bitstream SPS/VPS NALUs.
This commit fixes the regression by setting the framerate codec
parameter to the value of the framerate option instead. After this
change, bitstreams without timing information will derive avg_frame_rate
from the -framerate option, while bitstreams with timing information
will derive avg_frame_rate from the bitstream itself.
The h264-bsf-dts2pts test now returns the correct frame durations for a
bitstream with a mix of single-field and double-field frames.
Signed-off-by: Gavin Li <git@thegavinli.com>
Signed-off-by: James Almer <jamrial@gmail.com>
The supported YUV pixel formats were separated between planar
and semiplanar. This approach reduces the number of CUDA kernels
for all pixel formats.
This patch:
1. Adds support for YUV 4:2:2 planar and semi-planar formats:
yuv422p, yuv422p10, nv16, p210, p216
2. Implements new conversion structures and kernel definitions
for planar and semi-planar formats
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
Add support for additional pixel formats in CUDA hardware context:
- Planar formats (yuv420p10, yuv422p, yuv422p10, yuv444p10)
- Semiplanar formats (nv16, p210, p216)
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
Playback to a decklink device with a newer version of the
DeckLink SDK (14.3) stalls because the driver code calls
IDeckLinkVideoFrame::QueryInterface, which is not
implemented by ffmpeg.
This patch implements decklink_frame::QueryInterface,
so that playback works with both older (12.x) and
newer (>= 14.3) drivers.
Note: The patch still does not allow the code to compile
with DeckLink SDK 14.3 or newer, as the API has changed.
seq_decode is used to ensure that a picture and all of its reference
pictures use the same SPS. Any time the SPS changes, seq_decode should
be incremented. Prior to this patch, seq_decode was incremented in
frame_context_setup, which is called after the SPS is potentially
changed in decode_sps. Should the decoder encounter an error between
changing the SPS and incrementing seq_decode, the SPS could be modified
while seq_decode was not incremented, which could lead to invalid
reference pictures and various downstream issues. By instead updating
seq_decode within the picture set manager, we ensure seq_decode and the
SPS are always updated in tandem.
When PARSER_FLAG_COMPLETE_FRAMES is set, opus_parse() calls
set_frame_duration even on flush (buf_size==0), which triggers
a spurious "Error parsing Opus packet header" at EOF.
Match streaming-path behavior by skipping duration parsing on empty buffers.
Fixes#20954