In constant FPS mode, it's possible for there to be *no* input active at
the current timestamp, even though there would be active inputs later in
the timeline (nb_active > 0). This would previously hit the AVERROR_BUG case.
With this change, we can instead output an empty frame.
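Sketchwise (helper names like nb_active_at() and output_blank_frame()
are hypothetical, not the actual vf_libplacebo code):

    if (!nb_active_at(s, out_pts) && s->nb_active > 0) {
        /* No input covers out_pts, but more frames are coming later:
         * emit an empty frame instead of hitting AVERROR_BUG. */
        return output_blank_frame(outlink, out_pts);
    }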
Instead of having a single s->status field to track the most recently
EOF'd stream, track the number of active streams directly and only query
the most recent status at the end.
The reason this worked before is that we implicitly relied on the
assumption that `ok` would always be true until all streams are EOF, but
because it's possible to have gaps in the timeline when mixing multiple
streams, this is not always the case in principle.
In practice, this fixes a bug where the filter would early-exit (EOF)
too soon whenever any input reached EOF while there was a gap in the
timeline.
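Roughly, the new bookkeeping looks like this (a sketch; the field and
helper names such as most_recent_status() are illustrative, only
ff_outlink_set_status() is the real API):

    int nb_active = 0;
    for (int i = 0; i < s->nb_inputs; i++)
        nb_active += !s->inputs[i].eof;

    if (!nb_active) {
        /* Only once *all* inputs hit EOF, query the most recent
         * status and forward it downstream. */
        ff_outlink_set_status(outlink, most_recent_status(s), s->status_pts);
        return 0;
    }
    /* nb_active > 0: keep filtering, even across gaps in the timeline. */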
When using libplacebo to composite multiple streams with complex PTS
values, it's possible for there to be a "hole" in the output stream; in
particular in constant FPS mode. This commit refactors output_frame() to
allow it to handle the case of there being no active input.
An --enable-* option should not have a condition under which it gets
enabled by some other means, as is the case here when swscale is enabled.
Signed-off-by: James Almer <jamrial@gmail.com>
This patch adds format handling code for the new operations. This entails
fully decoding a format to standardized RGB, and the inverse.
Handling it this way means we can always guarantee that a conversion path
exists from A to B without having to explicitly cover logic for each path;
and choosing RGB instead of YUV as the intermediate (as was done in swscale
v1) is more flexible with regards to enabling further operations such as
primaries conversions, linear scaling, etc.
In the case of a YUV->YUV transform, the redundant matrix multiplication
will be canceled out anyway.
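Schematically (op names are illustrative):

    /* yuv420p -> yuv444p, routed through the common RGB hub:
     *
     *   read yuv420p -> yuv2rgb(M_in) -> rgb2yuv(M_out) -> write yuv444p
     *
     * When M_out is the inverse of M_in, the two matrix multiplies
     * fold into an identity, so the YUV->YUV round trip through RGB
     * ends up costing nothing. */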
Because of the lack of an external ABI on low-level kernels, we cannot
directly test internal functions. Instead, we construct a minimal op chain
consisting of a read, the op to be tested, and a write.
The bigger complication arises from the fact that the backend may generate
arbitrary internal state that needs to be passed back to the implementation,
which means we cannot directly call `func_ref` on the generated chain. To get
around this, always compile the op chain twice - once using the backend to be
tested, and once using the reference C backend.
The actual entry point may also just be a shared wrapper, so we need to
be very careful to run checkasm_check_func() on a pseudo-pointer that will
actually be unique for each combination of backend and active CPU flags.
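Putting it together, the check looks roughly like this (types and
helpers such as compile_ops() and make_unique_key() are illustrative;
check_func() and fail() are the real checkasm entry points):

    /* Compile the same op chain twice: reference C and backend. */
    SwsOpChain *ref = compile_ops(&backend_c, ops);
    SwsOpChain *tst = compile_ops(backend,    ops);

    /* Key on a pseudo-pointer unique per (backend, cpu flags), since
     * the chain's actual entry point may be a shared wrapper. */
    if (check_func(make_unique_key(backend, cpu_flags), "%s_%s",
                   backend->name, op_name)) {
        run_chain(ref, dst_ref, src);
        run_chain(tst, dst_tst, src);
        if (memcmp(dst_ref, dst_tst, dst_size))
            fail();
    }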
This covers most 8-bit and 16-bit ops, and some 32-bit ops. It also covers all
floating point operations. While this is not yet 100% coverage, it's good
enough for the vast majority of formats out there.
Of special note is the packed shuffle fast path, which uses pshufb at vector
sizes up to AVX512.
Provides a generic fast path for any operation list that can be decomposed
into a series of memcpy and memset operations.
25% faster than the x86 backend for yuv444p -> yuva444p
33% faster than the x86 backend for gray -> yuvj444p
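For example, yuv444p -> yuva444p becomes three per-line memcpys plus
one memset of the alpha plane. In sketch form (struct layout
illustrative):

    for (int y = 0; y < height; y++) {
        for (int i = 0; i < fast->nb_entries; i++) {
            const Entry *e = &fast->entries[i];
            if (e->fill) /* e.g. set the alpha plane to 0xFF */
                memset(dst[e->plane] + y * dst_stride[e->plane],
                       e->value, e->line_bytes);
            else         /* plain plane copy */
                memcpy(dst[e->plane] + y * dst_stride[e->plane],
                       src[e->plane] + y * src_stride[e->plane],
                       e->line_bytes);
        }
    }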
This will serve as a reference for the SIMD backends to come. That said,
with auto-vectorization enabled, the performance of this is not atrocious.
It easily beats the old C code and sometimes even the old SIMD.
In theory, we can dramatically speed it up by using GCC vectors instead of
arrays, but the performance gains from this are too dependent on exact GCC
versions and flags, so in practice it's not a substitute for a SIMD
implementation.
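For illustration, the GCC vector version would look roughly like this;
the reference backend deliberately sticks to plain arrays:

    typedef float f32x4 __attribute__((vector_size(16)));

    static void scale_lane(f32x4 *px, int n, float k)
    {
        for (int i = 0; i < n; i++)
            px[i] *= k; /* the scalar is broadcast across the vector */
    }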
This can turn any compatible sequence of operations into a single packed
shuffle, including packed swizzling, grayscale->RGB conversion, endianness
swapping, RGB bit depth conversions, rgb24->rgb0 alpha clearing and more.
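For example, rgb24 -> rgb0 with the alpha byte cleared reduces to one
byte shuffle per vector (SSSE3 pshufb semantics: a mask byte with the
high bit set zeroes the corresponding output byte):

    /* consumes 12 input bytes per 16 output bytes */
    static const uint8_t rgb24_to_rgb0[16] = {
        0, 1, 2, 0x80,  3, 4, 5, 0x80,
        6, 7, 8, 0x80,  9, 10, 11, 0x80,
    };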
This handles the low-level execution of an op list, and integration into
the SwsGraph infrastructure. To handle frames with insufficient padding in
the stride (or a width smaller than one block size), we use a fallback loop
that pads the last column of pixels using `memcpy` into an appropriately
sized buffer.
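The tail handling, in sketch form (helper names illustrative):

    /* When fewer than block_w pixels remain (or the stride lacks
     * padding), bounce through a sufficiently padded buffer so the
     * kernel can always load and store whole blocks. */
    if (w_left < block_w) {
        memcpy(tmp_in, src_row, w_left * in_bpp);    /* pad input   */
        process_block(tmp_out, tmp_in);              /* full block  */
        memcpy(dst_row, tmp_out, w_left * out_bpp);  /* valid part  */
    }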
This is responsible for taking a "naive" ops list and optimizing it
as much as possible. Also includes a small analyzer that generates component
metadata for use by the optimizer.
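At its core this is a peephole pass over the op list (a sketch; the
helper names are made up):

    for (int i = 0; i < n; i++) {
        if (op_is_noop(&ops[i]))               /* e.g. multiply by 1 */
            n = remove_op(ops, n, i--);
        else if (i + 1 < n && can_fuse(&ops[i], &ops[i + 1]))
            n = fuse_ops(ops, n, i--);         /* merge adjacent ops */
    }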
See docs/swscale-v2.txt for an in-depth introduction to the new approach.
This commit merely introduces the ops definitions and boilerplate functions.
The subsequent commits will flesh out the underlying implementation.
Give users and developers a way to opt in to the new format conversion code,
and more code from the swscale rewrite in general, even while development is
still ongoing.
We split the standard macro into its body (implementation) and declaration,
and use a macro argument in place of the raw `memcmp` call, with the major
difference that we now take the number of pixels to compare instead of the
number of bytes (to match the signature of float_near_ulp_array).
Sometimes, when measuring very small functions, rdtsc is not accurate enough
to get a reliable measurement. This increases the number of runs inside the
inner loop from 4 to 32, which should help a lot. Less important when using
the more precise linux-perf API, but still useful.
There should be no user-visible change since the number of runs is adjusted
to keep the total time spent measuring the same.
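Simplified shape of the change (checkasm's real loop is unrolled and
uses its own timer macros):

    uint64_t t = readtime();
    for (int i = 0; i < runs; i++)
        for (int j = 0; j < 32; j++) /* was 4 calls per iteration */
            func();
    t = readtime() - t;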
This behavior had no real justification and was just incredibly confusing,
since the in/out pointers passed to setup() did not match those passed to
run(), all for what is arguably an exception anyway (the palette setup).
This wrapping logic still considered any nonzero return from the ASM function
to be the overall result, but this has not been true since the
addition of FF_ALPHA_TRANSPARENT.
Fix it by only early returning if FF_ALPHA_STRAIGHT is detected.
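In sketch form (wrapped_asm() stands in for the real wrapper):

    /* Only FF_ALPHA_STRAIGHT may short-circuit; other nonzero
     * values (e.g. FF_ALPHA_TRANSPARENT) are not the final result. */
    int ret = wrapped_asm(td);
    if (ret == FF_ALPHA_STRAIGHT)
        return ret;
    /* otherwise fall through and keep accumulating the result */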
Fixes: 9b8b78a815
See-Also: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20301#issuecomment-4802
Extract Orientation and export it as a display matrix if present, and set the
frame's metadata with the remaining Exif entries.
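A hedged sketch of the display-matrix half (shown for Exif orientation
6, i.e. a 90° rotation; the real mapping handles all eight orientation
values):

    AVFrameSideData *sd = av_frame_new_side_data(frame,
            AV_FRAME_DATA_DISPLAYMATRIX, sizeof(int32_t) * 9);
    if (sd) /* sign convention per libavutil/display.h */
        av_display_rotation_set((int32_t *) sd->data, -90.0);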
Signed-off-by: James Almer <jamrial@gmail.com>
In hybrid_fragmented mode, the first moov written at the start should
contain an mvex tag, since it's fmp4. When writing the moov for the
second time, it's not fmp4 any more, so mvex should be skipped.
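Sketch of the condition (the actual movenc.c logic differs in detail):

    if (first_moov) /* fmp4: fragments will follow, mvex is required */
        mov_write_mvex_tag(pb, mov);
    /* final moov rewrite: no longer fmp4, so mvex is skipped */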
Set CODECAPI_AVLowLatencyMode when AV_CODEC_FLAG_LOW_DELAY is enabled on
the AVCodecContext. AVLowLatencyMode can achieve lower latency encoding
that may not be accessible using eAVScenarioInfo options alone.
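In sketch form (COM error handling elided; variant type per the
CODECAPI_AVLowLatencyMode documentation):

    if (avctx->flags & AV_CODEC_FLAG_LOW_DELAY) {
        VARIANT v;
        VariantInit(&v);
        v.vt      = VT_BOOL;
        v.boolVal = VARIANT_TRUE;
        ICodecAPI_SetValue(codec_api, &CODECAPI_AVLowLatencyMode, &v);
    }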
Signed-off-by: Cameron Gutman <aicommander@gmail.com>
Currently, the aacencdsp checkasm test fails for many seeds
if the C code has been built with x87 math. This happens because
the excess precision of x87 math can make it end up rounding
to a different integer, and the checkasm test checks that the
output integers match exactly between C and assembly.
One such failing case is "tests/checkasm/checkasm --test=aacencdsp
41" when compiled with GCC. When compiled with Clang, the test
seed 21 produces a failure.
To avoid the issue, we need to limit the precision of intermediates
to their nominal float range, matching the assembly implementations.
This can be achieved when compiling with GCC, by just adding a single
cast.
To observe the effect of this cast, compile the following
snippet,
    int cast(float a, float b) {
        return (int)
    #ifdef CAST
               (float)
    #endif
               (a + b);
    }
with "gcc -m32 -std=c17 -O2", with/without -DCAST. For x86_64
cases (without the "-m32"), the cast doesn't make any difference
on the generated code.
This cast would seem to have no effect, as a binary expression
with float operands already has the type float.
However, if compiling with GCC with -fexcess-precision=standard,
the cast forces limiting the precision according to the language
standard here - according to the GCC docs [1]:
> When compiling C or C++, if -fexcess-precision=standard is
> specified then excess precision follows the rules specified in
> ISO C99 or C++; in particular, both casts and assignments cause
> values to be rounded to their semantic types (whereas -ffloat-store
> only affects assignments). This option is enabled by default for
> C or C++ if a strict conformance option such as -std=c99 or
> -std=c++17 is used.
FFmpeg's configure script enables -std=c17 by default.
This only helps with GCC though - the cast doesn't make any
difference for Clang. (Although upstream Clang seems to default
to SSE math, while Ubuntu-provided Clang defaults to x87 math.)
Limiting the precision with Clang would require casting to volatile
float for both intermediates here - and that does have a code
generation effect on all architectures.
[1] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
Using the built-in token seems to only partially work.
And in the latest Forgejo release, the issue that forced the use of
the built-in token was fixed with:
https://codeberg.org/forgejo/forgejo/pulls/9025
Change the PCM table init to use compile-time #if instead of a runtime
`if (CONFIG_...)`. This makes sure code for disabled encoders never gets
built, without depending on compiler dead-code elimination.
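The shape of the change (encoder and table names are illustrative):

    /* before: always compiled, then (hopefully) eliminated as dead */
    if (CONFIG_PCM_ALAW_ENCODER)
        pcm_alaw_tableinit();

    /* after: the call does not exist when the encoder is disabled */
    #if CONFIG_PCM_ALAW_ENCODER
        pcm_alaw_tableinit();
    #endif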
Signed-off-by: YiboFang <fangyibo@xiaomi.com>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Add the max_frame_size option to support setting the maximum frame
size in bytes. This value caps the size of each encoded frame
produced by the bitrate control algorithm.
Signed-off-by: Tong Wu <wutong1208@outlook.com>
strftime returns 0 in case of an empty output (e.g. the %p format
string with some locales); there is no way to distinguish this from
a buffer-too-small error condition. So we must use some heuristics
to handle this case, and avoid consuming INT_MAX bytes of RAM or
falsely reporting a truncated output.
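One possible shape of such a heuristic (a sketch, not the actual
patch; MAX_EXPANSION is a made-up bound):

    #include <string.h>
    #include <time.h>
    #include "libavutil/mem.h"

    #define MAX_EXPANSION 32 /* hypothetical growth bound per fmt char */

    static char *expand_strftime(const char *fmt, const struct tm *tm)
    {
        size_t size = strlen(fmt) + 1;
        for (;;) {
            char *buf = av_malloc(size);
            if (!buf)
                return NULL;
            if (strftime(buf, size, fmt, tm) > 0)
                return buf;                /* non-empty output */
            if (size >= strlen(fmt) * MAX_EXPANSION + 1) {
                buf[0] = '\0';             /* assume genuinely empty */
                return buf;
            }
            av_free(buf);                  /* too small: grow and retry */
            size *= 2;
        }
    }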
Signed-off-by: Marton Balint <cus@passwd.hu>