Some of the blend mode functions only depend on the underlying type
and therefore need only one version for 9, 10, 12, 14, 16 bits.
This saved 35104B with GCC and 26880B with Clang.
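As an illustration only, here is a minimal sketch of a blend function that depends solely on the underlying type. The "darken" (per-sample minimum) operation and the name blend_darken_u16 are assumptions chosen for the example, not taken from the actual code; the point is that nothing in the body refers to the bit depth, so one uint16_t version can serve 9 to 16 bits.

```c
#include <stdint.h>

/* Hypothetical example: a per-sample minimum never needs the maximum
 * sample value, so the same function works for every bit depth that is
 * stored in uint16_t. */
static void blend_darken_u16(const uint16_t *top, const uint16_t *bottom,
                             uint16_t *dst, int width)
{
    for (int x = 0; x < width; x++)
        dst[x] = top[x] < bottom[x] ? top[x] : bottom[x];
}
```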
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The blend functions currently convert strides from bytes to elements
of the type by using the stride /= sizeof(pixel) idiom. Yet this has
several drawbacks:
1. It invokes undefined behavior that happens to work when stride is
negative: size_t is typically the unsigned type of ptrdiff_t and
therefore the division will be performed as size_t, i.e. use logical
right shifts, making stride very big when sizeof(pixel) is > 1. This
works, because accessing through a pointer to pixel entails an implicit
factor of sizeof(pixel), so that everything is correct modulo SIZE_MAX + 1.
Yet this is UB and UBSan complains about it.
2. It makes the compiler emit actual shifts/ands to discard the low bits
shifted away.
3. There may be systems where alignof(uint16_t) or alignof(float) is
strictly smaller than their sizeof, so that the stride (in bytes) is
not guaranteed to be multiple of these sizeofs. In this case, dividing
by sizeof(pixel) is simply wrong.
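The points above can be sketched as follows. This is a minimal illustration, not the code from the blend functions: the helper names are hypothetical, and the sketch assumes the typical case where size_t is the unsigned counterpart of ptrdiff_t. It keeps the stride in bytes and applies it through a byte pointer, and it also shows what the stride /= sizeof(pixel) idiom does to a negative stride.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: address row y using a stride that stays in bytes.
 * The offset is applied to a byte pointer, so no division is needed and
 * negative strides are handled with ordinary signed arithmetic. */
static const uint16_t *row16(const uint16_t *base, ptrdiff_t stride_bytes, int y)
{
    return (const uint16_t *)((const uint8_t *)base + y * stride_bytes);
}

/* Sums the first sample of each row; works for positive and negative strides. */
static unsigned sum_first_column(const uint16_t *base, ptrdiff_t stride_bytes,
                                 int height)
{
    unsigned sum = 0;
    for (int y = 0; y < height; y++)
        sum += row16(base, stride_bytes, y)[0];
    return sum;
}

/* The idiom under discussion: the division is carried out in size_t, so
 * a negative stride turns into a very large positive value. */
static ptrdiff_t stride_div_idiom(ptrdiff_t stride)
{
    stride /= sizeof(uint16_t);
    return stride;
}
```

With a bottom-up image, base points at the last row and stride_bytes is negative; sum_first_column then walks the rows upwards without ever converting the stride to elements.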
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This allows choosing whether the `fit_mode` merely controls the placement
of the image within the output resolution, or whether the output resolution
is also adjusted according to the given `fit_mode`.
The semantics of these keywords are well-defined by the CSS 'object-fit'
property. This is arguably more user-friendly and less obtuse than the
existing `normalize_sar` and `pad_crop_ratio` options. Additionally, this
comes with two new (useful) behaviors, `none` and `scale_down`, neither of
which maps elegantly to the existing options.
One additional benefit of this option is that, unlike `normalize_sar`, it
does *not* also imply `reset_sar`; meaning that users can now choose to
have an anamorphic base layer and still have the overlay images scaled to fit
on top of it according to the chosen strategy.
See-Also: https://drafts.csswg.org/css-images/#the-object-fit
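For orientation, the sizing rules that CSS 'object-fit' defines for its five keywords can be sketched as below. This is only an illustration of the CSS semantics: the enum, struct and function names are hypothetical and are not the identifiers used by the option itself, and placement within the box is left out.

```c
/* Hypothetical names; only the sizing rules follow CSS 'object-fit'. */
enum fit_mode { FIT_FILL, FIT_CONTAIN, FIT_COVER, FIT_NONE, FIT_SCALE_DOWN };

struct size { double w, h; };

static double min_d(double a, double b) { return a < b ? a : b; }
static double max_d(double a, double b) { return a > b ? a : b; }

/* Returns the size at which an image of size src is drawn inside box. */
static struct size fit_size(enum fit_mode mode, struct size src, struct size box)
{
    double sx = box.w / src.w;
    double sy = box.h / src.h;
    double s;

    switch (mode) {
    case FIT_FILL:       return box;                  /* stretch, ignore aspect  */
    case FIT_CONTAIN:    s = min_d(sx, sy);     break; /* fit entirely inside    */
    case FIT_COVER:      s = max_d(sx, sy);     break; /* cover the whole box    */
    case FIT_SCALE_DOWN: s = min_d(1.0, min_d(sx, sy)); break; /* never upscale  */
    case FIT_NONE:
    default:             s = 1.0;               break; /* keep the original size */
    }
    return (struct size){ src.w * s, src.h * s };
}
```

For a 200x100 image in a 100x100 box, `contain` and `scale_down` both yield 100x50, while `cover` and `none` keep 200x100; for a 50x25 image, `scale_down` leaves it at 50x25 whereas `contain` enlarges it to 100x50.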
The standard [[maybe_unused]] attribute requires this placement, and
__attribute__((unused)) works the same way there.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 version
of filter_line.
This commit therefore removes the overridden MMXEXT version
(which didn't abide by the ABI), which allows us to remove
an emms_c() from vf_gradfun.c, so that users with SSSE3 no longer
pay a price for the mere existence of an MMXEXT version.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Blending onto independent alpha framebuffers is not possible under the
constraints of the supported blend operators. While we could handle
blending premul-onto-premul, this would break if the base layer is YUV,
since premultiplied alpha does not survive the (nonlinear) YUV conversion.
Fortunately, blending independent-onto-premul is just as easy, and works in
all cases. So just force this mode when using a linear intermediate blend
texture, which is always RGBA.
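A minimal sketch of the independent-onto-premul case, written as plain C arithmetic rather than as GPU blend state. The struct and function names are hypothetical; the formula is the usual "over" operator with a straight-alpha source and a premultiplied destination, which corresponds to the blend factors (src_alpha, one_minus_src_alpha) for color and (one, one_minus_src_alpha) for alpha.

```c
struct rgba { float r, g, b, a; };

/* src carries independent (straight) alpha, dst is premultiplied.
 * The result is again premultiplied, so it can serve as dst for the
 * next overlay. */
static struct rgba blend_independent_onto_premul(struct rgba src, struct rgba dst)
{
    float k = 1.0f - src.a;
    return (struct rgba){
        src.r * src.a + dst.r * k,
        src.g * src.a + dst.g * k,
        src.b * src.a + dst.b * k,
        src.a         + dst.a * k,
    };
}
```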
Instead of directly mutating `opts->params`. Avoids any possible leak of
overridden params between invocations of this function, as well as the later
`pl_render_image` during the linear output pass.
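The pattern can be sketched as follows. All types, fields and function names here are hypothetical stand-ins (the real layout of `opts` is not shown in this message); the sketch only illustrates overriding a local copy so that the caller's params stay untouched.

```c
/* Hypothetical stand-ins for the real option structs. */
struct render_params { int blend_linear; };
struct overlay_opts  { struct render_params params; };

/* Stand-in for the actual rendering call; just reports the flag it saw. */
static int render_pass(const struct render_params *p)
{
    return p->blend_linear;
}

static int render_forcing_linear(const struct overlay_opts *opts)
{
    struct render_params local = opts->params; /* copy, do not mutate opts */
    local.blend_linear = 1;                    /* override only the copy   */
    return render_pass(&local);
}
```

Because the override lives in a local variable, it cannot leak into a later call that reads `opts->params` again.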
To avoid pulling in the entire libavfilter when using the DSP functions
from checkasm.
The rest of the struct is not needed outside vf_idet.c and was moved there.
Depending on the threading backend the stdlib uses, creating a
mutex/condvar can be quite expensive.
So keep this object alive in the ctx, on which we synchronize via the
uninit mutex anyway.
Apparently, using a normal frame pool in a multithreaded environment
leads to strange resource leaks on shutdown, which vanish when using a
free threaded pool.
Add a new flag to the vf_colorspace filter which provides the user an
option to clamp the linear and delinear transfer characteristics LUT
values to the [0, 1] represented range. This helps constrain the
potential value range when converting between colorspaces.
Certain colors can end up out of gamut after the color rotation during
conversion. The colorspace filter permits this by using an extended range.
The added clamping just keeps the colors within the
[0, 1) range rather than using that extended range. I'm not enough of a
color scientist to say which is correct, but there are certain
situations where we would prefer to keep the colors in gamut.
The example I have is:
A solid color image of 8-bit YUV: Y=157, U=164, V=98.
Specify the input as:
    Input range: MPEG
    In color matrix: BT470BG
    In color primaries: BT470M
    In color transfer characteristics: Gamma 28
Output as:
    Out color range: JPEG
    Out color matrix: BT.709
    Out color primaries: BT.709
    Out color transfer characteristics: BT.709
During the calculation you get:
    Input YUV: y=157, u=164, v=98
    Post-yuv2rgb BT.470BG: r=0.456055, g=0.684152, b=0.928606
    Post-apply gamma28 linear LUT: r=0.110979, g=0.345494, b=0.812709
    Post-color rotation BT.470M to BT.709: r=-0.04161, g=0.384626, b=0.852400
    Post-apply Rec.709 delinear LUT: r=-0.16382, g=0.615932, b=0.923793
    Post-rgb2yuv Rec.709 matrix: y=120, u=190, v=25
Where with this change, the delinear LUT output would be clamped to 0,
so the result would be:
r=0.000000, g=0.612390, b=0.918807 and a final output of
y=129, u=185, v=46
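The clamping step itself can be sketched as below. This is a simplified illustration with hypothetical names: it clamps already-computed floating-point values, whereas the filter applies the clamp to the LUT entries, and it makes no attempt to reproduce the exact figures above.

```c
struct rgb { double r, g, b; };

/* Clamp a single component to the [0, 1] range. */
static double clamp01(double v)
{
    return v < 0.0 ? 0.0 : v > 1.0 ? 1.0 : v;
}

/* Clamp all three components; in-range values pass through unchanged. */
static struct rgb clamp_rgb(struct rgb c)
{
    return (struct rgb){ clamp01(c.r), clamp01(c.g), clamp01(c.b) };
}
```

Applied to the unclamped delinear output r=-0.16382, g=0.615932, b=0.923793, only the negative red component changes, becoming 0.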
As for the long and av_clip64, this was just because lrint returned a
long, so I left it as that and then used av_clip64 to the [0,1) range to
avoid overflow. But re-reading, it looks like av_clip_int16 would
downcast that long to int anyway so the possibility of overflow already
existed there. I've put it back to int just to match the existing
behavior.