The previous version was a pretty exact translation of the arm
version. This version does do some unnecessary arithemetic (it does
more operations on vectors that are only half filled; it does 4
uaddw and 4 sqxtun instead of 2 of each), but it reduces the overhead
of packing data together (which could be done for free in the arm
version).
This gives a decent speedup on Cortex A53, a minor speedup on
A72 and a very minor slowdown on Cortex A73.
Before: Cortex A53 A72 A73
vp8_idct_add_neon: 79.7 67.5 65.0
After:
vp8_idct_add_neon: 67.7 64.8 66.7
Signed-off-by: Martin Storsjö <martin@martin.st>
The original arm version didn't do saturation here. This probably
doesn't make any difference for performance, but reduces the
differences.
Signed-off-by: Martin Storsjö <martin@martin.st>
This makes it similar to put_epel16_v6, and gives a large speedup
on Cortex A53, a minor speedup on A72 and a very minor slowdown on
A73.
Before: Cortex A53 A72 A73
vp8_put_epel16_h6v6_neon: 2211.4 1586.5 1431.7
After:
vp8_put_epel16_h6v6_neon: 1736.9 1522.0 1448.1
Signed-off-by: Martin Storsjö <martin@martin.st>
This makes it similar to put_epel16_v6, and gives a 10-25%
speedup of this function.
Before: Cortex A7 A8 A9 A53 A72
vp8_put_epel16_h6v6_neon: 3058.0 2218.5 2459.8 2183.0 1572.2
After:
vp8_put_epel16_h6v6_neon: 2670.8 1934.2 2244.4 1729.4 1503.9
Signed-off-by: Martin Storsjö <martin@martin.st>
Even if NEON would be disabled, the init functions should be built
as they are called as long as ARCH_AARCH64 is set.
These functions are part of a generic DSP subsytem, not tied directly
to one decoder. (They should be built if the vp7 decoder is enabled,
even if the vp8 decoder is disabled.)
Signed-off-by: Martin Storsjö <martin@martin.st>
The previous form also does seem to assemble on current tools,
but I think it might fail on some older aarch64 tools.
Signed-off-by: Martin Storsjö <martin@martin.st>
This also partially fixes assembling with MS armasm64 (via
gas-preprocessor).
The movrel macro invocations need to pass the offset via a separate
parameter. Mach-o and COFF relocations don't allow a negative
offset to a symbol, which is handled properly if the offset is passed
via the parameter. If no offset parameter is given, the macro
evaluates to something like "adrp x17, subpel_filters-16+(0)", which
older clang versions also fail to parse (the older clang versions
only support one single offset term, although it can be a parenthesis.
Signed-off-by: Martin Storsjö <martin@martin.st>
The "new" entry point actually has existed since OpenH264 1.4 in
2015 and is the the recommended decoding entry point.
The name of this function, DecodeFrameNoDelay, is rather backwards
considering that it doesn't return the latest decoded frame immediately,
but actually does proper delaying and reordering of frames.
Signed-off-by: Martin Storsjö <martin@martin.st>
Dav1dPictures contain more than one buffer reference, so we're forced to use the
API properly to free them all.
Signed-off-by: James Almer <jamrial@gmail.com>
The color fields were moved to another struct, and a way to propagate
timestamps and other input metadata was introduced, so the packet
fifo can be removed.
Add support for 12bit streams, an option to disable film grain, and
read the profile from the sequence header referenced by the ouput
picture instead of guessing based on output pix_fmt.
Signed-off-by: James Almer <jamrial@gmail.com>
Add VDENC(lowpower mode) support for QSV h264 and HEVC
It's an experimental function(like lowpower in vaapi) with
some limitations:
- CBR/VBR require HuC which should be explicitly loaded via i915
module parameter(i915.enable_guc=2 for linux kerner version >= 4.16)
- HEVC VDENC was supported >= ICE LAKE
use option "-low_power 1" to enable VDENC.
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
libx264 does have a field for opaque data to pass along with frames
through the encoder, but it is a pointer, while the libavcodec
reordered_opaque field is an int64_t. Therefore, allocate an array
within the libx264 wrapper, where reordered_opaque values in flight
are stored, and pass a pointer to this array to libx264.
Update the public libavcodec documentation for the AVCodecContext
field to explain this usage, and add a codec capability that allows
detecting whether an encoder handles this field.
Signed-off-by: Martin Storsjö <martin@martin.st>
This reverts commit 662558f985.
The avcodec_parameters_to_context() call was freeing and reallocating
AVCodecContext->extradata, essentially taking ownership of it, which according
to the doxy is user owned. This is an API break and has produces crashes in
some library users like Firefox.
Revert until a better solution is found to internally propagate the filtered
extradata back into the decoder context.
Signed-off-by: James Almer <jamrial@gmail.com>
Currently qsv (m)jpeg encoding is broken.
Regression introducing by the commit(id: c1bcd3): fix async support,
which requires the minimum async_depth to be 1, instead previous zero.
But the default async_depth of qsv (m)jpeg encoding is still initialized
(mostly) as zero.
This patch also abviously improves qsv (m)jpeg encoding performance
due to the default async_depth is changed to 4.
Signed-off-by: Zhong Li <zhong.li@intel.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Based on hevc_parser code. This prevents repeated unnecessary allocations
and frees on every packet processed by the bsf.
Reviewed-by: Jun Zhao <mypopydev@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Fixes high memory usage and prevents over allocation of the frames via
proper unref.
Can be checked as:
-hwaccel qsv -c:v h264_qsv -i ../h264-conformance/CANL2_Sony_E.jsv -c:v
h264_qsv -b:v 2000k -y qsv.mp4
MSVC expands the preprocessor directives differently, making the
version check fail in the previous form.
Clang can warn about this with -Wexpansion-to-defined (not currently
enabled by default):
warning: macro expansion producing 'defined' has undefined behavior [-Wexpansion-to-defined]
Signed-off-by: Martin Storsjö <martin@martin.st>
The previous version checks checked explicitly for the version
where the version define was added to the installed headers,
making an "#ifdef AACDECODER_LIB_VL0" enough. Now that we have
a need for more diverse version checks than this, convert all checks
to such checks.
Signed-off-by: Martin Storsjö <martin@martin.st>
When flushing the encoder, we now need to provide non-null buffer
parameters for everything, even if they are unused.
The encoderDelay parameter has been replaced by two, nDelay and
nDelayCore.
Signed-off-by: Martin Storsjö <martin@martin.st>