GPUs filter out denormals when reading floats via imageLoad. Denormals shouldn't
be present in general, but if they are, this is a lossless codec, and we have to
preserve them. This allows reading the exact values.
Sponsored-by: Sovereign Tech Fund
Test the five public functions not already covered by
tests/color_utils: av_csp_luma_coeffs_from_avcsp,
av_csp_primaries_desc_from_id, av_csp_primaries_id_from_desc,
av_csp_approximate_trc_gamma, and av_csp_approximate_eotf_gamma.
Iterates every AVCOL_SPC, AVCOL_PRI, and AVCOL_TRC value including
the extended ranges, round-trips primaries via desc_eq so the
canonical first-match (e.g. smpte170m for smpte240m) is accepted,
checks that a garbage desc returns AVCOL_PRI_UNSPECIFIED, and that
out-of-range enum values return NULL or 0.0 as documented. The
trc/eotf gamma values come from static lookup tables so the
floating point output is bitexact across platforms.
Coverage for libavutil/csp.c: 88.50% -> 94.46%
Test av_ambient_viewing_environment_alloc with and without the size
out-parameter, and av_ambient_viewing_environment_create_side_data.
Verifies the {0, 1} rational defaults set by get_defaults(),
write/read-back of the three AVRational fields, frame side data
attachment, and OOM paths via av_max_alloc.
Coverage for libavutil/ambient_viewing_environment.c: 60.00% -> 100.00%
The issue was that XV30 is a native 444 10-bit format, rather than
16-bits. This resulted in padding leaking into bits where it shouldn't.
Sponsored-by: Sovereign Tech Fund
When s is NULL in av_dynamic_hdr_smpte2094_app5_from_t35, that's not an
allocation error but just invalid API usage. If there is any allocation
failure beforehand that would lead to this, the caller has to check it,
like is already done in all usages of this function in FFmpeg itself.
This is aligned forward by an extra space, because it inheried the
incorrect alignment from the EXIF declaration above it (fixed in the
previous commit).
Signed-off-by: Leo Izen <leo.izen@gmail.com>
This commit re-aligns the declaration by removing extra whitespace
and fixes the comment above to have the correct acronym. It also
documents what the four magic bytes indicate.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Test av_ts_make_string with NOPTS, zero, positive, negative, and
INT64 boundary values, av_ts2str macro, av_ts_make_time_string2
with various timebases, and av_ts_make_time_string pointer
variant.
Coverage for libavutil/timestamp.c: 0.00% -> 100.00%
Test av_tdrdi_alloc with 1 and 3 displays, and the inline
av_tdrdi_get_display accessor. Verifies that the returned
pointer matches entries_offset + idx * entry_size, tests
write/read-back of display width exponent/mantissa and view ID
fields, and OOM paths via av_max_alloc.
Coverage for libavutil/tdrdi.c: 0.00% -> 100.00%
Test av_dynamic_hdr_vivid_alloc and
av_dynamic_hdr_vivid_create_side_data. Verifies zero defaults,
write/read-back of system_start_code, num_windows, and
color transform params (min/avg/var/max RGB), frame side
data attachment, and OOM paths via av_max_alloc.
Coverage for libavutil/hdr_dynamic_vivid_metadata.c: 0.00% -> 100.00%
Test av_buffer_alloc, av_buffer_allocz, av_buffer_create with
custom free callback, AV_BUFFER_FLAG_READONLY, av_buffer_ref,
av_buffer_is_writable, av_buffer_get_ref_count,
av_buffer_make_writable, av_buffer_realloc (including from NULL),
av_buffer_replace (including with NULL), av_buffer_pool
init/get/uninit cycle, av_buffer_pool_init2 with custom alloc
and pool_free callbacks, av_buffer_pool_buffer_get_opaque, and
OOM paths via av_max_alloc.
Coverage for libavutil/buffer.c: 0.00% -> 90.19%
Remaining uncovered lines are mutex init failures and
secondary allocation failure paths.
6f811ad never set VK_DEVICE_QUEUE_CREATE_INTERNALLY_SYNCHRONIZED_BIT_KHR
on the queues at vkCreateDevice. Without the flag, the driver doesn't
actually synchronizes queues.
Fixes: 6f811ad751 ("hwcontext_vulkan: implement internal queue synchronization")
vkGetDeviceQueue2 with flags = 0 is equivalent to vkGetDeviceQueue and
is available since Vulkan 1.1. Needed to support queues created with
non-zero VkDeviceQueueCreateFlags.
Fixes VUID-vkGetDeviceQueue-flags-01841 VVL error.
This adds a NEON-optimized function for computing 32x32 Sum of Absolute
Differences (SAD) on AArch64, addressing a gap where x86 had SSE2/AVX2
implementations but AArch64 lacked equivalent coverage.
The implementation mirrors the existing sad8 and sad16 NEON functions,
employing a 4-row unrolled loop with UABAL and UABAL2 instructions for
efficient load-compute interleaving, and four 8x16-bit accumulators to
handle the wider 32-byte rows.
Benchmarks on AWS Graviton3 (Neoverse V1, c7g.xlarge) using checkasm:
sad_32x32_0: C 146.4 cycles -> NEON 98.1 cycles (1.49x speedup)
sad_32x32_1: C 141.4 cycles -> NEON 98.9 cycles (1.43x speedup)
sad_32x32_2: C 140.7 cycles -> NEON 95.0 cycles (1.48x speedup)
Signed-off-by: Jeongkeun Kim <variety0724@gmail.com>
This function is exported, so has to abide by the ABI
and therefore issues emms since commit
5b85ca5317. Yet this is
expensive and using SSE2 instead improves performance.
Also avoid the initial zeroing and the last pointer
increment while just at it.
This removes the last usage of mmx from libavutil*.
Old benchmarks:
sad_8x8_0_c: 13.2 ( 1.00x)
sad_8x8_0_mmxext: 27.8 ( 0.48x)
sad_8x8_1_c: 13.2 ( 1.00x)
sad_8x8_1_mmxext: 27.6 ( 0.48x)
sad_8x8_2_c: 13.3 ( 1.00x)
sad_8x8_2_mmxext: 27.6 ( 0.48x)
New benchmarks:
sad_8x8_0_c: 13.3 ( 1.00x)
sad_8x8_0_sse2: 11.7 ( 1.13x)
sad_8x8_1_c: 13.8 ( 1.00x)
sad_8x8_1_sse2: 11.6 ( 1.20x)
sad_8x8_2_c: 13.2 ( 1.00x)
sad_8x8_2_sse2: 11.8 ( 1.12x)
Hint: Using two psadbw or one psadbw and movhps made no difference
in the benchmarks, so I chose the latter due to smaller codesize.
*: except if lavu provides avpriv_emms for other libraries
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The multiplanar image with storage_bit enabled fails to be exported
to DMA-BUF on the QCOM turnip driver, thus triggering this double-free issue.
```
[Parsed_hwmap_2 @ 0xffff5c002a70] Configure hwmap vulkan -> drm_prime.
[hwmap @ 0xffff5c001180] Filter input: vulkan, 1920x1080 (0).
[AVHWFramesContext @ 0xffff5c004e00] Unable to export the image as a FD!
free(): double free detected in tcache 2
Aborted
```
Additionally, add back an av_unused attribute. Otherwise, the compiler
will complain about unused variables when CUDA is not enabled.
Signed-off-by: nyanmisaka <nst799610810@gmail.com>
ff_vk_find_struct returns const void *, so storing it in const void *drm_create_pnext
fixes the initialization warning but then dpb_hwfc->create_pnext = drm_create_pnext
assigns const void * to void *, triggering the same warning at that line. The right
fix is a (void *) cast at the call site, same as done for buf_pnext.
Also restrict the GetPhysicalDeviceImageFormatProperties2 verbose log in
try_export_flags to the DRM modifier path only: when has_mods is false the log
always printed mod[0]=0x0, which is misleading since no DRM modifier is involved.
Signed-off-by: Tymur Boiko <tboiko@nvidia.com>
When mapping Vulkan Video frames to DMA-BUF, synchronize using an exportable
binary semaphore and sync_fd where supported. Submit a lightweight exec that
waits on each plane's timeline semaphore at the current value, signals a
SYNC_FD-exportable binary semaphore, then export with vkGetSemaphoreFdKHR.
Store that binary semaphore in AVVkFrameInternal and reuse it across maps
instead of creating and destroying each time: for
VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_SYNC_FD_BIT, copy transference means a
successful vkGetSemaphoreFdKHR unsignals the semaphore like a wait, so it can
be signaled again on the next map submit. If export is unavailable, fall back
to vkWaitSemaphores.
Moved drm_sync_sem destroy to vulkan_free_internal
Export dma-buf fds with GetMemoryFdKHR for each populated f->mem[i], iterating
up to the sw_format plane count instead of stopping at the image count, so
multi-memory bindings are not skipped. Describe DRM layers using
max(sw planes, image count) and query subresource layout with the correct
aspect and image index when one VkImage backs multiple planes. Reference the
source hw_frames_ctx on the mapped frame and close dma-buf fds on failure paths.
For DMA-BUF-capable pools, honor VK_EXTERNAL_MEMORY_FEATURE_DEDICATED_ONLY_BIT
from format export queries when binding memory. With DRM modifiers and a
video profile in create_pnext, preserve caller usage and image flags instead of
overwriting them from generic supported_usage probing; use the modifier list
create info when probing export flags for modifier tiling.
Include VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR from the output frames
context's usage together with DST (fixes
VUID-VkVideoBeginCodingInfoKHR-slotIndex-07245) instead of adding DPB usage
only when !is_current.
In ff_vk_decode_add_slice, pass VkVideoProfileListInfoKHR (from the output
frames context's create_pnext) as the pNext argument to
ff_vk_get_pooled_buffer instead of the full create_pnext chain. In
ff_vk_frame_params, set tiling to OPTIMAL only when it is not already
DRM_FORMAT_MODIFIER_EXT. In ff_vk_decode_init, when the output pool's
create_pnext includes VkImageDrmFormatModifierListCreateInfoEXT, initialize the
DPB pool with that modifier-list pNext and DRM_FORMAT_MODIFIER_EXT tiling;
otherwise use VkVideoProfileListInfoKHR and OPTIMAL as before. When
VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_DISTINCT_BIT_KHR is unset, the output
and DPB pools cannot use different layouts or tiling, so the DPB pool must
match the output pool.
Also fix av_hwframe_map ioctl sync_fd export, multi-planar semaphore handling,
and related failure-path cleanup.
Signed-off-by: Tymur Boiko <tboiko@nvidia.com>
SMPTE-2094-50 is an upcoming standard that is close to being
finalized.
Define a side data type for carrying this metadata. And add
functions for parsing and writing it. This is very similar to
the handling of HDR10+ metadata.
The spec is available here: https://github.com/SMPTE/st2094-50
Signed-off-by: Vignesh Venkatasubramanian <vigneshv@google.com>
Test the integer math utility functions: av_gcd, av_rescale,
av_rescale_rnd (all rounding modes including PASS_MINMAX),
av_rescale_q, av_compare_ts, av_compare_mod, av_rescale_delta,
and av_add_stable. Includes large-value tests that exercise the
128-bit multiply path in av_rescale_rnd.
av_bessel_i0 is not tested since it uses floating point math
that is not bitexact across platforms.
Coverage for libavutil/mathematics.c: 0.00% -> 82.03%
Remaining uncovered lines are av_bessel_i0 (float, 23 lines)
and one edge case fallback in av_rescale_delta.
Test all public API functions: name/format round-trip lookups,
bytes_per_sample, is_planar, packed/planar conversions,
alt_sample_fmt, get_sample_fmt_string, samples_get_buffer_size,
samples_alloc, samples_alloc_array_and_samples, samples_copy,
and samples_set_silence. OOM error paths are exercised via
av_max_alloc().
Coverage for libavutil/samplefmt.c: 0.00% -> 95.28%
Remaining uncovered lines are the fill_arrays failure path
and the overlapping memmove branch in samples_copy.
Test the three public API functions: av_rc4_alloc, av_rc4_init,
and av_rc4_crypt. Verifies keystream output against RFC 6229
test vectors for 40, 56, 64, and 128-bit keys, encrypt/decrypt
round-trip, inplace operation, and the invalid key_bits error path.
Coverage for libavutil/rc4.c: 0.00% -> 100.00%
The main issue is that BGR formats only semi-exist in Vulkan. Unlike all
other formats, they require the user to manually remap the pixel order, and
are also forbidden from being written to without a format in shaders. The main
reason for this was conservative - Vulkan is supposed to work everywhere, including
platforms where there is no write-time remapping/swizzing support.
Sponsored-by: Sovereign Tech Fund