Commit Graph

8 Commits

Author SHA1 Message Date
Lynne
d4966f0a74 ffv1enc_vulkan: limit parallelism based on VRAM, fallback to host memory 2024-11-26 14:14:16 +01:00
Lynne
5effac3b02 ffv1enc: expose ff_ffv1_encode_buffer_size
The function is quite important to ensure that the output
is always going to be sufficient, and it can change version to
version, so exposing it makes sense.
2024-11-26 14:14:15 +01:00
Lynne
d8f301cdf2 ffv1enc_vulkan: switch to receive_packet
This allows the encoder to fully saturate all queues the GPU
has, giving a good 10% in certain cases and resolutions.

This also improves error resilience if an allocation fails,
and properly cleans up after itself if it does.
2024-11-26 14:14:15 +01:00
Lynne
4fefc6e80c ffv1enc_vulkan: remove arbitrary limitation of the number of slices 2024-11-26 14:14:15 +01:00
Lynne
7c52dda55f hwcontext_vulkan: add support for AV_PIX_FMT_GBRP12/14/16 2024-11-26 14:14:12 +01:00
Lynne
eb536d97a0 ffv1enc_vulkan: support buffers larger than 4GiB
Unlike the software FFv1 encoder, none of our buffers are allocated by
FFmpeg, which supports at most 4GiB large allocations.

For really large sizes, the maximum size of the buffer can exceed 4GiB,
which the software encoder optimistically tries to allocate as 4GiB
in the hopes that the encoder will compress to under that amount.

We can just let Vulkan allocate us a larger buffer, and switch to
64-bit offsets.
2024-11-20 05:23:05 +01:00
Lynne
66093c5b94 ffv1enc_vulkan: restrict number of execution contexts to 1
This only leads to wasting memory in a single-threaded operation.
Limit this to 1 for now and leave a comment.
2024-11-18 20:04:24 +01:00
Lynne
ed2391d341 ffv1enc: add a Vulkan encoder
This commit implements a standard, compliant, version 3 and version 4
FFv1 encoder, entirely in Vulkan. The encoder is written in standard
GLSL and requires a Vulkan 1.3 supporting GPU with the BDA extension.

The encoder can use any amount of slices, but nominally, should use
32x32 slices (1024 in total) to maximize parallelism.

All features are supported, as well as all pixel formats.
This includes:
 - Rice
 - Range coding with a custom quantization table
 - PCM encoding

CRC calculation is also massively parallelized on the GPU.

Encoding of unaligned dimensions on subsampled data requires
version 4, or requires oversizing the image to 64-pixel alignment
and cropping out the padding via container flags.

Performance-wise, this makes 1080p real-time screen capture possible
at 60fps on even modest GPUs.
2024-11-18 07:54:22 +01:00