ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2025-12-28 18:00:01 +01:00

Files

Zhao Zhili 6ce02bcc3a avcodec/aarch64/vvc: Optimize apply_bdof

Before this patch, prof_grad_filter calculates
gh[0], gh[1], gv[0], gv[1] and saves them to the stack.

derive_bdof_vx_vy loads them from the stack and calculates
gh[0] + gh[1], gv[0] + gv[1].

apply_bdof_min_block loads them from the stack and calculates
gh[0] - gh[1], gv[0] - gv[1].

This patch adds bdof_grad_filter, which calculates gh[0] + gh[1],
gh[0] - gh[1], gv[0] + gv[1], gv[0] - gv[1], and saves them to the
stack, so derive_bdof_vx_vy and apply_bdof_min_block can use the
results directly.

prof_grad_filter is kept for reuse by other functions in the future.
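
As a rough illustration of the idea in plain C (not the NEON assembly from
the patch), the sums and differences can be computed once per sample and
stored, so that later stages only read them. The struct, the function name
and the flat array layout below are hypothetical and chosen only for this
sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the four precomputed values of one sample.
 * The names are illustrative and do not come from FFmpeg. */
typedef struct GradSumDiff {
    int32_t gh_sum;   /* gh[0] + gh[1] */
    int32_t gh_diff;  /* gh[0] - gh[1] */
    int32_t gv_sum;   /* gv[0] + gv[1] */
    int32_t gv_diff;  /* gv[0] - gv[1] */
} GradSumDiff;

/* Compute sums and differences of the horizontal (gh) and vertical (gv)
 * gradients of the two predictions in a single pass over n samples. */
static void grad_sum_diff(const int16_t *gh0, const int16_t *gh1,
                          const int16_t *gv0, const int16_t *gv1,
                          GradSumDiff *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i].gh_sum  = (int32_t)gh0[i] + gh1[i];
        out[i].gh_diff = (int32_t)gh0[i] - gh1[i];
        out[i].gv_sum  = (int32_t)gv0[i] + gv1[i];
        out[i].gv_diff = (int32_t)gv0[i] - gv1[i];
    }
}
```

In this sketch a consumer such as the vx/vy derivation would read gh_sum and
gv_sum, while the final blend would read gh_diff and gv_diff, so neither has
to reload the raw gradients and redo the arithmetic.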

Benchmark on rpi5 with gcc 12
                               Before               After
--------------------------------------------------------------------
apply_bdof_8_8x16_c:       |   7431.4 ( 1.00x)   |   7371.7 ( 1.00x)
apply_bdof_8_8x16_neon:    |   1175.4 ( 6.32x)   |   1036.3 ( 7.11x)
apply_bdof_8_16x8_c:       |   7182.2 ( 1.00x)   |   7201.1 ( 1.00x)
apply_bdof_8_16x8_neon:    |   1021.7 ( 7.03x)   |    879.9 ( 8.18x)
apply_bdof_8_16x16_c:      |  14577.1 ( 1.00x)   |  14589.3 ( 1.00x)
apply_bdof_8_16x16_neon:   |   2012.8 ( 7.24x)   |   1743.3 ( 8.37x)
apply_bdof_10_8x16_c:      |   7292.4 ( 1.00x)   |   7308.5 ( 1.00x)
apply_bdof_10_8x16_neon:   |   1156.3 ( 6.31x)   |   1045.3 ( 6.99x)
apply_bdof_10_16x8_c:      |   7112.4 ( 1.00x)   |   7214.4 ( 1.00x)
apply_bdof_10_16x8_neon:   |   1007.6 ( 7.06x)   |    904.8 ( 7.97x)
apply_bdof_10_16x16_c:     |  14363.3 ( 1.00x)   |  14476.4 ( 1.00x)
apply_bdof_10_16x16_neon:  |   1986.9 ( 7.23x)   |   1783.1 ( 8.12x)
apply_bdof_12_8x16_c:      |   7433.3 ( 1.00x)   |   7374.7 ( 1.00x)
apply_bdof_12_8x16_neon:   |   1155.9 ( 6.43x)   |   1040.8 ( 7.09x)
apply_bdof_12_16x8_c:      |   7171.1 ( 1.00x)   |   7376.3 ( 1.00x)
apply_bdof_12_16x8_neon:   |   1010.8 ( 7.09x)   |    899.4 ( 8.20x)
apply_bdof_12_16x16_c:     |  14515.5 ( 1.00x)   |  14731.5 ( 1.00x)
apply_bdof_12_16x16_neon:  |   1988.4 ( 7.30x)   |   1785.2 ( 8.25x)
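
In the table, the figure in parentheses is the speedup over the C version in
the same column (for example 7431.4 / 1175.4 is about 6.32). The gain from
the patch itself is the ratio of the NEON timings before and after. A small
sketch of that calculation, using a hypothetical helper name:

```c
/* Hypothetical helper: ratio of two timings taken from the table above.
 * A result above 1.0 means the "after" timing is the faster one. */
static double speedup(double before, double after)
{
    return before / after;
}
```

For apply_bdof_8_8x16_neon, speedup(1175.4, 1036.3) comes out at roughly
1.13, which is the improvement of the new NEON code over the old NEON code
for that block size.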

2025-09-03 06:55:37 +00:00

h26x

aarch64/h26x: Add put_hevc_pel_bi_w_pixels

2025-04-29 15:24:14 +08:00

vvc

avcodec/aarch64/vvc: Optimize apply_bdof

2025-09-03 06:55:37 +00:00

aacencdsp_init.c

avcodec/aarch64/aacencdsp: NEON implementation

2025-01-28 10:44:40 +02:00

aacencdsp_neon.S

avcodec/aarch64/aacencdsp: NEON implementation

2025-01-28 10:44:40 +02:00

aacpsdsp_init_aarch64.c

…

aacpsdsp_neon.S

…

ac3dsp_init_aarch64.c

…

ac3dsp_neon.S

avcodec/aarch64/ac3dsp_neon.S: Optimize ac3_sum_square_butterfly_int32_neon

2025-03-02 01:17:53 +02:00

cabac.h

…

fdct.h

…

fdctdsp_init_aarch64.c

…

fdctdsp_neon.S

…

fmtconvert_init.c

…

fmtconvert_neon.S

…

h264chroma_init_aarch64.c

…

h264cmc_neon.S

…

h264dsp_init_aarch64.c

…

h264dsp_neon.S

…

h264idct_neon.S

…

h264pred_init.c

…

h264pred_neon.S

lavc/aarch64: Fix ff_pred16x16_plane_neon_10

2024-12-17 14:50:29 +02:00

h264qpel_init_aarch64.c

…

h264qpel_neon.S

…

hevcdsp_deblock_neon.S

…

hevcdsp_idct_neon.S

aarch64/hevcdsp_idct_neon: Add implementation for idct dc 12

2025-03-04 17:01:58 +08:00

hevcdsp_init_aarch64.c

aarch64/h26x: Add put_hevc_pel_bi_w_pixels

2025-04-29 15:24:14 +08:00

hpeldsp_init_aarch64.c

…

hpeldsp_neon.S

…

idct.h

…

idctdsp_init_aarch64.c

…

idctdsp_neon.S

…

Makefile

configure: Factor mpegvideoencdsp out of mpegvideoenc

2025-06-21 22:08:52 +02:00

me_cmp_init_aarch64.c

avcodec/mpegvideoenc: Add MPVEncContext

2025-03-26 04:08:33 +01:00

me_cmp_neon.S

all: fix typos found by codespell

2025-08-03 13:48:47 +02:00

mpegaudiodsp_init.c

…

mpegaudiodsp_neon.S

…

mpegvideoencdsp_init.c

avcodec/mpegvideoencdsp: convert stride parameters from int to ptrdiff_t

2024-09-01 13:42:30 +02:00

mpegvideoencdsp_neon.S

avcodec/mpegvideoencdsp: convert stride parameters from int to ptrdiff_t

2024-09-01 13:42:30 +02:00

neon.S

…

neontest.c

…

opusdsp_init.c

lavc/opus*: move to opus/ subdir

2024-09-02 11:56:53 +02:00

opusdsp_neon.S

avcodec/aarch64/opusdsp_neon: Simplify opus_postfilter_neon

2025-02-10 14:55:16 +02:00

pixblockdsp_init_aarch64.c

avcodec/pixblockdsp: Improve 8 vs 16 bit check

2025-05-31 01:25:27 +02:00

pixblockdsp_neon.S

…

rv40dsp_init_aarch64.c

…

sbrdsp_init_aarch64.c

…

sbrdsp_neon.S

…

simple_idct_neon.S

…

synth_filter_init.c

…

synth_filter_neon.S

…

vc1dsp_init_aarch64.c

…

vc1dsp_neon.S

…

videodsp_init.c

…

videodsp.S

…

vorbisdsp_init.c

…

vorbisdsp_neon.S

…

vp8dsp_init_aarch64.c

…

vp8dsp_neon.S

…

vp8dsp.h

…

vp9dsp_init_10bpp_aarch64.c

…

vp9dsp_init_12bpp_aarch64.c

…

vp9dsp_init_16bpp_aarch64_template.c

…

vp9dsp_init_aarch64.c

…

vp9dsp_init.h

…

vp9itxfm_16bpp_neon.S

…

vp9itxfm_neon.S

…

vp9lpf_16bpp_neon.S

…

vp9lpf_neon.S

…

vp9mc_16bpp_neon.S

…

vp9mc_aarch64.S

…

vp9mc_neon.S

aarch64: vp9mc: Load only 12 pixels in the 4 pixel wide horizontal filter

2025-01-03 17:53:46 -05:00