ffmpeg

mirror of https://git.ffmpeg.org/ffmpeg.git synced 2025-12-31 19:30:11 +01:00

Author	SHA1	Message	Date
Timo Rothenpieler	262d41c804	all: fix typos found by codespell	2025-08-03 13:48:47 +02:00
Andreas Rheinhardt	9b409ea1e6	configure: Factor mpegvideoencdsp out of mpegvideoenc This will allow relaxing the dependency on mpegvideoenc for several codecs. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-06-21 22:08:52 +02:00
Andreas Rheinhardt	20ddada2a3	avcodec/pixblockdsp: Improve 8 vs 16 bit check Before this commit, the input in get_pixels and get_pixels_unaligned has been treated inconsistently: - The generic code treated 9, 10, 12 and 14 bits as 16bit input (these bits correspond to what FFmpeg's dsputils supported), everything with <= 8 bits as 8 bit and everything else as 8 bit when used via AVDCT (which exposes these functions and purports to support up to 14 bits). - AARCH64, ARM, PPC, RISC-V and x86 ignore this AVDCT special case. - RISC-V also ignored the restriction to 9, 10, 12 and 14 for its 16bit check and treated everything > 8 bits as 16bit. - The mmi MIPS code treats everything as 8 bit when used via AVDCT (this is certainly broken); otherwise it checks for <= 8 bits. The msa MIPS code behaves like the generic code. This commit changes this to treat 9..16 bits as 16 bit input, everything else as 8 bit (the former because it makes sense, the latter to preserve the behaviour for external users). The only internal user of AVDCT (the spp filter) always uses 8, 9 or 10 bits. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-05-31 01:25:27 +02:00
Zhao Zhili	26752368f0	aarch64/h26x: Add put_hevc_pel_bi_w_pixels On rpi5 (A76): put_hevc_pel_bi_w_pixels4_8_c: 90.0 ( 1.00x) put_hevc_pel_bi_w_pixels4_8_neon: 34.1 ( 2.64x) put_hevc_pel_bi_w_pixels6_8_c: 188.3 ( 1.00x) put_hevc_pel_bi_w_pixels6_8_neon: 73.5 ( 2.56x) put_hevc_pel_bi_w_pixels8_8_c: 327.1 ( 1.00x) put_hevc_pel_bi_w_pixels8_8_neon: 75.8 ( 4.32x) put_hevc_pel_bi_w_pixels12_8_c: 728.8 ( 1.00x) put_hevc_pel_bi_w_pixels12_8_neon: 186.1 ( 3.92x) put_hevc_pel_bi_w_pixels16_8_c: 1288.1 ( 1.00x) put_hevc_pel_bi_w_pixels16_8_neon: 268.5 ( 4.80x) put_hevc_pel_bi_w_pixels24_8_c: 2855.5 ( 1.00x) put_hevc_pel_bi_w_pixels24_8_neon: 723.8 ( 3.95x) put_hevc_pel_bi_w_pixels32_8_c: 5095.3 ( 1.00x) put_hevc_pel_bi_w_pixels32_8_neon: 1165.0 ( 4.37x) put_hevc_pel_bi_w_pixels48_8_c: 11521.5 ( 1.00x) put_hevc_pel_bi_w_pixels48_8_neon: 2856.0 ( 4.03x) put_hevc_pel_bi_w_pixels64_8_c: 21020.5 ( 1.00x) put_hevc_pel_bi_w_pixels64_8_neon: 4699.1 ( 4.47x) Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2025-04-29 15:24:14 +08:00
Zhao Zhili	39786f8cd5	aarch64/h26x: optimize sao_band_filter int8_t[] is enough for offset_table of 8 bit streams. On rpi5: Before After hevc_sao_band_8_8_c: 252.3 ( 1.00x) 252.3 ( 1.00x) hevc_sao_band_8_8_neon: 95.8 ( 2.63x) 61.0 ( 4.57x) hevc_sao_band_16_8_c: 875.2 ( 1.00x) 864.9 ( 1.00x) hevc_sao_band_16_8_neon: 317.5 ( 2.76x) 150.0 ( 6.26x) hevc_sao_band_32_8_c: 3853.5 ( 1.00x) 3871.6 ( 1.00x) hevc_sao_band_32_8_neon: 1222.3 ( 3.15x) 550.6 ( 7.39x) hevc_sao_band_48_8_c: 8203.6 ( 1.00x) 8182.6 ( 1.00x) hevc_sao_band_48_8_neon: 2685.7 ( 3.05x) 1185.8 ( 7.36x) hevc_sao_band_64_8_c: 14023.0 ( 1.00x) 14038.9 ( 1.00x) hevc_sao_band_64_8_neon: 4783.2 ( 2.93x) 2078.4 ( 7.15x) Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2025-04-29 15:11:45 +08:00
Andreas Rheinhardt	a064d34a32	avcodec/mpegvideoenc: Add MPVEncContext Many of the fields of MpegEncContext (which is also used by decoders) are actually only used by encoders. Therefore this commit adds a new encoder-only structure and moves all of the encoder-only fields to it except for those which require more explicit synchronisation between the main slice context and the other slice contexts. This synchronisation is currently mainly provided by ff_update_thread_context() which simply copies most of the main slice context over the other slice contexts. Fields which are moved to the new MPVEncContext no longer participate in this (which is desired, because it is horrible and for the fields b) below wasteful) which means that some fields can only be moved when explicit synchronisation code is added in later commits. More explicitly, this commit moves the following fields: a) Fields not copied by ff_update_duplicate_context(): dct_error_sum and dct_count; the former does not need synchronisation, the latter is synchronised in merge_context_after_encode(). b) Fields which do not change after initialisation (these fields could also be put into MPVMainEncContext at the cost of an indirection to access them): lambda_table, adaptive_quant, {luma,chroma}_elim_threshold, new_pic, fdsp, mpvencdsp, pdsp, {p,b_forw,b_back,b_bidir_forw,b_bidir_back,b_direct,b_field}_mv_table, [pb]_field_select_table, mb_{type,var,mean}, mc_mb_var, {min,max}_qcoeff, {inter,intra}_quant_bias, ac_esc_length, the *_vlc_length fields, the q_{intra,inter,chroma_intra}_matrix{,16}, dct_offset, mb_info, mjpeg_ctx, rtp_mode, rtp_payload_size, encode_mb, all function pointers, mpv_flags, quantizer_noise_shaping, frame_reconstruction_bitfield, error_rate and intra_penalty. 
c) Fields which are already (re)set explicitly: The PutBitContexts pb, tex_pb, pb2; dquant, skipdct, encoding_error, the statistics fields {mv,i_tex,p_tex,misc,last}_bits and i_count; last_mv_dir, esc_pos (reset when writing the header). d) Fields which are only used by encoders not supporting slice threading for which synchronisation doesn't matter: esc3_level_length and the remaining mb_info fields. e) coded_score: This field is only really used when FF_MPV_FLAG_CBP_RD is set (which implies trellis) and even then it is only used for non-intra blocks. For these blocks dct_quantize_trellis_c() either sets coded_score[n] or returns a last_non_zero value of -1 in which case coded_score will be reset in encode_mb_internal(). Therefore no old values are ever used. The MotionEstContext has not been moved yet. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-03-26 04:08:33 +01:00
Krzysztof Pyrkosz	f9b8f30680	avcodec/aarch64/vvc: Optimize vvc_avg{8, 10, 12} This patch replaces integer widening with halving addition, and multi-step "emulated" rounding shift with a single asm instruction doing exactly that. Benchmarks before and after: A78 avg_8_64x64_neon: 2686.2 ( 6.12x) avg_8_128x128_neon: 10734.2 ( 5.88x) avg_10_64x64_neon: 2536.8 ( 5.40x) avg_10_128x128_neon: 10079.0 ( 5.22x) avg_12_64x64_neon: 2548.2 ( 5.38x) avg_12_128x128_neon: 10133.8 ( 5.19x) avg_8_64x64_neon: 897.8 (18.26x) avg_8_128x128_neon: 3608.5 (17.37x) avg_10_32x32_neon: 444.2 ( 8.51x) avg_10_64x64_neon: 1711.8 ( 8.00x) avg_12_64x64_neon: 1706.2 ( 8.02x) avg_12_128x128_neon: 7010.0 ( 7.46x) A72 avg_8_64x64_neon: 5823.4 ( 3.88x) avg_8_128x128_neon: 17430.5 ( 4.73x) avg_10_64x64_neon: 5228.1 ( 3.71x) avg_10_128x128_neon: 16722.2 ( 4.17x) avg_12_64x64_neon: 5379.1 ( 3.51x) avg_12_128x128_neon: 16715.7 ( 4.17x) avg_8_64x64_neon: 2006.5 (10.61x) avg_8_128x128_neon: 9158.7 ( 8.96x) avg_10_64x64_neon: 3357.7 ( 5.60x) avg_10_128x128_neon: 12411.7 ( 5.56x) avg_12_64x64_neon: 3317.5 ( 5.67x) avg_12_128x128_neon: 12358.5 ( 5.58x) A53 avg_8_64x64_neon: 8327.8 ( 5.18x) avg_8_128x128_neon: 31631.3 ( 5.34x) avg_10_64x64_neon: 8783.5 ( 4.98x) avg_10_128x128_neon: 32617.0 ( 5.25x) avg_12_64x64_neon: 8686.0 ( 5.06x) avg_12_128x128_neon: 32487.5 ( 5.25x) avg_8_64x64_neon: 6032.3 ( 7.17x) avg_8_128x128_neon: 22008.5 ( 7.69x) avg_10_64x64_neon: 7738.0 ( 5.68x) avg_10_128x128_neon: 27813.8 ( 6.14x) avg_12_64x64_neon: 7844.5 ( 5.60x) avg_12_128x128_neon: 26999.5 ( 6.34x) Signed-off-by: Martin Storsjö <martin@martin.st>	2025-03-07 15:51:20 +02:00
Zhao Zhili	3e9777dc75	aarch64/hevcdsp_idct_neon: Add implementation for idct dc 12 Reduce binary size at the same time. The performance compared to clang -O3 is the same. Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2025-03-04 17:01:58 +08:00
Zhao Zhili	5977bff569	aarch64/hevcdsp_idct_neon: Optimize idct dc clang does better than the assembly code before the patch, especially for small size: hevc_idct_4x4_dc_8_c: 11.2 ( 1.00x) hevc_idct_4x4_dc_8_neon: 15.5 ( 0.73x) hevc_idct_4x4_dc_10_c: 12.0 ( 1.00x) hevc_idct_4x4_dc_10_neon: 15.2 ( 0.79x) hevc_idct_8x8_dc_8_c: 13.2 ( 1.00x) hevc_idct_8x8_dc_8_neon: 18.2 ( 0.73x) hevc_idct_8x8_dc_10_c: 13.5 ( 1.00x) hevc_idct_8x8_dc_10_neon: 17.2 ( 0.78x) hevc_idct_16x16_dc_8_c: 41.8 ( 1.00x) hevc_idct_16x16_dc_8_neon: 37.8 ( 1.11x) hevc_idct_16x16_dc_10_c: 41.8 ( 1.00x) hevc_idct_16x16_dc_10_neon: 37.8 ( 1.11x) hevc_idct_32x32_dc_8_c: 130.2 ( 1.00x) hevc_idct_32x32_dc_8_neon: 132.2 ( 0.98x) hevc_idct_32x32_dc_10_c: 130.2 ( 1.00x) hevc_idct_32x32_dc_10_neon: 132.2 ( 0.98x) This patch basically clones what the compiler does, so the performance is the same. Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2025-03-04 17:01:58 +08:00
Krzysztof Pyrkosz	71a91485fa	avcodec/aarch64/vvc: Optimize NEON version of vvc_dmvr This patch replaces blocks of instructions performing rounding and widening shifts with one-liners achieving the same result. Before and after on A78 dmvr_8_12x20_neon: 86.2 ( 6.90x) dmvr_8_20x12_neon: 94.8 ( 5.93x) dmvr_8_20x20_neon: 141.5 ( 6.50x) dmvr_12_12x20_neon: 158.0 ( 3.76x) dmvr_12_20x12_neon: 151.2 ( 3.73x) dmvr_12_20x20_neon: 247.2 ( 3.71x) dmvr_hv_8_12x20_neon: 423.2 ( 3.75x) dmvr_hv_8_20x12_neon: 434.0 ( 3.69x) dmvr_hv_8_20x20_neon: 706.0 ( 3.69x) dmvr_8_12x20_neon: 77.2 ( 7.70x) dmvr_8_20x12_neon: 66.5 ( 8.49x) dmvr_8_20x20_neon: 92.2 ( 9.90x) dmvr_12_12x20_neon: 80.2 ( 7.38x) dmvr_12_20x12_neon: 58.2 ( 9.59x) dmvr_12_20x20_neon: 90.0 (10.15x) dmvr_hv_8_12x20_neon: 369.0 ( 4.34x) dmvr_hv_8_20x12_neon: 355.8 ( 4.49x) dmvr_hv_8_20x20_neon: 574.2 ( 4.51x) Signed-off-by: Martin Storsjö <martin@martin.st>	2025-03-04 10:35:31 +02:00
Krzysztof Pyrkosz	e8d4c55987	avcodec/aarch64/ac3dsp_neon.S: Optimize ac3_sum_square_butterfly_int32_neon Instead of calculating a^2, b^2, (a+b)^2 and (a-b)^2, calculate only a^2, b^2 and 2ab in each iteration and derive the latter parts from these three at the end. Before and after: A78 ac3_sum_square_bufferfly_int32_neon: 484.8 ( 2.00x) ac3_sum_square_bufferfly_int32_neon: 468.2 ( 2.08x) A72 ac3_sum_square_bufferfly_int32_neon: 793.6 ( 1.26x) ac3_sum_square_bufferfly_int32_neon: 527.3 ( 1.92x) Signed-off-by: Martin Storsjö <martin@martin.st>	2025-03-02 01:17:53 +02:00
Krzysztof Pyrkosz	9fb97215df	avcodec/aarch64/opusdsp_neon: Simplify opus_postfilter_neon This change removes one extra floating point operation and simplifies load operations at the beginning of the loop by using dedicated register for each of the 5 pointers and interleaving it with calculations. The first case seems to be a bit slower, but the performance increase is substantial in the other two. A78 before: postfilter_15_neon: 1684.8 ( 4.23x) postfilter_512_neon: 1395.5 ( 5.10x) postfilter_1022_neon: 1357.0 ( 5.25x) After: postfilter_15_neon: 1742.2 ( 4.09x) postfilter_512_neon: 1169.8 ( 6.09x) postfilter_1022_neon: 1160.0 ( 6.12x) A72 before: postfilter_15_neon: 3144.8 ( 2.39x) postfilter_512_neon: 3141.2 ( 2.39x) postfilter_1022_neon: 3230.0 ( 2.33x) After: postfilter_15_neon: 2847.8 ( 2.64x) postfilter_512_neon: 2877.8 ( 2.61x) postfilter_1022_neon: 2837.2 ( 2.65x) x13s before: postfilter_15_neon: 1615.4 ( 2.61x) postfilter_512_neon: 963.1 ( 4.39x) postfilter_1022_neon: 963.6 ( 4.39x) After: postfilter_15_neon: 1749.6 ( 2.41x) postfilter_512_neon: 707.1 ( 5.97x) postfilter_1022_neon: 706.1 ( 5.99x) Signed-off-by: Martin Storsjö <martin@martin.st>	2025-02-10 14:55:16 +02:00
Krzysztof Pyrkosz	83e4b068d9	avcodec/aarch64/aacencdsp: NEON implementation This patch supplies handwritten NEON code for AAC. The benchmarks below were collected by invoking these two commands on each of my boards, A78, A72 and Thinkpad x13s: 1) ./tests/checkasm/checkasm --test=aacencdsp --bench --runs=12 2) ./ffmpeg -y -t 10:00 -f lavfi -i sine /tmp/foo.aac (the first line is speed without the patch, second, with) - A78 abs_pow34_c: 4161.5 ( 1.00x) abs_pow34_neon: 3586.2 ( 1.16x) quant_bands_signed_c: 5548.0 ( 1.00x) quant_bands_signed_neon: 1126.8 ( 4.92x) quant_bands_unsigned_c: 3979.2 ( 1.00x) quant_bands_unsigned_neon: 800.2 ( 4.97x) size= 5251KiB time=00:10:00.00 bitrate= 71.7kbits/s speed=71.6x size= 5251KiB time=00:10:00.00 bitrate= 71.7kbits/s speed=82.3x - A72 abs_pow34_c: 15362.2 ( 1.00x) abs_pow34_neon: 15382.5 ( 1.00x) quant_bands_signed_c: 9926.5 ( 1.00x) quant_bands_signed_neon: 2467.8 ( 4.02x) quant_bands_unsigned_c: 5469.8 ( 1.00x) quant_bands_unsigned_neon: 2089.5 ( 2.62x) size= 5251KiB time=00:10:00.00 bitrate= 71.7kbits/s speed=34.3x size= 5251KiB time=00:10:00.00 bitrate= 71.7kbits/s speed=37.8 - x13s abs_pow34_c: 2413.4 ( 1.00x) abs_pow34_neon: 1796.2 ( 1.34x) quant_bands_signed_c: 2968.9 ( 1.00x) quant_bands_signed_neon: 675.6 ( 4.39x) quant_bands_unsigned_c: 2311.9 ( 1.00x) quant_bands_unsigned_neon: 477.1 ( 4.85x) size= 5251KiB time=00:10:00.00 bitrate= 71.7kbits/s speed= 135x size= 5251KiB time=00:10:00.00 bitrate= 71.7kbits/s speed= 159x Signed-off-by: Martin Storsjö <martin@martin.st>	2025-01-28 10:44:40 +02:00
Janne Grunau	430c38f698	aarch64: vp9mc: Load only 12 pixels in the 4 pixel wide horizontal filter This reduces the amount the horizontal filters read beyond the filter width to a consistent 1 pixel. The data is not used so this is usually not noticeable. It becomes a problem when the application allocates frame buffers only for the aligned picture size and the end of it is at a page boundary. This happens for picture sizes which are a multiple of the page size like 1280x640. The frame buffer allocation is most likely done via mmap + MAP_ANONYMOUS, so the start and end of the buffer are page aligned and the previous and next page are not necessarily mapped. Under these conditions, as seen with Firefox, a read beyond the end of the buffer results in a segfault. After the over-read is reduced to a single pixel it's reasonable to use VP9's emulated edge motion compensation for this. Fixes: https://bugzilla.mozilla.org/show_bug.cgi?id=1881185 Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net> Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2025-01-03 17:53:46 -05:00
Zhao Zhili	952508ae05	aarch64/vvc: Add apply_bdof Test on rpi 5 with gcc 12: apply_bdof_8_8x16_c: 7315.2 ( 1.00x) apply_bdof_8_8x16_neon: 1876.8 ( 3.90x) apply_bdof_8_16x8_c: 7170.5 ( 1.00x) apply_bdof_8_16x8_neon: 1752.8 ( 4.09x) apply_bdof_8_16x16_c: 14695.2 ( 1.00x) apply_bdof_8_16x16_neon: 3490.5 ( 4.21x) apply_bdof_10_8x16_c: 7371.5 ( 1.00x) apply_bdof_10_8x16_neon: 1863.8 ( 3.96x) apply_bdof_10_16x8_c: 7172.0 ( 1.00x) apply_bdof_10_16x8_neon: 1766.0 ( 4.06x) apply_bdof_10_16x16_c: 14551.5 ( 1.00x) apply_bdof_10_16x16_neon: 3576.0 ( 4.07x) apply_bdof_12_8x16_c: 7236.5 ( 1.00x) apply_bdof_12_8x16_neon: 1863.8 ( 3.88x) apply_bdof_12_16x8_c: 7316.5 ( 1.00x) apply_bdof_12_16x8_neon: 1758.8 ( 4.16x) apply_bdof_12_16x16_c: 14691.2 ( 1.00x) apply_bdof_12_16x16_neon: 3480.5 ( 4.22x)	2024-12-21 11:54:44 +08:00
Martin Storsjö	2bb00ef59c	aarch64: vvc: Fix building the dmvr_hv assembly with older MSVC versions Explicitly use ldur for unaligned offsets; newer versions of armasm64 implicitly convert ldr to ldur as necessary, but older versions require it explicitly written out. This fixes these build errors: ffmpeg\libavcodec\aarch64\vvc\inter.o.asm(2039) : error A2518: operand 2: Memory offset must be aligned ldr s5, [x1, #1] ffmpeg\libavcodec\aarch64\vvc\inter.o.asm(2250) : error A2518: operand 2: Memory offset must be aligned ldr d7, [x1, #2] Signed-off-by: Martin Storsjö <martin@martin.st>	2024-12-18 13:45:09 +02:00
Bin Peng	72a3656e84	lavc/aarch64: Fix ff_pred16x16_plane_neon_10 Fix test failure on aarch64: ./tests/checkasm/checkasm --test=h264pred 367840 Signed-off-by: Peng Bin <pengbin@visionular.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2024-12-17 14:50:29 +02:00
Bin Peng	decc9e643c	lavc/aarch64: Fix ff_pred8x8_plane_neon_10 Fix test failure on aarch64: ./tests/checkasm/checkasm --test=h264pred 479612 The mismatch between neon and C functions can also be reproduced using the following bitstream and command line. wget https://streams.videolan.org/ffmpeg/incoming/intra8x8pred_10bit.264 ./ffmpeg -cpuflags 0 -threads 1 -i intra8x8pred_10bit.264 -f framemd5 -y md5_ref ./ffmpeg -threads 1 -i intra8x8pred_10bit.264 -f framemd5 -y md5_neon Signed-off-by: Bin Peng <pengbin@visionular.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2024-12-17 14:50:29 +02:00
Zhao Zhili	40feba5f77	aarch64/vvc: Fix clip in alf Fix test failure: ./tests/checkasm/checkasm --test=vvc_alf 3607569773	2024-12-10 21:00:47 +08:00
Zhao Zhili	91436638de	aarch64/vvc: Use faster clip operation Replace sqxtn+smin+smax by sqxtun+umin.	2024-12-10 21:00:47 +08:00
Zhao Zhili	bfed5f6b7d	aarch64/vvc: Reuse ff_vvc_put_pel_pixels for chroma	2024-12-10 21:00:47 +08:00
Zhao Zhili	5988a2729b	aarch64/vvc: Add dmvr dmvr_8_12x20_c: 1.5 ( 1.00x) dmvr_8_12x20_neon: 0.2 ( 6.56x) dmvr_8_20x12_c: 1.0 ( 1.00x) dmvr_8_20x12_neon: 0.2 ( 4.33x) dmvr_8_20x20_c: 1.7 ( 1.00x) dmvr_8_20x20_neon: 0.5 ( 3.63x) dmvr_12_12x20_c: 2.2 ( 1.00x) dmvr_12_12x20_neon: 0.5 ( 4.68x) dmvr_12_20x12_c: 2.0 ( 1.00x) dmvr_12_20x12_neon: 0.5 ( 4.16x) dmvr_12_20x20_c: 3.7 ( 1.00x) dmvr_12_20x20_neon: 0.7 ( 5.14x) Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2024-10-01 10:28:54 +08:00
Zhao Zhili	bcd65ebd8f	aarch64/vvc: Add dmvr_hv dmvr_hv_8_12x20_c: 8.0 ( 1.00x) dmvr_hv_8_12x20_neon: 1.2 ( 6.62x) dmvr_hv_8_20x12_c: 8.0 ( 1.00x) dmvr_hv_8_20x12_neon: 0.9 ( 8.37x) dmvr_hv_8_20x20_c: 12.9 ( 1.00x) dmvr_hv_8_20x20_neon: 1.7 ( 7.62x) dmvr_hv_10_12x20_c: 7.0 ( 1.00x) dmvr_hv_10_12x20_neon: 1.7 ( 4.09x) dmvr_hv_10_20x12_c: 7.0 ( 1.00x) dmvr_hv_10_20x12_neon: 1.7 ( 4.09x) dmvr_hv_10_20x20_c: 11.2 ( 1.00x) dmvr_hv_10_20x20_neon: 2.7 ( 4.15x) dmvr_hv_12_12x20_c: 6.5 ( 1.00x) dmvr_hv_12_12x20_neon: 1.7 ( 3.79x) dmvr_hv_12_20x12_c: 6.5 ( 1.00x) dmvr_hv_12_20x12_neon: 1.7 ( 3.79x) dmvr_hv_12_20x20_c: 10.2 ( 1.00x) dmvr_hv_12_20x20_neon: 2.2 ( 4.64x) Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2024-10-01 10:28:54 +08:00
Zhao Zhili	0ba9e8d0d4	aarch64/vvc: Add w_avg w_avg_8_2x2_c: 0.0 ( 0.00x) w_avg_8_2x2_neon: 0.0 ( 0.00x) w_avg_8_4x4_c: 0.2 ( 1.00x) w_avg_8_4x4_neon: 0.0 ( 0.00x) w_avg_8_8x8_c: 1.2 ( 1.00x) w_avg_8_8x8_neon: 0.2 ( 5.00x) w_avg_8_16x16_c: 4.2 ( 1.00x) w_avg_8_16x16_neon: 0.8 ( 5.67x) w_avg_8_32x32_c: 16.2 ( 1.00x) w_avg_8_32x32_neon: 2.5 ( 6.50x) w_avg_8_64x64_c: 64.5 ( 1.00x) w_avg_8_64x64_neon: 9.0 ( 7.17x) w_avg_8_128x128_c: 269.5 ( 1.00x) w_avg_8_128x128_neon: 35.5 ( 7.59x) w_avg_10_2x2_c: 0.2 ( 1.00x) w_avg_10_2x2_neon: 0.2 ( 1.00x) w_avg_10_4x4_c: 0.2 ( 1.00x) w_avg_10_4x4_neon: 0.2 ( 1.00x) w_avg_10_8x8_c: 1.0 ( 1.00x) w_avg_10_8x8_neon: 0.2 ( 4.00x) w_avg_10_16x16_c: 4.2 ( 1.00x) w_avg_10_16x16_neon: 0.8 ( 5.67x) w_avg_10_32x32_c: 16.2 ( 1.00x) w_avg_10_32x32_neon: 2.5 ( 6.50x) w_avg_10_64x64_c: 66.2 ( 1.00x) w_avg_10_64x64_neon: 10.0 ( 6.62x) w_avg_10_128x128_c: 277.8 ( 1.00x) w_avg_10_128x128_neon: 39.8 ( 6.99x) w_avg_12_2x2_c: 0.0 ( 0.00x) w_avg_12_2x2_neon: 0.2 ( 0.00x) w_avg_12_4x4_c: 0.2 ( 1.00x) w_avg_12_4x4_neon: 0.0 ( 0.00x) w_avg_12_8x8_c: 1.2 ( 1.00x) w_avg_12_8x8_neon: 0.5 ( 2.50x) w_avg_12_16x16_c: 4.8 ( 1.00x) w_avg_12_16x16_neon: 0.8 ( 6.33x) w_avg_12_32x32_c: 17.0 ( 1.00x) w_avg_12_32x32_neon: 2.8 ( 6.18x) w_avg_12_64x64_c: 64.0 ( 1.00x) w_avg_12_64x64_neon: 10.0 ( 6.40x) w_avg_12_128x128_c: 269.2 ( 1.00x) w_avg_12_128x128_neon: 42.0 ( 6.41x) Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2024-10-01 10:28:54 +08:00
Martin Storsjö	a3ec1f8c6c	aarch64: h26x: Fix the indentation of one function Signed-off-by: Martin Storsjö <martin@martin.st>	2024-09-26 13:42:11 +03:00
Zhao Zhili	3f84d1d1fb	aarch64/vvc: Add avg avg_8_2x2_c: 0.2 ( 1.00x) avg_8_2x2_neon: 0.2 ( 1.00x) avg_8_4x4_c: 0.2 ( 1.00x) avg_8_4x4_neon: 0.2 ( 1.00x) avg_8_8x8_c: 0.9 ( 1.00x) avg_8_8x8_neon: 0.2 ( 5.29x) avg_8_16x16_c: 3.7 ( 1.00x) avg_8_16x16_neon: 0.7 ( 5.44x) avg_8_32x32_c: 14.9 ( 1.00x) avg_8_32x32_neon: 1.7 ( 8.91x) avg_8_64x64_c: 59.7 ( 1.00x) avg_8_64x64_neon: 6.9 ( 8.62x) avg_8_128x128_c: 254.7 ( 1.00x) avg_8_128x128_neon: 26.9 ( 9.46x) avg_10_2x2_c: 0.2 ( 1.00x) avg_10_2x2_neon: 0.2 ( 1.00x) avg_10_4x4_c: 0.2 ( 1.00x) avg_10_4x4_neon: 0.2 ( 1.00x) avg_10_8x8_c: 0.9 ( 1.00x) avg_10_8x8_neon: 0.2 ( 5.29x) avg_10_16x16_c: 3.4 ( 1.00x) avg_10_16x16_neon: 0.4 ( 8.06x) avg_10_32x32_c: 13.9 ( 1.00x) avg_10_32x32_neon: 1.9 ( 7.23x) avg_10_64x64_c: 54.2 ( 1.00x) avg_10_64x64_neon: 8.4 ( 6.43x) avg_10_128x128_c: 232.4 ( 1.00x) avg_10_128x128_neon: 30.9 ( 7.52x) avg_12_2x2_c: 0.0 ( 0.00x) avg_12_2x2_neon: 0.2 ( 0.00x) avg_12_4x4_c: 0.4 ( 1.00x) avg_12_4x4_neon: 0.2 ( 2.43x) avg_12_8x8_c: 0.7 ( 1.00x) avg_12_8x8_neon: 0.2 ( 3.86x) avg_12_16x16_c: 3.7 ( 1.00x) avg_12_16x16_neon: 0.4 ( 8.65x) avg_12_32x32_c: 13.7 ( 1.00x) avg_12_32x32_neon: 2.2 ( 6.29x) avg_12_64x64_c: 53.9 ( 1.00x) avg_12_64x64_neon: 7.7 ( 7.03x) avg_12_128x128_c: 270.9 ( 1.00x) avg_12_128x128_neon: 30.4 ( 8.90x)	2024-09-14 16:36:34 +08:00
Zhao Zhili	1be5a2374f	aarch64/vvc: Add put_epel_hv On Apple M1: put_chroma_hv_8_4x4_c: 1.7 ( 1.00x) put_chroma_hv_8_4x4_neon: 0.2 ( 7.67x) put_chroma_hv_8_8x8_c: 5.5 ( 1.00x) put_chroma_hv_8_8x8_neon: 0.5 (11.53x) put_chroma_hv_8_16x16_c: 18.5 ( 1.00x) put_chroma_hv_8_16x16_neon: 1.5 (12.53x) put_chroma_hv_8_32x32_c: 72.5 ( 1.00x) put_chroma_hv_8_32x32_neon: 4.7 (15.34x) put_chroma_hv_8_64x64_c: 274.0 ( 1.00x) put_chroma_hv_8_64x64_neon: 18.5 (14.83x) put_chroma_hv_8_128x128_c: 1058.7 ( 1.00x) put_chroma_hv_8_128x128_neon: 75.2 (14.07x) On Android Pixel 8 Pro: put_chroma_hv_8_4x4_c: 1.2 ( 1.00x) put_chroma_hv_8_4x4_neon: 0.0 ( 0.00x) put_chroma_hv_8_4x4_i8mm: 0.2 ( 5.00x) put_chroma_hv_8_8x8_c: 4.0 ( 1.00x) put_chroma_hv_8_8x8_neon: 0.5 ( 8.00x) put_chroma_hv_8_8x8_i8mm: 0.5 ( 8.00x) put_chroma_hv_8_16x16_c: 15.2 ( 1.00x) put_chroma_hv_8_16x16_neon: 2.5 ( 6.10x) put_chroma_hv_8_16x16_i8mm: 2.2 ( 6.78x) put_chroma_hv_8_32x32_c: 61.0 ( 1.00x) put_chroma_hv_8_32x32_neon: 9.8 ( 6.26x) put_chroma_hv_8_32x32_i8mm: 8.5 ( 7.18x) put_chroma_hv_8_64x64_c: 229.5 ( 1.00x) put_chroma_hv_8_64x64_neon: 38.5 ( 5.96x) put_chroma_hv_8_64x64_i8mm: 34.0 ( 6.75x) put_chroma_hv_8_128x128_c: 919.8 ( 1.00x) put_chroma_hv_8_128x128_neon: 154.5 ( 5.95x) put_chroma_hv_8_128x128_i8mm: 140.0 ( 6.57x)	2024-09-14 16:36:34 +08:00
Zhao Zhili	0dcf204e5d	aarch64/vvc: Add put_epel_h i8mm put_chroma_h_8_4x4_c: 0.4 ( 1.00x) put_chroma_h_8_4x4_neon: 0.0 ( 0.00x) put_chroma_h_8_4x4_i8mm: 0.1 ( 2.67x) put_chroma_h_8_8x8_c: 1.6 ( 1.00x) put_chroma_h_8_8x8_neon: 0.1 (11.00x) put_chroma_h_8_8x8_i8mm: 0.1 (11.00x) put_chroma_h_8_16x16_c: 6.9 ( 1.00x) put_chroma_h_8_16x16_neon: 1.1 ( 6.00x) put_chroma_h_8_16x16_i8mm: 0.7 (10.62x) put_chroma_h_8_32x32_c: 27.6 ( 1.00x) put_chroma_h_8_32x32_neon: 4.7 ( 5.95x) put_chroma_h_8_32x32_i8mm: 4.4 ( 6.28x) put_chroma_h_8_64x64_c: 116.2 ( 1.00x) put_chroma_h_8_64x64_neon: 19.1 ( 6.07x) put_chroma_h_8_64x64_i8mm: 17.1 ( 6.77x) put_chroma_h_8_128x128_c: 466.6 ( 1.00x) put_chroma_h_8_128x128_neon: 81.4 ( 5.73x) put_chroma_h_8_128x128_i8mm: 71.7 ( 6.51x)	2024-09-14 16:36:34 +08:00
Zhao Zhili	41a1885f7a	aarch64/vvc: Add put_epel_h put_chroma_h_8_4x4_c: 0.2 ( 1.00x) put_chroma_h_8_4x4_neon: 0.2 ( 1.00x) put_chroma_h_8_8x8_c: 0.8 ( 1.00x) put_chroma_h_8_8x8_neon: 0.2 ( 3.00x) put_chroma_h_8_16x16_c: 3.8 ( 1.00x) put_chroma_h_8_16x16_neon: 0.8 ( 5.00x) put_chroma_h_8_32x32_c: 12.5 ( 1.00x) put_chroma_h_8_32x32_neon: 2.2 ( 5.56x) put_chroma_h_8_64x64_c: 47.0 ( 1.00x) put_chroma_h_8_64x64_neon: 8.8 ( 5.37x) put_chroma_h_8_128x128_c: 200.2 ( 1.00x) put_chroma_h_8_128x128_neon: 31.8 ( 6.31x)	2024-09-14 16:36:34 +08:00
Zhao Zhili	260e1b4b62	aarch64/vvc: Add sad sad_8x16_c: 0.8 ( 1.00x) sad_8x16_neon: 0.2 ( 3.00x) sad_16x8_c: 0.5 ( 1.00x) sad_16x8_neon: 0.2 ( 2.00x) sad_16x16_c: 1.5 ( 1.00x) sad_16x16_neon: 0.2 ( 6.00x)	2024-09-14 16:36:34 +08:00
Zhao Zhili	5ac6925803	aarch64/vvc: Add put_qpel_hv With Apple M1 (no i8mm): put_luma_hv_8_4x4_c: 2.2 ( 1.00x) put_luma_hv_8_4x4_neon: 0.8 ( 3.00x) put_luma_hv_8_8x8_c: 7.0 ( 1.00x) put_luma_hv_8_8x8_neon: 0.8 ( 9.33x) put_luma_hv_8_16x16_c: 22.8 ( 1.00x) put_luma_hv_8_16x16_neon: 2.5 ( 9.10x) put_luma_hv_8_32x32_c: 84.8 ( 1.00x) put_luma_hv_8_32x32_neon: 9.5 ( 8.92x) put_luma_hv_8_64x64_c: 333.0 ( 1.00x) put_luma_hv_8_64x64_neon: 35.5 ( 9.38x) put_luma_hv_8_128x128_c: 1294.5 ( 1.00x) put_luma_hv_8_128x128_neon: 137.8 ( 9.40x) With Pixel 8 Pro: put_luma_hv_8_4x4_c: 5.0 ( 1.00x) put_luma_hv_8_4x4_neon: 0.8 ( 6.67x) put_luma_hv_8_4x4_i8mm: 0.2 (20.00x) put_luma_hv_8_8x8_c: 13.2 ( 1.00x) put_luma_hv_8_8x8_neon: 1.2 (10.60x) put_luma_hv_8_8x8_i8mm: 1.2 (10.60x) put_luma_hv_8_16x16_c: 44.2 ( 1.00x) put_luma_hv_8_16x16_neon: 4.5 ( 9.83x) put_luma_hv_8_16x16_i8mm: 4.2 (10.41x) put_luma_hv_8_32x32_c: 160.8 ( 1.00x) put_luma_hv_8_32x32_neon: 17.5 ( 9.19x) put_luma_hv_8_32x32_i8mm: 16.0 (10.05x) put_luma_hv_8_64x64_c: 611.2 ( 1.00x) put_luma_hv_8_64x64_neon: 68.0 ( 8.99x) put_luma_hv_8_64x64_i8mm: 62.2 ( 9.82x) put_luma_hv_8_128x128_c: 2384.8 ( 1.00x) put_luma_hv_8_128x128_neon: 268.8 ( 8.87x) put_luma_hv_8_128x128_i8mm: 245.8 ( 9.70x)	2024-09-14 16:36:34 +08:00
Zhao Zhili	a0b52afd32	aarch64/vvc: Add put_qpel_vx put_luma_v_8_4x4_c: 1.0 ( 1.00x) put_luma_v_8_4x4_neon: 0.0 ( 0.00x) put_luma_v_8_8x8_c: 3.5 ( 1.00x) put_luma_v_8_8x8_neon: 0.5 ( 7.00x) put_luma_v_8_16x16_c: 13.8 ( 1.00x) put_luma_v_8_16x16_neon: 1.2 (11.00x) put_luma_v_8_32x32_c: 54.2 ( 1.00x) put_luma_v_8_32x32_neon: 5.0 (10.85x) put_luma_v_8_64x64_c: 217.5 ( 1.00x) put_luma_v_8_64x64_neon: 18.8 (11.60x) put_luma_v_8_128x128_c: 886.2 ( 1.00x) put_luma_v_8_128x128_neon: 74.0 (11.98x)	2024-09-14 16:36:34 +08:00
Zhao Zhili	b051bc7cb8	aarch64/h26x: Remove duplicate b.eq instruction b.eq is added by calc_all after each calc.	2024-09-14 16:36:34 +08:00
Zhao Zhili	9f6c8eb412	aarch64/vvc: Add put_qpel_hx i8mm Benchmark on Android pixel 8 with -fno-vectorize put_luma_h_8_4x4_c: 0.2 ( 1.00x) put_luma_h_8_4x4_neon: 0.2 ( 1.00x) put_luma_h_8_4x4_i8mm: 0.0 ( 0.00x) put_luma_h_8_8x8_c: 1.5 ( 1.00x) put_luma_h_8_8x8_neon: 0.5 ( 3.00x) put_luma_h_8_8x8_i8mm: 0.5 ( 3.00x) put_luma_h_8_16x16_c: 6.2 ( 1.00x) put_luma_h_8_16x16_neon: 2.0 ( 3.12x) put_luma_h_8_16x16_i8mm: 1.5 ( 4.17x) put_luma_h_8_32x32_c: 25.5 ( 1.00x) put_luma_h_8_32x32_neon: 9.0 ( 2.83x) put_luma_h_8_32x32_i8mm: 6.8 ( 3.78x) put_luma_h_8_64x64_c: 99.8 ( 1.00x) put_luma_h_8_64x64_neon: 35.2 ( 2.83x) put_luma_h_8_64x64_i8mm: 27.2 ( 3.66x) put_luma_h_8_128x128_c: 422.0 ( 1.00x) put_luma_h_8_128x128_neon: 138.5 ( 3.05x) put_luma_h_8_128x128_i8mm: 109.2 ( 3.86x)	2024-09-14 16:36:34 +08:00
Zhao Zhili	25448d1716	aarch64/vvc: Add put_pel/put_pel_uni/put_pel_uni_w put_luma_pixels_8_4x4_c: 0.2 ( 1.00x) put_luma_pixels_8_4x4_neon: 0.2 ( 1.00x) put_luma_pixels_8_8x8_c: 0.7 ( 1.00x) put_luma_pixels_8_8x8_neon: 0.2 ( 3.22x) put_luma_pixels_8_16x16_c: 2.2 ( 1.00x) put_luma_pixels_8_16x16_neon: 0.2 ( 9.89x) put_luma_pixels_8_32x32_c: 8.2 ( 1.00x) put_luma_pixels_8_32x32_neon: 1.2 ( 6.71x) put_luma_pixels_8_64x64_c: 33.7 ( 1.00x) put_luma_pixels_8_64x64_neon: 2.5 (13.63x) put_luma_pixels_8_128x128_c: 145.5 ( 1.00x) put_luma_pixels_8_128x128_neon: 10.2 (14.23x) put_uni_pixels_luma_8_4x4_c: 0.5 ( 1.00x) put_uni_pixels_luma_8_4x4_neon: 0.0 ( 0.00x) put_uni_pixels_luma_8_8x8_c: 0.5 ( 1.00x) put_uni_pixels_luma_8_8x8_neon: 0.2 ( 2.11x) put_uni_pixels_luma_8_16x16_c: 1.2 ( 1.00x) put_uni_pixels_luma_8_16x16_neon: 0.2 ( 5.44x) put_uni_pixels_luma_8_32x32_c: 3.0 ( 1.00x) put_uni_pixels_luma_8_32x32_neon: 0.5 ( 6.26x) put_uni_pixels_luma_8_64x64_c: 3.0 ( 1.00x) put_uni_pixels_luma_8_64x64_neon: 1.7 ( 1.72x) put_uni_pixels_luma_8_128x128_c: 6.5 ( 1.00x) put_uni_pixels_luma_8_128x128_neon: 6.5 ( 1.00x)	2024-09-14 16:36:34 +08:00
Zhao Zhili	20f2bf5530	aarch64/vvc: Add put_qpel_h_* and put_qpel_uni_h_* Just share hevc implementation. checkasm --test=vvc_mc --benchmark: put_luma_h_8_4x4_c: 0.2 ( 1.00x) put_luma_h_8_4x4_neon: 0.2 ( 1.00x) put_luma_h_8_8x8_c: 1.0 ( 1.00x) put_luma_h_8_8x8_neon: 0.2 ( 4.33x) put_luma_h_8_16x16_c: 3.2 ( 1.00x) put_luma_h_8_16x16_neon: 1.2 ( 2.63x) put_luma_h_8_32x32_c: 13.7 ( 1.00x) put_luma_h_8_32x32_neon: 4.0 ( 3.45x) put_luma_h_8_64x64_c: 48.2 ( 1.00x) put_luma_h_8_64x64_neon: 15.7 ( 3.07x) put_luma_h_8_128x128_c: 203.5 ( 1.00x) put_luma_h_8_128x128_neon: 62.0 ( 3.28x) put_uni_h_luma_8_4x4_c: 0.2 ( 1.00x) put_uni_h_luma_8_4x4_neon: 0.2 ( 1.00x) put_uni_h_luma_8_8x8_c: 1.5 ( 1.00x) put_uni_h_luma_8_8x8_neon: 0.2 ( 6.56x) put_uni_h_luma_8_16x16_c: 5.7 ( 1.00x) put_uni_h_luma_8_16x16_neon: 1.2 ( 4.67x) put_uni_h_luma_8_32x32_c: 24.0 ( 1.00x) put_uni_h_luma_8_32x32_neon: 4.7 ( 5.07x) put_uni_h_luma_8_64x64_c: 90.0 ( 1.00x) put_uni_h_luma_8_64x64_neon: 17.0 ( 5.30x) put_uni_h_luma_8_128x128_c: 357.7 ( 1.00x) put_uni_h_luma_8_128x128_neon: 67.5 ( 5.30x)	2024-09-14 16:36:34 +08:00
Zhao Zhili	46f07ce7d1	aarch64/hevc: Move epel/qpel to h26x directory So vvc can reuse the implementation.	2024-09-14 16:36:34 +08:00
Zhao Zhili	8beafb5656	aarch64/hevc: Simplify function prototypes by macro	2024-09-14 16:36:34 +08:00
Anton Khirnov	3f9ca51015	lavc/opus*: move to opus/ subdir	2024-09-02 11:56:53 +02:00
Ramiro Polla	6aafe61285	avcodec/mpegvideoencdsp: convert stride parameters from int to ptrdiff_t	2024-09-01 13:42:30 +02:00
Zhao Zhili	4c0372281b	aarch64/vvc: Bind h26x/sao filter implementation to vvc Reviewed-by: Martin Storsjö <martin@martin.st>	2024-08-31 16:07:50 +08:00
Zhao Zhili	8cc10298a7	aarch64/hevc: Move sao to h26x directory So vvc can reuse the implementation. Reviewed-by: Martin Storsjö <martin@martin.st>	2024-08-31 16:07:43 +08:00
Ramiro Polla	8c203ea7c7	avcodec/aarch64/mpegvideoencdsp: add dotprod implementation for pix_norm1 A55 A76 pix_norm1_c: 484.3 235.2 pix_norm1_neon: 193.8 ( 2.50x) 44.7 ( 5.26x) pix_norm1_dotprod: 91.8 ( 5.28x) 21.2 (11.09x)	2024-08-26 12:49:04 +02:00
Ramiro Polla	9f68a3712e	avcodec/aarch64/mpegvideoencdsp: add neon implementations for pix_sum and pix_norm1 A55 A76 pix_norm1_c: 478.2 234.2 pix_norm1_neon: 188.2 ( 2.54x) 41.2 ( 5.68x) pix_sum_c: 304.2 244.0 pix_sum_neon: 77.2 ( 3.94x) 21.5 (11.35x)	2024-08-26 12:48:31 +02:00
Ramiro Polla	5c1c0325cd	avcodec/aarch64/me_cmp: add dotprod implementations of sse16 and vsse_intra16 checkasm --bench for Raspberry Pi 5 Model B Rev 1.0: sse_0_c: 241.5 sse_0_neon: 37.2 sse_0_dotprod: 22.2 vsse_4_c: 148.7 vsse_4_neon: 31.0 vsse_4_dotprod: 15.7	2024-08-17 15:31:48 +02:00
Martin Storsjö	4acb9b7d10	aarch64: vvc: Fix unnecessary extra spaces Signed-off-by: Martin Storsjö <martin@martin.st>	2024-07-23 16:04:28 +03:00
Martin Storsjö	99598629e8	aarch64: vvc: Consistently use # for immediate constants Signed-off-by: Martin Storsjö <martin@martin.st>	2024-07-23 15:24:37 +03:00
Martin Storsjö	400843151d	aarch64: vvc: Fix compilation of alf.S with MSVC 2022 17.7 and older Use the "ldur" instruction explicitly, instead of having the assembler implicitly convert "ldr" instructions to "ldur". This fixes build errors like these: libavcodec\aarch64\vvc\alf.o.asm(1023) : error A2518: operand 2: Memory offset must be aligned ldr q22, [x3, #24] libavcodec\aarch64\vvc\alf.o.asm(1024) : error A2518: operand 2: Memory offset must be aligned ldr q24, [x2, #24] libavcodec\aarch64\vvc\alf.o.asm(1393) : error A2518: operand 2: Memory offset must be aligned ldr q22, [x3, #24] libavcodec\aarch64\vvc\alf.o.asm(1394) : error A2518: operand 2: Memory offset must be aligned ldr q24, [x2, #24] Signed-off-by: Martin Storsjö <martin@martin.st>	2024-07-23 15:24:33 +03:00
Zhao Zhili	2d4ef304c9	avcodec/vvc: Add aarch64 neon optimization for ALF vvc_alf_filter_chroma_4x4_8_c: 3.0 vvc_alf_filter_chroma_4x4_8_neon: 1.0 vvc_alf_filter_chroma_4x4_10_c: 2.7 vvc_alf_filter_chroma_4x4_10_neon: 1.0 vvc_alf_filter_chroma_4x4_12_c: 2.7 vvc_alf_filter_chroma_4x4_12_neon: 1.0 vvc_alf_filter_chroma_8x8_8_c: 10.2 vvc_alf_filter_chroma_8x8_8_neon: 3.0 vvc_alf_filter_chroma_8x8_10_c: 10.0 vvc_alf_filter_chroma_8x8_10_neon: 2.5 vvc_alf_filter_chroma_8x8_12_c: 10.0 vvc_alf_filter_chroma_8x8_12_neon: 2.5 vvc_alf_filter_chroma_16x16_8_c: 41.7 vvc_alf_filter_chroma_16x16_8_neon: 11.2 vvc_alf_filter_chroma_16x16_10_c: 39.0 vvc_alf_filter_chroma_16x16_10_neon: 10.0 vvc_alf_filter_chroma_16x16_12_c: 40.2 vvc_alf_filter_chroma_16x16_12_neon: 10.2 vvc_alf_filter_chroma_32x32_8_c: 162.0 vvc_alf_filter_chroma_32x32_8_neon: 45.0 vvc_alf_filter_chroma_32x32_10_c: 155.5 vvc_alf_filter_chroma_32x32_10_neon: 39.5 vvc_alf_filter_chroma_32x32_12_c: 155.5 vvc_alf_filter_chroma_32x32_12_neon: 40.0 vvc_alf_filter_chroma_64x64_8_c: 646.0 vvc_alf_filter_chroma_64x64_8_neon: 175.5 vvc_alf_filter_chroma_64x64_10_c: 708.2 vvc_alf_filter_chroma_64x64_10_neon: 166.7 vvc_alf_filter_chroma_64x64_12_c: 619.2 vvc_alf_filter_chroma_64x64_12_neon: 157.2 vvc_alf_filter_chroma_128x128_8_c: 2611.5 vvc_alf_filter_chroma_128x128_8_neon: 698.2 vvc_alf_filter_chroma_128x128_10_c: 2470.0 vvc_alf_filter_chroma_128x128_10_neon: 616.0 vvc_alf_filter_chroma_128x128_12_c: 2531.5 vvc_alf_filter_chroma_128x128_12_neon: 620.2 vvc_alf_filter_luma_8x8_8_c: 25.2 vvc_alf_filter_luma_8x8_8_neon: 4.2 vvc_alf_filter_luma_8x8_10_c: 18.5 vvc_alf_filter_luma_8x8_10_neon: 4.0 vvc_alf_filter_luma_8x8_12_c: 19.0 vvc_alf_filter_luma_8x8_12_neon: 4.0 vvc_alf_filter_luma_16x16_8_c: 106.5 vvc_alf_filter_luma_16x16_8_neon: 16.2 vvc_alf_filter_luma_16x16_10_c: 75.2 vvc_alf_filter_luma_16x16_10_neon: 14.7 vvc_alf_filter_luma_16x16_12_c: 79.7 vvc_alf_filter_luma_16x16_12_neon: 14.7 vvc_alf_filter_luma_32x32_8_c: 400.5 
vvc_alf_filter_luma_32x32_8_neon: 63.2 vvc_alf_filter_luma_32x32_10_c: 299.2 vvc_alf_filter_luma_32x32_10_neon: 57.7 vvc_alf_filter_luma_32x32_12_c: 299.2 vvc_alf_filter_luma_32x32_12_neon: 57.7 vvc_alf_filter_luma_64x64_8_c: 1602.5 vvc_alf_filter_luma_64x64_8_neon: 251.7 vvc_alf_filter_luma_64x64_10_c: 1197.0 vvc_alf_filter_luma_64x64_10_neon: 235.5 vvc_alf_filter_luma_64x64_12_c: 1220.2 vvc_alf_filter_luma_64x64_12_neon: 235.7 vvc_alf_filter_luma_128x128_8_c: 6570.2 vvc_alf_filter_luma_128x128_8_neon: 1007.7 vvc_alf_filter_luma_128x128_10_c: 4822.7 vvc_alf_filter_luma_128x128_10_neon: 936.2 vvc_alf_filter_luma_128x128_12_c: 4791.2 vvc_alf_filter_luma_128x128_12_neon: 938.5 Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2024-07-22 21:09:56 +08:00
Anton Khirnov	e4601cc339	lavc/hevc*: move to hevc/ subdir	2024-06-04 11:46:27 +02:00
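
Note on commit e8d4c55987 (ac3_sum_square_butterfly_int32_neon): the message describes accumulating only a^2, b^2 and 2ab per iteration and deriving the (a+b)^2 and (a-b)^2 sums at the end. The Python sketch below only checks that algebra. It is not the NEON code or FFmpeg's C reference, the function names and sample values are made up for illustration, and Python integers do not model the fixed-width accumulators the real code has to handle.

```python
def sums_direct(a, b):
    """Accumulate a^2, b^2, (a+b)^2 and (a-b)^2 element by element."""
    s = [0, 0, 0, 0]
    for x, y in zip(a, b):
        s[0] += x * x
        s[1] += y * y
        s[2] += (x + y) * (x + y)
        s[3] += (x - y) * (x - y)
    return s


def sums_derived(a, b):
    """Accumulate only a^2, b^2 and 2ab; derive the other two sums at the end."""
    sa = sb = cross = 0
    for x, y in zip(a, b):
        sa += x * x
        sb += y * y
        cross += 2 * x * y
    # (a+b)^2 = a^2 + b^2 + 2ab and (a-b)^2 = a^2 + b^2 - 2ab, summed over all elements
    return [sa, sb, sa + sb + cross, sa + sb - cross]


left = [3, -7, 12, 0, 5]    # illustrative sample values
right = [4, 9, -2, 8, -5]
print(sums_direct(left, right) == sums_derived(left, right))
```

The derived variant does three multiply-accumulates per element instead of four, which is the saving the commit message points at.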


451 Commits
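
Note on commit 430c38f698 (vp9mc over-read): the page-boundary condition in the message is plain arithmetic. The sketch below assumes 4096-byte pages and an 8-bit plane with one byte per pixel and a stride equal to the width. None of these values are stated in the commit message, and the helper name is hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; real systems may differ


def ends_on_page_boundary(width, height, bytes_per_pixel=1, page_size=PAGE_SIZE):
    """Return True if a tightly packed plane is a whole number of pages.

    Assumes the buffer starts page aligned (as with an anonymous mmap) and
    that the stride equals width * bytes_per_pixel, with no padding.
    """
    plane_bytes = width * height * bytes_per_pixel
    return plane_bytes % page_size == 0


# 1280 * 640 = 819200 = 200 * 4096, so the plane ends exactly at a page boundary
print(ends_on_page_boundary(1280, 640))
# 1000 * 600 = 600000 is not a multiple of 4096
print(ends_on_page_boundary(1000, 600))
```

When the function returns True under these assumptions, any read past the last pixel lands on the following page, which may be unmapped.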