ffmpeg/libavcodec
Martin Storsjö 9c8bc74c2b arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32
This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with
the same runtime:

                                     Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub16_add_neon:   3188.1   2435.4   2499.0   1969.0
vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.7  16582.3  14207.6  12000.3

By skipping individual 4x16 or 4x32 pixel slices in the first pass,
we reduce the runtime of these functions like this:

                                     Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub1_add_neon:     274.6    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    2064.0   1534.8   1719.4   1248.7
vp9_inv_dct_dct_16x16_sub4_add_neon:    2135.0   1477.2   1736.3   1249.5
vp9_inv_dct_dct_16x16_sub8_add_neon:    2446.7   1828.7   1993.6   1494.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   2832.4   2118.3   2266.5   1735.1
vp9_inv_dct_dct_16x16_sub16_add_neon:   3211.7   2475.3   2523.5   1983.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     756.2    456.7    862.0    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:   10682.2   8190.4   8539.2   6762.5
vp9_inv_dct_dct_32x32_sub4_add_neon:   10813.5   8014.9   8518.3   6762.8
vp9_inv_dct_dct_32x32_sub8_add_neon:   11859.6   9313.0   9347.4   7514.5
vp9_inv_dct_dct_32x32_sub12_add_neon:  12946.6  10752.4  10192.2   8280.2
vp9_inv_dct_dct_32x32_sub16_add_neon:  14074.6  11946.5  11001.4   9008.6
vp9_inv_dct_dct_32x32_sub20_add_neon:  15269.9  13662.7  11816.1   9762.6
vp9_inv_dct_dct_32x32_sub24_add_neon:  16327.9  14940.1  12626.7  10516.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17462.7  15776.1  13446.2  11264.7
vp9_inv_dct_dct_32x32_sub32_add_neon:  18575.5  17157.0  14249.3  12015.1

I.e. in general a very minor overhead for the full subpartition case due
to the additional loads and cmps, but a significant speedup for the cases
when we only need to process a small part of the actual input data.
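
The idea can be illustrated with a minimal C sketch. This is not the NEON code from this commit: toy_idct_1d is a hypothetical stand-in for the real 1-D inverse transform, and the threshold s * SLICE * N assumes a toy column-major scan order, whereas VP9 uses different scan orders and therefore different thresholds. Only the control flow (compare eob against a per-slice threshold, skip the transform and zero the intermediate buffer for empty slices) is meant to mirror the description above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N     16  /* transform size (16x16) */
#define SLICE 4   /* columns handled per first-pass iteration */

/* Hypothetical stand-in for a 1-D inverse transform. Any linear transform
 * maps an all-zero input to an all-zero output, which is the property
 * that makes skipping empty slices valid. */
static void toy_idct_1d(const int32_t *in, int stride, int32_t *out)
{
    for (int i = 0; i < N; i++) {
        int32_t acc = 0;
        for (int j = 0; j < N; j++)
            acc += (((i + 1) * (j + 1)) % 5 - 2) * in[j * stride];
        out[i] = acc;
    }
}

/* Two-pass 2-D transform. coef is row-major (coef[row * N + col]) and eob
 * is the number of coefficients in scan order that may be nonzero.
 * Returns how many slices the first pass actually transformed. */
static int toy_idct_2d(const int32_t *coef, int eob, int32_t *dst)
{
    int32_t tmp[N * N];
    int32_t col[N];
    int done = 0;

    assert(eob >= 0 && eob <= N * N);

    /* First pass: transform columns, one SLICE-wide group at a time. */
    for (int s = 0; s < N / SLICE; s++) {
        /* Assumed column-major scan: slice s only holds scan positions
         * >= s * SLICE * N, so it and all later slices are empty here. */
        if (eob <= s * SLICE * N) {
            for (int c = s * SLICE; c < N; c++)
                for (int r = 0; r < N; r++)
                    tmp[r * N + c] = 0;
            break;
        }
        for (int c = s * SLICE; c < (s + 1) * SLICE; c++) {
            toy_idct_1d(coef + c, N, col);
            for (int r = 0; r < N; r++)
                tmp[r * N + c] = col[r];
        }
        done++;
    }

    /* Second pass: every row is still transformed in full. */
    for (int r = 0; r < N; r++)
        toy_idct_1d(tmp + r * N, 1, dst + r * N);

    return done;
}
```

Calling toy_idct_2d with the true eob and with eob = N * N on the same coefficients should produce identical output; only the amount of first-pass work differs.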

In a few inspected clips of common VP9 content, 70-90% of the non-DC-only
16x16 and 32x32 IDCTs have nonzero coefficients only in the upper left
8x8 or 16x16 subpartition, respectively.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-11-30 23:54:07 +02:00