Change the encoding of the original developer name from ISO-8859-1 to UTF-8.
Remove the stale/completed TODO list.
Fix two small typos.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Certain alpha run lengths (for SHQ1/SHQ3/SHQ5) could be stored in
both long and short versions, and we would only accept the short version,
returning -1 (invalid code) for the others. This could cause an
out-of-bounds write on malicious input, as discovered by
Andreas Cadhalpun during fuzzing.
Fix by simply allowing both versions, leaving no invalid codes
in the alpha VLC.
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Multichannel joint stereo simply interleaves stereo pairs (6ch: 2ch + 2ch + 2ch), so each pair is decoded separatedly.
***
To test my changes, I converted examples to wav with ffmpeg.exe (old and new), and compared them to see they are byte-exact.
Regular 2ch files (JS and normal) were straightforward to test.
For multichannel, to check each JS pair is correctly decoded separatedly I did:
- manually demux 6ch.msf into 3 pairs and convert them (2ch_1.wav + 2ch_2.wav + 2ch_3.wav)
- convert the 6ch.msf file to wav (with my changes)
- manually demux the 6ch.wav into 3 pairs (6ch_d1.wav + 6ch_d2.wav + 6ch_d3.wav)
- compare each pair (ex. 2ch_3.wav vs 6ch_d3.wav): all pairs are byte-exact.
The new code just processes each JS pair separatedly, there are no algorithm changes.
It could be improved a bit but I'm not sure about typical styles.
I've only seen 6ch .MSF (probably the AT3 spec only supports 2ch audio).
Signed-off-by: bnnm <bananaman255@gmail.com>
Fixes: u263_b-frames_1.avi
Fixes part of Ticket1536
return -1 is used here as it is used in similar code in this function, I intend
to replace it by proper error codes in the whole function.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* commit 'a1f6a2dfdaf9beb42ca66e49d10bfaf5905a0128':
ratecontrol: Reorder functions to avoid forward declarations
Merged, but this seems to break the clear separation of 1-pass vs
2-pass.
Merged-by: Clément Bœsch <u@pkh.me>
* commit 'd639dcdae022130078c9c84b7b691c5e9694786c':
ratecontrol: Move Xvid-related functions to the place they are actually used
Merged-by: Clément Bœsch <u@pkh.me>
The code relies on their validity and otherwise can try to access a NULL
object->rle pointer, causing segmentation faults.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
The assumption this is based on is wrong, the code is not always run with bitexact flags
This reverts commit a956164e1e, reversing
changes made to f6005907fd.
Approved-by: James Almer <jamrial@gmail.com>
* commit 'd06dfaa5cbdd20acfd2364b16c0f4ae4ddb30a65':
x86: huffyuv: Use EXTERNAL_SSSE3_FAST convenience macro where appropriate
Merged-by: James Almer <jamrial@gmail.com>
* commit '4efab89332ea39a77145e8b15562b981d9dbde68':
x86: Use *_FAST/*_SLOW CPU feature detection macros where appropriate
Merged-by: James Almer <jamrial@gmail.com>
* commit '0a39c9ac0bfd7345fe676b4e2707d9cec3cbb553':
x86: hpeldsp: Don't check for bitexact flag when initializing VP3-specific code
Merged-by: James Almer <jamrial@gmail.com>
* commit '1dfc3cf89d0eb026af28be46294b85d79499ffb5':
x86: hpeldsp: Split off VP3-specific bits into a separate file
Merged-by: James Almer <jamrial@gmail.com>
* commit 'fca3c3b61952aacc45e9ca54d86a762946c21942':
hevc: Add AVX2 DC IDCT
Mostly noop as we already have that code.
In the ASM, code is merged with the exception of SECTION which is kept
uppercase for consistency with the rest of the codebase.
Still in the ASM, the prototype comment is fixed to honor the '_' added
from the original commit.
idct_dc_proto() is dropped as it's not used anymore here.
Merged-by: Clément Bœsch <cboesch@gopro.com>
* commit 'cc16da75c2f99d92f7a6461100f041352deb6d88':
hevc: Add coefficient limiting to speed up IDCT
Noop again as we have these changes already, only random spacing
changes.
Merged-by: Clément Bœsch <cboesch@gopro.com>
* commit '4f247de3b797cdc9d243d26534412f81c306e5b5':
hevcdsp_template: Templatize IDCT
This commit is a noop as we already have that code from a previous
commits (see 92cccb7bcd).
Spacing is adjusted to reduce the diff.
Merged-by: Clément Bœsch <cboesch@gopro.com>
* commit '1bd890ad173d79e7906c5e1d06bf0a06cca4519d':
hevc: Separate adding residual to prediction from IDCT
This commit should be a noop but isn't because of the following renames:
- transform_add → add_residual
- transform_skip → dequant
- idct_4x4_luma → transform_4x4_luma
Merged-by: Clément Bœsch <cboesch@gopro.com>
This work is sponsored by, and copyright, Google.
This is similar to the arm version, but due to the larger registers
on aarch64, we can do 8 pixels at a time for all filter sizes.
Examples of runtimes vs the 32 bit version, on a Cortex A53:
ARM AArch64
vp9_loop_filter_h_4_8_10bpp_neon: 213.2 172.6
vp9_loop_filter_h_8_8_10bpp_neon: 281.2 244.2
vp9_loop_filter_h_16_8_10bpp_neon: 657.0 444.5
vp9_loop_filter_h_16_16_10bpp_neon: 1280.4 877.7
vp9_loop_filter_mix2_h_44_16_10bpp_neon: 397.7 358.0
vp9_loop_filter_mix2_h_48_16_10bpp_neon: 465.7 429.0
vp9_loop_filter_mix2_h_84_16_10bpp_neon: 465.7 428.0
vp9_loop_filter_mix2_h_88_16_10bpp_neon: 533.7 499.0
vp9_loop_filter_mix2_v_44_16_10bpp_neon: 271.5 244.0
vp9_loop_filter_mix2_v_48_16_10bpp_neon: 330.0 305.0
vp9_loop_filter_mix2_v_84_16_10bpp_neon: 329.0 306.0
vp9_loop_filter_mix2_v_88_16_10bpp_neon: 386.0 365.0
vp9_loop_filter_v_4_8_10bpp_neon: 150.0 115.2
vp9_loop_filter_v_8_8_10bpp_neon: 209.0 175.5
vp9_loop_filter_v_16_8_10bpp_neon: 492.7 345.2
vp9_loop_filter_v_16_16_10bpp_neon: 951.0 682.7
This is significantly faster than the ARM version in almost
all cases except for the mix2 functions.
Based on START_TIMER/STOP_TIMER wrapping around a few individual
functions, the speedup vs C code is around 2-3x.
Signed-off-by: Martin Storsjö <martin@martin.st>
This work is sponsored by, and copyright, Google.
Compared to the arm version, on aarch64 we can keep the full 8x8
transform in registers, and for 16x16 and 32x32, we can process
it in slices of 4 pixels instead of 2.
Examples of runtimes vs the 32 bit version, on a Cortex A53:
ARM AArch64
vp9_inv_adst_adst_4x4_sub4_add_10_neon: 111.0 109.7
vp9_inv_adst_adst_8x8_sub8_add_10_neon: 914.0 733.5
vp9_inv_adst_adst_16x16_sub16_add_10_neon: 5184.0 3745.7
vp9_inv_dct_dct_4x4_sub1_add_10_neon: 65.0 65.7
vp9_inv_dct_dct_4x4_sub4_add_10_neon: 100.0 96.7
vp9_inv_dct_dct_8x8_sub1_add_10_neon: 111.0 119.7
vp9_inv_dct_dct_8x8_sub8_add_10_neon: 618.0 494.7
vp9_inv_dct_dct_16x16_sub1_add_10_neon: 295.1 284.6
vp9_inv_dct_dct_16x16_sub2_add_10_neon: 2303.2 1883.9
vp9_inv_dct_dct_16x16_sub8_add_10_neon: 2984.8 2189.3
vp9_inv_dct_dct_16x16_sub16_add_10_neon: 3890.0 2799.4
vp9_inv_dct_dct_32x32_sub1_add_10_neon: 1044.4 1012.7
vp9_inv_dct_dct_32x32_sub2_add_10_neon: 13333.7 9695.1
vp9_inv_dct_dct_32x32_sub16_add_10_neon: 18531.3 12459.8
vp9_inv_dct_dct_32x32_sub32_add_10_neon: 24470.7 16160.2
vp9_inv_wht_wht_4x4_sub4_add_10_neon: 83.0 79.7
The larger transforms are significantly faster than the corresponding
ARM versions.
The speedup vs C code is smaller than in 32 bit mode, probably
because the 64 bit intermediates in the C code can be expressed
more efficiently in aarch64.
Signed-off-by: Martin Storsjö <martin@martin.st>