This can be done by reusing the destination array to store
a temporary LUT (with only half the amount of elements, but
double the element size). This relies on certain assumptions
about sizes, but they are always fulfilled for systems supported
by us (sizeof(double) == 8 is needed/guaranteed since commit
3383a53e7d). Furthermore,
sizeof(uint32_t) is always >= four because CHAR_BIT is eight
(in fact, sizeof(uint32_t) is four, because the exact width
types don't have any padding).
Reviewed-by: Zhao Zhili <quinkblack@foxmail.com>
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>