There's very little performance difference compared to SSE2/SSSE3, and most systems will use the AVX2 implementations anyway. This significantly reduces code size and compilation time.