I've been doing some more research on dither (the Wikipedia article is actually very good) and I'm now completely convinced that dither need not be applied when simply appending 8 bits of zeros to 16-bit audio, because no re-quantization takes place.
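To convince myself of the "no re-quantization" point, here is a minimal Python sketch. The function names are my own, made up for illustration, and it treats samples as plain signed integers rather than reading a real audio file. It shows that padding with 8 zero bits is exactly reversible for every possible 16-bit value, so no rounding error exists for dither to decorrelate.

```python
def pad_16_to_24(sample_16: int) -> int:
    """Append 8 zero bits, so the 16-bit value sits in the top 16 bits of a 24-bit word."""
    return sample_16 << 8  # identical to multiplying by 256


def truncate_24_to_16(sample_24: int) -> int:
    """Drop the bottom 8 bits again (arithmetic shift, i.e. floor division by 256)."""
    return sample_24 >> 8


# Check every signed 16-bit value: the round trip must return the original.
all_ok = all(
    truncate_24_to_16(pad_16_to_24(s)) == s for s in range(-32768, 32768)
)
print(all_ok)  # True
```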
However, it's less clear to me whether or not dither is required in general when upsampling. When upsampling but remaining at a 16-bit word size, the samples are modified and therefore re-quantized, so dither must be applied; indeed, sox automatically adds dither with the following command line:
sox 16_44_input.FLAC -b16 -C0 16_88_output.FLAC rate -vMab 90.7 88200
But what about upsampling whilst simultaneously increasing word length to 24 bits? Sox does *not* apply dither automatically with this command line*:
sox 16_44_input.FLAC -b24 -C0 24_88_output.FLAC rate -vMab 90.7 88200
I've been forcing high pass dither in this scenario:
sox 16_44_input.FLAC -b24 -C0 24_88_output.FLAC rate -vMab 90.7 88200 dither -S
Now, I'm sure this does no harm with a 24-bit signal where you're never going to hear the bottom few LSBs, but is what I'm doing unnecessary? I'm beginning to think it might be, but I can't quite put the logical argument in place. Is it that the extra 8 bits provide the extra accuracy to describe the re-quantization?
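To put rough numbers on the "extra accuracy" idea, here is a small Python sketch. It is my own illustration and not a description of what sox does internally; `requantize` and the value 1000.3 are hypothetical. It rounds a made-up resampler output, expressed in 16-bit LSBs, to a 16-bit grid and to a 24-bit grid. Rounding still happens at 24 bits, but the worst-case error is 256 times smaller (roughly 48 dB lower). That doesn't by itself settle whether dither is needed, only how far down the error sits.

```python
def requantize(value_in_16bit_lsbs: float, bits: int) -> float:
    """Round a value (given in units of one 16-bit LSB) to the grid of a `bits`-bit word."""
    steps_per_16bit_lsb = 2 ** (bits - 16)  # 1 step at 16 bits, 256 steps at 24 bits
    return round(value_in_16bit_lsbs * steps_per_16bit_lsb) / steps_per_16bit_lsb


# Hypothetical resampler output that falls between two 16-bit levels.
x = 1000.3

err16 = abs(requantize(x, 16) - x)  # bounded by 0.5 of a 16-bit LSB
err24 = abs(requantize(x, 24) - x)  # bounded by 0.5 / 256 of a 16-bit LSB

print(f"16-bit error: {err16:.6f} LSB16")
print(f"24-bit error: {err24:.6f} LSB16")
```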
On a slightly related subject, I'm trying to figure out whether or not resolution enhancement is relevant when upsampling and increasing word size and, if so, how to apply it to the 16-bit signal prior to upsampling. I'm not convinced either way, but given that the 24-bit samples will utilize all 24 bits after upsampling, I'm leaning towards "not necessary". As a corollary, is resolution enhancement a "relic" of the pre-upsampling world?
If one did want to perform resolution enhancement, I believe that the Wadia document provides all the pointers that are needed, though when it says that Wadia adds a 9-bit signal to the 16-bit audio, whilst correct, it's a little misleading if you don't have your wits about you. Bear in mind that 16-bit audio is +/- 15-bits with the 16th bit effectively being the sign bit. What Wadia is doing is adding a 9-bit *signed* signal - i.e. +/- 8-bits - to the original 16-bit audio. By adding (or subtracting) *8* bits into the "empty" 8-bits in the new 24-bit word, it will never overflow into the LSB of the original 16-bit word.
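The arithmetic can be sketched in Python as follows. This is only my reading of the description, not Wadia's actual implementation: `add_enhancement` is a made-up name, and I'm assuming "+/- 8-bits" means an offset in the range -255 to +255. The check shows that the sum always stays strictly between the two neighbouring 16-bit levels.

```python
def add_enhancement(sample_16: int, offset: int) -> int:
    """Pad a 16-bit sample to 24 bits, then add a signed offset of at most 8 bits magnitude."""
    if not -255 <= offset <= 255:  # assumed range for the "9-bit signed" signal
        raise ValueError("offset must fit in sign + 8 bits")
    return (sample_16 << 8) + offset


# For a handful of samples, the extreme offsets never reach the next 16-bit level.
stays_between = all(
    ((s - 1) << 8) < add_enhancement(s, d) < ((s + 1) << 8)
    for s in (-32768, -1, 0, 1, 32767)
    for d in (-255, 255)
)
print(stays_between)  # True

# A negative offset does borrow from the upper bits in two's complement:
print(add_enhancement(5, -1) >> 8)  # 4, not 5
```

So "never overflows" holds in the sense of the value staying inside the neighbouring 16-bit levels; the upper 16 bits of the bit pattern can still change when the offset is negative.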
The final thing about resolution enhancement that I need to understand is how the (TPDF) amplitude distribution and the frequency distribution of the resolution enhancement combine to create the resolution enhancing signal.
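One textbook construction that combines the two properties is to take the difference of successive uniform random values. I don't know whether this is what Wadia (or sox's `dither -S`) actually does, so treat the Python below as a sketch under that assumption, with a made-up function name. Each output is the difference of two independent uniform values, which gives the triangular amplitude distribution. Reusing the previous random value makes neighbouring outputs negatively correlated, which tilts the spectrum towards high frequencies.

```python
import random


def highpass_tpdf(n: int, seed: int = 0) -> list[float]:
    """Return n dither values in LSBs: triangular in amplitude, high-pass in spectrum."""
    rng = random.Random(seed)
    prev = rng.uniform(-0.5, 0.5)
    out = []
    for _ in range(n):
        cur = rng.uniform(-0.5, 0.5)
        out.append(cur - prev)  # difference of two uniforms: triangular on [-1, 1]
        prev = cur              # reuse gives a first-order difference (1 - z^-1) shape
    return out


d = highpass_tpdf(10000)

# Peak amplitude never exceeds 1 LSB.
peak = max(abs(v) for v in d)

# Lag-1 correlation is negative, the signature of a high-pass tilt.
lag1 = sum(a * b for a, b in zip(d, d[1:]))

print(peak <= 1.0, lag1 < 0)
```

Using two fresh uniform values per output instead of reusing `prev` would keep the triangular amplitude distribution but give a flat (white) spectrum, which is how I understand the amplitude and frequency aspects to be separable.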
Any thoughts greatly welcomed!
* To thicken the plot, if you specify "dither" as a DSP effect explicitly, the -a option invokes an algorithm which attempts to work out if dither is required or not. In this case, dither *is* applied automatically for both these command-lines:
sox 16_44_input.FLAC -b16 -C0 16_88_output.FLAC rate -vMab 90.7 88200 dither -a
sox 16_44_input.FLAC -b24 -C0 24_88_output.FLAC rate -vMab 90.7 88200 dither -a
The sox man page does suggest that this algorithm isn't perfect though.