
[Csnd] [Fwd: Fwd: Audio recording bitdepth]

Date: 2009-12-10 16:32
From: Felipe Sateler
Subject: [Csnd] [Fwd: Fwd: Audio recording bitdepth]
This is forwarded from the JACK mailing list, and was in turn forwarded
from the CoreAudio list. It references this post by Bjorn Roche
http://blog.bjornroche.com/2009/12/linearity-and-dynamic-range-in-int.html

sndfile uses 0x8000 to convert to/from float and int. Are there other
places in Csound where this is done? Also, I note that the standard math
functions sin/cos/etc are used in Csound. Is there any reason to worry
about using them?
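
For reference, the 0x8000-style conversion being discussed looks
roughly like this (a sketch of the idea, with made-up function names,
not sndfile's actual code):

  #include <stdint.h>

  /* Sketch of the symmetric 0x8000-style conversion: one power-of-two
     factor in both directions.  Not sndfile's actual code. */
  static inline float int16_to_float(int16_t s)
  {
      return (float)s / 32768.0f;          /* scale by 2^-15 */
  }

  static inline int16_t float_to_int16(float f)
  {
      float v = f * 32768.0f;              /* scale by 2^15 */
      if (v >  32767.0f) v =  32767.0f;    /* +1.0 would overflow: clip */
      if (v < -32768.0f) v = -32768.0f;
      return (int16_t)v;
  }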

-------- Forwarded Message --------
From: Paul Davis 
To: JACK Developers 
Subject: Fwd: Audio recording bitdepth
Date: Wed, 9 Dec 2009 21:50:51 -0500
Newsgroups: gmane.comp.audio.jackit

somebody else who knows what he is talking about ...


---------- Forwarded message ----------
From: Brian Willoughby 
Date: Wed, Dec 9, 2009 at 9:02 PM
Subject: Re: Audio recording bitdepth
To: CoreAudio API 
Cc: Ross Bencina , Bjorn Roche
, Paul Davis 


The problem with this whole thread is that there is no downgrade in
fidelity with the conversion method used by CoreAudio.  All the rest
of the comments assume that there is a superior method when there
really isn't one.


Bjorn proposes in his blog that there are two good choices for
conversion methods.  I'll call them A and B.  Method A is used by
Apple in CoreAudio.  Method B is the 'asymmetrical' option.  Bjorn
claims that they are both good, with each method having specific
benefits and drawbacks.  The problem is that Bjorn's hypothesis has
not been peer-reviewed, and does not stand up to basic mathematical
principles.  Bjorn's own tests do not reveal the flaws in method B
because his tests are incomplete and do not have a solid basis.
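
For concreteness, method B as I read the blog post looks roughly like
this (my sketch of the idea, with made-up names, not anyone's actual
code):

  #include <stdint.h>

  /* Sketch of method B, the 'asymmetrical' conversion: positive and
     negative samples are scaled by different factors. */
  static inline float int16_to_float_asym(int16_t s)
  {
      return s >= 0 ? (float)s / 32767.0f   /* +32767 maps to +1.0 */
                    : (float)s / 32768.0f;  /* -32768 maps to -1.0 */
  }

  static inline int16_t float_to_int16_asym(float f)
  {
      return f >= 0.0f ? (int16_t)(f * 32767.0f)
                       : (int16_t)(f * 32768.0f);
  }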

In a nutshell, Bjorn's asymmetrical conversion introduces non-linear
distortion by processing positive values differently than negative
values.  Ross' comments about CPU efficiency are a diversion from the
fact that all processing on the distorted waveforms would make this
distortion irreversible.  Bjorn's tests only happen to reverse this
non-linear distortion for the one special case where no processing is
done on the audio, which is clearly not an option for someone using
Logic, or even for someone combining music and system sounds on the
same interface.  Thus, asymmetrical conversion would not work for most
applications, and since you can't use different conversions for
different applications, you must use the CoreAudio conversion (or
equivalent).  That's the trouble with designing your own tests,
because your assumptions are masked by the implementation of your
tests.
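
To see why the no-processing case is the only one that survives, look
at what the asymmetric floats mean to anything downstream that assumes
the usual symmetric scale (a sketch):

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      /* Two samples of equal magnitude, through the asymmetric map: */
      int16_t pos = 16384, neg = -16384;
      float fp = (float)pos / 32767.0f;   /* ~0.5000153 (positive rule) */
      float fn = (float)neg / 32768.0f;   /* -0.5 exactly (negative rule) */
      /* The float waveform is already bent: |fp| != |fn|.  Only the
         exact inverse (asymmetric) map undoes that, and only as long
         as nothing has touched the samples; once the two halves are
         mixed or filtered together, or handed to a symmetric
         converter, the bend is ordinary non-linear distortion. */
      printf("%.7f %.7f\n", fp, fn);
      return 0;
  }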

Method B, the asymmetrical conversion, has no advantage - neither
hypothetical nor actual.  The only valid conversion method is the one
used by Apple in CoreAudio.  Actually, there is a larger set of valid
conversions: Basically, any conversion factor which is a pure power of
two, and is applied identically to all input values, is valid.  Apple
has chosen +/-1 for the normalization in the float format, which is a
very common choice.  Other ranges would be equally valid so long as
every process involved is aware of the standard.
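
The reason a pure power-of-two factor is special: in binary floating
point it changes only the exponent, never the mantissa, so the
int-to-float-to-int round trip is bit-exact for every code.  A quick
exhaustive check (a sketch):

  #include <assert.h>
  #include <stdint.h>

  int main(void)
  {
      /* Scaling by 2^-15 adjusts only the float's exponent, so all
         65536 16-bit codes survive the round trip exactly. */
      for (int32_t i = -32768; i <= 32767; i++) {
          float f = (float)i / 32768.0f;
          assert((int32_t)(f * 32768.0f) == i);
      }
      return 0;
  }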

The only reason I'm taking the trouble to point this out on the
CoreAudio mailing list is that I would hate to see this debate raised
again.  Bjorn's blog puts unsubstantiated misinformation out onto the
web which is going to unnecessarily raise questions about why Apple
chose option A, why they didn't choose the hypothetically
higher-fidelity option (false dilemma), and why they don't offer user
and/or programmer configuration options for different conversion
factors.  The fact is that there are solid mathematical reasons for
Apple's choices, and there really are no tradeoffs or lost fidelity as
a result.  Suggestions to the contrary will not survive peer review.


For anyone interested in the other flaws in Bjorn's blog entries:

* Bjorn claims that when A/D converters clip around -0.5 dBFS, it's
equivalent to (2^n)-.5, which is completely false.  This clipping
happens entirely in the analog domain, before quantization to digital
codes, so it is not equivalent to (2^n)-.5 because the converter is
still based on 2^n.  What happens before the A/D conversion cannot be
precisely equated to binary math.  These comments show a lack of
understanding of the A/D process as well as mathematics.
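
For scale, the two quantities being conflated are nowhere near each
other.  Reading (2^n)-.5 as half a code below 16-bit full scale (my
reading, with full scale taken as 2^15 = 32768), a quick check:

  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
      /* -0.5 dBFS expressed as a 16-bit code value: ~30935 */
      printf("%.1f codes\n", 32768.0 * pow(10.0, -0.5 / 20.0));
      /* half a code below full scale, expressed in dB: ~ -0.00013 dB */
      printf("%.5f dB\n", 20.0 * log10((32768.0 - 0.5) / 32768.0));
      return 0;
  }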

* Bjorn claims that +1 occurs in the real world, but that's not true.
The only real world is the analog world, and no A/D converter allows
the +1 value.  In the virtual world of VST synths, +1 is certainly
possible, but only a problem for developers who try to get closer to
the 24-bit maximum than the 16-bit maximum.  In contrast, hardware DSP
chips have embedded sine wave ROM tables which, for example, span only
+/-32766.  No attempt is made to reach +32767, and certainly not -32768,
because 2 LSBs of headroom is immaterial.  32766 is only 0.00053 dB
below full scale, and nobody really cares to risk clipping for such a
minuscule gain in signal level.  A 24-bit variant would just
synthesize waveforms without getting so close to clipping.  24-bit
codes could be a tiny fraction louder than 16-bit codes, but not
enough to warrant the risk of clipping with 16-bit audio interfaces.
In other words, Bjorn is actually looking at a real issue worthy of
discussion, but the suggested solution is entirely wrong.
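
The 0.00053 dB figure is easy to verify, taking full scale as
2^15 = 32768:

  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
      /* headroom left by a +/-32766 sine table: ~ -0.00053 dB */
      printf("%.5f dB\n", 20.0 * log10(32766.0 / 32768.0));
      return 0;
  }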

* Bjorn synthesizes sine waves and then tests distortion outputs, all
without specifying the source for the sine data.  The standard C math
library sin() and cos() functions are approximations, accurate only to
within a unit or so in the last place, and so Bjorn's original data has
distortion from the
start.  In other words, those are not pure sine waves!  Thus, the
tests that are cited are nothing more than fun and pretty pictures,
and they are certainly not mathematical proofs of the bad assumptions
made about asymmetrical conversions.
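
Whatever a given libm actually does internally, the safe recipe for a
test tone is to synthesize in double precision and round, rather than
truncate, into the target codes; something like this sketch (make_sine
is a made-up name):

  #include <math.h>
  #include <stdint.h>

  #ifndef M_PI
  #define M_PI 3.14159265358979323846
  #endif

  /* Sketch: double-precision synthesis, rounded (lrint) to 16-bit
     codes; keep amp a touch under full scale, e.g. 32766.0. */
  void make_sine(int16_t *buf, int n, double freq, double srate, double amp)
  {
      for (int i = 0; i < n; i++)
          buf[i] = (int16_t)lrint(amp * sin(2.0 * M_PI * freq * i / srate));
  }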

Brian Willoughby
Sound Consulting


On Dec 9, 2009, at 17:00, Ross Bencina wrote:
>>
>> I am a bit confused by your comments, though: I think in this case you aren't going to deal with cache misses. The likely performance issue is branch prediction failures.
>
> And clipping should be able to be handled with conditional moves, which shouldn't (in theory) create instruction pipeline issues. But in terms of performance you are doubling the number of tests if there are two different thresholds. Anyway, without seeing a profile this is moot.
>
> In a world where people spend more on ADC/DAC hardware than on their computer I can't imagine they're going to be keen to downgrade fidelity (even if theoretical, cf other discussions here) for the benefit of a few extra CPU cycles.
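
For what it's worth, the clip in question can be written branch-free;
a sketch like this usually compiles down to min/max or conditional-move
instructions rather than branches (clip_to_int16 is a made-up name):

  #include <stdint.h>

  static inline int16_t clip_to_int16(float v)
  {
      v = v >  32767.0f ?  32767.0f : v;   /* upper threshold */
      v = v < -32768.0f ? -32768.0f : v;   /* lower threshold */
      return (int16_t)v;
  }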


-- 
Saludos,
Felipe Sateler

Date: 2009-12-10 17:05
From: Victor Lazzarini
Subject: [Csnd] Re: [Fwd: Fwd: Audio recording bitdepth]
As far as sin/cos are concerned, I don't think there are any practical
problems. The quality of the output of these should be fine for our uses.
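
For anyone who wants numbers, a quick sketch that bounds the error by
comparing single-precision sinf() against a double-precision sin()
reference over one cycle:

  #include <math.h>
  #include <stdio.h>

  #ifndef M_PI
  #define M_PI 3.14159265358979323846
  #endif

  int main(void)
  {
      /* Worst gap between sinf() on a float argument and a double
         reference; expect on the order of 1e-7, i.e. around the
         24-bit LSB and far below 16-bit quantisation error. */
      double worst = 0.0;
      for (int i = 0; i < 1000000; i++) {
          double x = 2.0 * M_PI * i / 1000000.0;
          double e = fabs((double)sinf((float)x) - sin(x));
          if (e > worst) worst = e;
      }
      printf("max error: %g\n", worst);
      return 0;
  }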

Victor


On 10 Dec 2009, at 16:32, Felipe Sateler wrote:

> This is forwarded from the JACK mailing list, and was in turn forwarded
> from the CoreAudio list. It references this post by Bjorn Roche
> http://blog.bjornroche.com/2009/12/linearity-and-dynamic-range-in-int.html
>
> sndfile uses 0x8000 to convert to/from float and int. Are there other
> places in Csound where this is done? Also, I note that the standard math
> functions sin/cos/etc are used in Csound. Is there any reason to worry
> about using them?
>
> [...]
>
> -- 
> Saludos,
> Felipe Sateler


