[Csnd] info about pvsanal opcode

Date: 2008-01-13 20:47
From: Uğur Güney
Subject: [Csnd] info about pvsanal opcode
Attachments: None

Date: 2008-01-14 11:11
From: Richard Dobson
Subject: [Csnd] Re: info about pvsanal opcode
Uğur Güney wrote:
..
> 
> # Why there are 513 bins, not 512? Is bin number 0 for DC component or
> fundamental component? Is Nth_Bin for the frequency
> (SampleRate/1024)*(N-1)?
> # So should my function table, which will contain the transfer
> function of the filter, has 512 or 513 points?
>

513. The bottom and top bins cover DC and Nyquist. The exact 
mathematical details are rather complex (pun intended); simplistically 
it is much like saying that in counting from 0 to 10 there are 11 
numbers. A more mathematical way of looking at it is that a purely 
"real" signal expressed in the form of a complex (real+imaginary) 
spectrum is exactly symmetrical around Nyquist, so the upper values are 
redundant; but we keep DC and Nyquist as per the 0-10 idea.
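
A minimal Python sketch of the counting argument, using only the standard library. The sample rate of 44100 Hz, the 16-point test signal and the helper names are assumptions made for illustration; they are not taken from the pvsanal source. With bins counted from 0, bin k has a nominal centre of k * sr / fftsize, so bin 0 is DC and bin 512 is Nyquist.

```python
import cmath
import math

def dft(signal):
    """Naive O(N^2) discrete Fourier transform; fine for a small demo."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
        for k in range(n)
    ]

def num_bins(fftsize):
    """Unique bins for a real input: DC up to and including Nyquist."""
    return fftsize // 2 + 1

def bin_centre_hz(k, sr, fftsize):
    """Nominal centre frequency of bin k, counting from k = 0 (DC)."""
    return k * sr / fftsize

# A small real-valued test signal (arbitrary values, for illustration only).
N = 16
signal = [
    math.sin(2 * math.pi * 3 * i / N) + 0.5 * math.cos(2 * math.pi * 5 * i / N)
    for i in range(N)
]
spectrum = dft(signal)

# Bins above Nyquist mirror those below: X[N-k] is the complex conjugate of X[k].
mirrored = all(
    abs(spectrum[N - k] - spectrum[k].conjugate()) < 1e-9
    for k in range(1, N // 2)
)

print(num_bins(1024))                   # 513
print(bin_centre_hz(0, 44100, 1024))    # 0.0 (DC)
print(bin_centre_hz(512, 44100, 1024))  # 22050.0 (Nyquist)
print(mirrored)                         # True
```

So a function table holding one value per bin for fftsize 1024 would need 513 points, indexed 0 to 512.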



> # In manual it says also:
> 
> Currently only one format is implemented by this opcode: 0 = amplitude
> + frequency
> 
> # What does this exactly mean? I understand something like this:
> # fsig is an array. Its elements are pair of numbers, one for
> amplitude and one for frequency. 

Other formats are possible, such as amplitude + phase (and even a raw 
complex real+imag form), but I never got around to implementing the 
other forms in the opcodes. The provision is there though for when I or 
someone gets around to it (I haven't even looked at the code for ages, 
perhaps someone has already done it!). The expensive step is the 
conversion to amp/phase; moving that to amp/freq is a simple arithmetic 
step. So using amp/phase would not noticeably save processing time. The 
associated PVOCEX file format ~does~ support all three formats though.
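
As a rough illustration of the first two representations, here is a small Python sketch converting one complex (real+imag) bin value to amplitude + phase and back. The function names are hypothetical and this is not the code used by the opcodes; it only shows the arithmetic involved (a square root and an arctangent per bin).

```python
import cmath
import math

def to_amp_phase(z):
    """Complex bin value -> (amplitude, phase in radians)."""
    amp = math.hypot(z.real, z.imag)    # sqrt(re^2 + im^2)
    phase = math.atan2(z.imag, z.real)  # angle in (-pi, pi]
    return amp, phase

def to_complex(amp, phase):
    """(amplitude, phase) -> complex bin value."""
    return cmath.rect(amp, phase)

z = complex(3.0, 4.0)  # an arbitrary example value
amp, phase = to_amp_phase(z)
back = to_complex(amp, phase)

print(amp)                    # 5.0
print(abs(back - z) < 1e-12)  # True
```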


> Because pvsanal does not take FFT but
> makes a phase-vocoder analysis, 


Taking the FFT (of a windowed block of samples) is the first stage in 
making a phase vocoder analysis. So the FFT usage is pretty heavy! In 
turn, the phase vocoder is the first step in other techniques such as 
partial tracking.

> frequency values are not exactly
> integer multiples of some fundamental, SR/1024, but they deviate from
> their corresponding bin value. How are these deviations stored? 

This involves the "phase" part of the phase vocoder. Each bin has a 
nominal fixed centre frequency of bin number * sr/N Hz, and a relatively 
limited bandwidth around that centre, dependent on the amount of overlap 
between frames (so frequencies in the lowest bins can even be negative - 
the DC bin might range between +- 43 Hz, for example).

A common alternative view is to see each bin as a very simple bandpass 
filter where the filters all overlap somewhat. Frequency is defined as 
"the rate of change of phase", and thus the differences in phase between 
successive windows (phase in turn is obtained from the raw 
real/imaginary values emerging from the FFT, using the arctangent, while 
amplitude comes from good old Pythagoras's theorem) can get converted 
into a true frequency value. But note the 
plain phase vocoder cannot in itself track moving frequency components; 
at some point the information moves into higher or lower bins - a bit 
like the image of a football moving between multiple TV screens in a 
mega-display, where the images overlap a bit but the cameras are fixed.
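
The phase-difference step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pvsanal implementation: the sample rate (44100), window size (1024), hop (256), Hann window and 1000 Hz test sine are all chosen for the example, and only the single bin nearest the test frequency is computed.

```python
import cmath
import math

SR = 44100.0        # assumed sample rate
N = 1024            # assumed window (FFT) size
HOP = 256           # assumed hop between successive frames
TRUE_FREQ = 1000.0  # frequency of the test sine, in Hz

def windowed_bin(start, k):
    """Bin k of the DFT of a Hann-windowed frame beginning at sample `start`."""
    total = 0j
    for i in range(N):
        sample = math.sin(2 * math.pi * TRUE_FREQ * (start + i) / SR)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * i / N)
        total += sample * window * cmath.exp(-2j * math.pi * k * i / N)
    return total

def wrap(phase):
    """Wrap a phase value into the range [-pi, pi)."""
    return (phase + math.pi) % (2 * math.pi) - math.pi

k = round(TRUE_FREQ * N / SR)  # nearest bin to the test frequency
centre = k * SR / N            # nominal centre frequency of that bin

phase_a = cmath.phase(windowed_bin(0, k))
phase_b = cmath.phase(windowed_bin(HOP, k))

# Phase advance over one hop for a component sitting exactly at the bin centre.
expected = 2 * math.pi * k * HOP / N

# Whatever is left over, wrapped, is the deviation from the centre frequency.
deviation = wrap(phase_b - phase_a - expected)
freq = centre + deviation * SR / (2 * math.pi * HOP)

print(k, round(centre, 2))  # bin index and its nominal centre
print(round(freq, 2))       # expected to be close to 1000 Hz
```

The bin centre here is about 990.5 Hz; the leftover phase difference accounts for the remaining few Hz.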

In the limit of single-sample overlap (the "Sliding Phase Vocoder" or 
SPV which has recently been incorporated), the bandwidth of each bin is 
DC to Nyquist, such that those filters now do not overlap but fully 
stack on top of each other!

We can understand this intuitively by considering what we might be able 
to deduce about frequency changes between widely-spaced frames. In this 
case, rapid deviations are simply missed, such that the content of each 
bin is more like a crude average of the start and end values from the 
FFT. We will be missing important information (transients, moving 
pitches generally), and the effective bandwidth of each bin becomes very 
narrow as little deviation is measurable. Conversely, with maximum 
possible overlap, we track frequency changes at single sample 
resolution, over the whole range. How accurate the result is (in terms 
of frequency resolution within a frame) is still dependent on the size 
of the window (fftsize).
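
One way to put rough numbers on this, assuming the only limit is the +-pi ambiguity of the wrapped phase difference between two frames: the largest deviation from a bin centre that can be measured is sr / (2 * hop) Hz. The function name, the sample rate of 44100 and the hop sizes below are assumptions for illustration.

```python
def max_deviation_hz(sr, hop):
    """Largest unambiguous deviation from a bin centre, in Hz.

    The wrapped phase difference between frames lies within +-pi,
    which corresponds to +-sr / (2 * hop) Hz.
    """
    return sr / (2.0 * hop)

SR = 44100.0  # assumed sample rate

for hop in (512, 256, 64, 1):
    print(hop, round(max_deviation_hz(SR, hop), 2))
```

With a hop of 512 samples this gives roughly +-43 Hz, in line with the DC bin example above; with a hop of 1 sample it gives +-22050 Hz, i.e. the full range up to Nyquist.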

[To continue the tv screen analogy: widely-spaced frames are like having 
one tv showing David Beckham kicking the football from the centre of the 
field; the next tv shows the ball entering the goal. We ~assume~ it is 
the ball Beckham kicked, and may even extrapolate the path it took; but 
we can't be absolutely sure someone else didn't kick it in between, or 
even replace it with another one. Or use two balls!]


There is no escaping the maths jargon when it comes to explaining how 
the phase vocoder works, sorry! Or, just accept what is contained in a 
frame, call it magic, and just use it.


The SPV has some truly weird aspects which still need further 
investigation. We can no longer make any assumptions at all about what 
frequency might be in what bin - a high bin might well contain a very 
low frequency, especially if the source is something simple with very 
few components.
See http://dream.cs.bath.ac.uk/SDFT/index.html for much more information 
and wacky sound examples.
> Are
> the freq. values absolute frequencies in Hz, or are they deviations
> (delta f) from the frequencies corresponding each bin, or something
> like these?

See above - true frequencies in Hz, derived from the delta ~phase~ 
between frames. They are sampled values, as this is a sampled system - 
we might have to calculate some amplitude interpolation through adjacent 
bins to find the "true" frequency of a source partial at that position.

Richard Dobson



Date: 2008-01-17 19:02
From: Uğur Güney
Subject: [Csnd] Re: Re: info about pvsanal opcode
Attachments: None