Uğur Güney wrote:
..
>
> # Why there are 513 bins, not 512? Is bin number 0 for DC component or
> fundamental component? Is Nth_Bin for the frequency
> (SampleRate/1024)*(N-1)?
> # So should my function table, which will contain the transfer
> function of the filter, has 512 or 513 points?
>
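513, i.e. N/2 + 1 for N = 1024. The bottom and top bins cover DC and
Nyquist, so bin 0 is the DC component, not the fundamental. Counting
bins from 0, bin k is centred on k * SampleRate/1024 Hz, which matches
your (SampleRate/1024)*(N-1) if you count from 1. The exact
mathematical details are rather complex (pun intended); simplistically,
it is much like saying that in counting from 0 to 10 there are 11
numbers. A more mathematical way of looking at it is that the complex
(real+imaginary) spectrum of a purely "real" signal is exactly
symmetrical around Nyquist (each upper bin is the complex conjugate of
its mirror-image lower bin), so the upper values are redundant; but we
keep both DC and Nyquist, as per the 0-10 idea.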
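The bin arithmetic above can be checked with a short Python sketch. The function names are illustrative only, 44100 Hz is just an example sample rate, and the plain O(N^2) DFT is there purely to demonstrate the conjugate symmetry on a tiny 16-point real input.

```python
import cmath
import math


def num_bins(fft_size):
    """Unique bins of a real-input FFT: DC, the N/2 - 1 bins in between, and Nyquist."""
    return fft_size // 2 + 1


def bin_centre_hz(k, sample_rate, fft_size):
    """Nominal centre frequency of bin k, counting from 0 (bin 0 = DC)."""
    return k * sample_rate / fft_size


def dft(x):
    """Plain O(N^2) DFT; fine for the tiny demo size used here."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


if __name__ == "__main__":
    print(num_bins(1024))                     # 513
    print(bin_centre_hz(1, 44100, 1024))      # 43.06640625
    print(bin_centre_hz(512, 44100, 1024))    # 22050.0 (Nyquist)

    # Real input: X[N - k] is the complex conjugate of X[k], so the bins
    # above N/2 carry no extra information.
    x = [math.sin(0.3 * m) + 0.5 * math.cos(1.1 * m) for m in range(16)]
    X = dft(x)
    print(all(abs(X[16 - k] - X[k].conjugate()) < 1e-9 for k in range(1, 16)))  # True
```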
> # In manual it says also:
>
> Currently only one format is implemented by this opcode: 0 = amplitude
> + frequency
>
> # What does this exactly mean? I understand something like this:
> # fsig is an array. Its elements are pair of numbers, one for
> amplitude and one for frequency.
Yes, that is the idea: each bin holds one amplitude value and one
frequency value. Other formats are possible, such as amplitude + phase
(and even a raw complex real+imag form), but I never got around to
implementing the other forms in the opcodes. The provision is there,
though, for when I or someone else gets around to it (I haven't even
looked at the code for ages; perhaps someone has already done it!). The
expensive step is the conversion to amp/phase; moving from that to
amp/freq is a simple arithmetic step, so using amp/phase would not
noticeably save processing time. The associated PVOCEX file format
~does~ support all three formats, though.
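A minimal sketch of the amplitude + frequency format as described in the question, assuming the pairs are stored interleaved as [amp0, freq0, amp1, freq1, ...]. That layout and the helper names are assumptions for illustration, not a description of Csound's internals. It also shows why the rectangular-to-polar step is the costly one: every bin needs a square root and an arctangent.

```python
import math


def rect_to_amp_phase(re, im):
    """The costly step: a square root (hypot) and an arctangent per bin."""
    return math.hypot(re, im), math.atan2(im, re)


def pack_frame(amps, freqs):
    """Interleave per-bin (amplitude, frequency) pairs: [a0, f0, a1, f1, ...]."""
    if len(amps) != len(freqs):
        raise ValueError("need one frequency per amplitude")
    frame = []
    for a, f in zip(amps, freqs):
        frame.extend((a, f))
    return frame


def unpack_frame(frame):
    """Split an interleaved frame back into (amplitude, frequency) pairs."""
    return list(zip(frame[0::2], frame[1::2]))


if __name__ == "__main__":
    frame = pack_frame([0.0] * 513, [0.0] * 513)   # one frame of a 1024-point analysis
    print(len(frame), len(unpack_frame(frame)))     # 1026 513
```

So a 1024-point analysis gives 513 pairs, i.e. 1026 numbers per frame.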
> Because pvsanal does not take FFT but
> makes a phase-vocoder analysis,
Taking the FFT (of a windowed block of samples) is the first stage in
making a phase vocoder analysis. So the FFT usage is pretty heavy! In
turn, the phase vocoder is the first step in other techniques such as
partial tracking.
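That first stage can be sketched as follows: cut the signal into overlapping blocks every `hop` samples, apply a window, and transform each block into amplitude/phase pairs. This assumes a periodic Hann window and uses a plain DFT in place of a real FFT so that it needs only the standard library; the function names and the demo values (6400 Hz, 64-point frames, hop 16) are arbitrary choices.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * m / n) for m in range(n)]


def analyse(signal, fft_size, hop):
    """Window each block of `fft_size` samples, `hop` samples apart, and transform it.

    Returns one list of (amplitude, phase) pairs per frame, bins 0..N/2 only.
    A plain O(N^2) DFT stands in for a real FFT to keep the sketch dependency-free.
    """
    window = hann(fft_size)
    frames = []
    for start in range(0, len(signal) - fft_size + 1, hop):
        block = [signal[start + m] * window[m] for m in range(fft_size)]
        bins = []
        for k in range(fft_size // 2 + 1):
            z = sum(block[m] * cmath.exp(-2j * math.pi * k * m / fft_size)
                    for m in range(fft_size))
            bins.append((abs(z), cmath.phase(z)))   # rectangular -> polar
        frames.append(bins)
    return frames


if __name__ == "__main__":
    sr, n, hop = 6400, 64, 16                       # bin spacing = 100 Hz
    sig = [math.cos(2 * math.pi * 1000.0 * t / sr) for t in range(256)]
    frames = analyse(sig, n, hop)
    amps = [a for a, _ in frames[0]]
    print(len(frames), len(amps), amps.index(max(amps)))   # 13 33 10
```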
> frequency values are not exactly
> integer multiples of some fundamental, SR/1024, but they deviate from
> their corresponding bin value. How are these deviations stored?
This involves the "phase" part of the phase vocoder. Each bin has a
nominal fixed centre frequency of sr/N Hz * bin number (so frequencies
in the lowest bins can even be negative - the DC bin might range between
+- 43 Hz, for example), but a relatively limited bandwidth dependent on
the amount of overlap between frames.
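The overlap-dependent bandwidth can be put in numbers. If frames are `hop` samples apart, a frequency offset of d Hz from the bin centre changes the phase by 2*pi*d*hop/sr between frames, and because that phase change is only known modulo 2*pi, offsets are unambiguous only up to +- sr/(2*hop). Assuming the +-43 Hz example above refers to sr = 44100 and N = 1024 with 50% overlap (hop = 512), this minimal sketch reproduces that figure; the function name is hypothetical.

```python
def max_deviation_hz(sample_rate, hop):
    """Largest offset from a bin's centre frequency that the phase difference
    between frames `hop` samples apart can represent without ambiguity.

    A deviation of d Hz advances phase by 2*pi*d*hop/sample_rate per hop, and
    the measured value is wrapped into [-pi, pi), so |d| < sample_rate / (2 * hop).
    """
    return sample_rate / (2 * hop)


if __name__ == "__main__":
    sr, n = 44100, 1024
    print(sr / n)                        # bin spacing: 43.06640625
    for hop in (512, 256, 1):            # 2x, 4x and maximal (sliding) overlap
        print(hop, max_deviation_hz(sr, hop))
    # 512 43.06640625
    # 256 86.1328125
    # 1 22050.0
```

With a hop of one sample the range reaches +- sr/2, which is the DC-to-Nyquist bandwidth described for the SPV.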
A common alternative view is to see each bin as a very simple bandpass
filter, where the filters all overlap somewhat. Frequency is defined as
"the rate of change of phase", and thus the differences in phase between
successive windows can be converted into a true frequency value. (The
phase itself is obtained from the raw real/imaginary values emerging
from the FFT using the arctangent; good old Pythagoras's theorem gives
the amplitude.) But note that the plain phase vocoder cannot in itself
track moving frequency components; at some point the information moves
into higher or lower bins - a bit like the image of a football moving
between multiple TV screens in a mega-display, where the images overlap
a bit but the cameras are fixed.
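Turning phase differences into frequencies can be sketched like this: subtract the phase advance expected at the bin's centre frequency, wrap what is left into [-pi, pi), and convert it back to Hz. The helper names, the Hann window and the test tone (1000 Hz, sr = 44100, N = 1024, hop = 256) are assumptions for illustration, not pvsanal's code.

```python
import cmath
import math


def wrap(phase):
    """Map an angle into [-pi, pi)."""
    return (phase + math.pi) % (2 * math.pi) - math.pi


def bin_value(signal, start, k, fft_size):
    """DFT bin k of a Hann-windowed block starting at `start` (plain O(N) sum)."""
    total = 0j
    for m in range(fft_size):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * m / fft_size)
        total += signal[start + m] * w * cmath.exp(-2j * math.pi * k * m / fft_size)
    return total


def bin_frequency(prev_phase, phase, k, fft_size, hop, sample_rate):
    """True frequency (Hz) in bin k from the phase change between two frames."""
    expected = 2 * math.pi * k * hop / fft_size        # advance at the bin centre
    deviation = wrap(phase - prev_phase - expected)     # what the centre can't explain
    return (k + deviation * fft_size / (2 * math.pi * hop)) * sample_rate / fft_size


if __name__ == "__main__":
    sr, n, hop = 44100, 1024, 256
    sig = [math.cos(2 * math.pi * 1000.0 * t / sr) for t in range(n + hop)]
    k = 23                                   # centre = 23 * sr / n, about 990.5 Hz
    p0 = cmath.phase(bin_value(sig, 0, k, n))
    p1 = cmath.phase(bin_value(sig, hop, k, n))
    print(bin_frequency(p0, p1, k, n, hop, sr))   # close to 1000 Hz
```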
In the limit of maximum overlap, i.e. a hop of a single sample between
frames (the "Sliding Phase Vocoder" or SPV, which has recently been
incorporated), the bandwidth of each bin is DC to Nyquist, such that
the filters no longer merely overlap but fully stack on top of each
other!
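A hop of one sample is affordable because of the sliding DFT, on which the SPV builds: rather than computing a fresh FFT for every sample, each bin is updated with a fixed recurrence. This minimal sketch shows the textbook recurrence on an unwindowed (rectangular) frame; it is not the Csound SPV code, and the names are illustrative.

```python
import cmath
import math


def dft_bins(x, start, n):
    """Direct DFT of x[start:start + n] (rectangular window), all n bins."""
    return [sum(x[start + m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def slide(bins, oldest, newest, n):
    """Advance every bin by one sample: O(n) work instead of a new O(n log n) FFT.

    X_{t+1}[k] = (X_t[k] - x[t] + x[t + n]) * exp(2j * pi * k / n)
    """
    return [(b - oldest + newest) * cmath.exp(2j * math.pi * k / n)
            for k, b in enumerate(bins)]


if __name__ == "__main__":
    n, steps = 16, 5
    x = [math.sin(0.37 * t) + 0.5 * math.cos(1.3 * t) for t in range(n + steps)]
    bins = dft_bins(x, 0, n)
    for t in range(steps):
        bins = slide(bins, x[t], x[t + n], n)
    direct = dft_bins(x, steps, n)
    print(max(abs(a - b) for a, b in zip(bins, direct)))   # rounding-level difference
```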
We can understand this intuitively by considering what we might be able
to deduce about frequency changes between widely-spaced frames. In this
case, rapid deviations are simply missed, such that the content of each
bin is more like a crude average of the start and end values from the
FFT. We will be missing important information (transients, moving
pitches generally), and the effective bandwidth of each bin becomes very
narrow as little deviation is measurable. Conversely, with maximum
possible overlap, we track frequency changes at single sample
resolution, over the whole range. How accurate the result is (in terms
of frequency resolution within a frame) is still dependent on the size
of the window (fftsize).
[To continue the TV screen analogy: widely-spaced frames are like having
one TV showing David Beckham kicking the football from the centre of the
field; the next TV shows the ball entering the goal. We ~assume~ it is
the ball Beckham kicked, and may even extrapolate the path it took; but
we can't be absolutely sure someone else didn't kick it in between, or
even replace it with another one. Or use two balls!]
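The widely-spaced-frames problem can also be put in numbers. Over a hop of H samples, two steady sinusoids whose frequencies differ by exactly sr/H Hz end up with phase advances that differ by one full turn, so once the phase is wrapped the two frames look identical; with a hop of one sample they are easily told apart. The function name and the values are arbitrary examples.

```python
import math


def wrapped_advance(freq_hz, hop, sample_rate):
    """Phase advance of a steady sinusoid over `hop` samples, wrapped into [-pi, pi)."""
    advance = 2 * math.pi * freq_hz * hop / sample_rate
    return (advance + math.pi) % (2 * math.pi) - math.pi


if __name__ == "__main__":
    sr, hop = 44100, 512
    f1 = 1000.0
    f2 = f1 + sr / hop            # 1086.1328125 Hz: one extra full turn per hop
    print(wrapped_advance(f1, hop, sr), wrapped_advance(f2, hop, sr))  # equal (to rounding)
    print(wrapped_advance(f1, 1, sr), wrapped_advance(f2, 1, sr))      # different
```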
There is no escaping the maths jargon when it comes to explaining how
the phase vocoder works, sorry! Alternatively, just accept what is
contained in a frame, call it magic, and use it.
The SPV has some truly weird aspects which still need further
investigation. We can no longer make any assumptions at all about what
frequency might be in what bin - a high bin might well contain a very
low frequency, especially if the source is something simple with very
few components.
See http://dream.cs.bath.ac.uk/SDFT/index.html for much more information
and wacky sound examples.
> Are the freq. values absolute frequencies in Hz, or are they deviations
> (delta f) from the frequencies corresponding each bin, or something
> like these?
See above: true frequencies in Hz, not deltas from the bin centre,
although they are derived from the delta ~phase~ between frames. They
are still sampled values, as this is a sampled system; we might have to
calculate some amplitude interpolation through adjacent bins to find the
"true" frequency of a source partial at that position.
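One common way to do that kind of interpolation, swapped in here purely as an illustration (which method is meant is not specified), is to fit a parabola through the log magnitudes of the peak bin and its two neighbours and take the vertex. The test tone and helper names are assumptions, and the estimate is approximate, with a small residual bias that depends on the window.

```python
import cmath
import math


def hann_bin_magnitude(signal, k, fft_size):
    """|X[k]| for a Hann-windowed block (plain O(N) sum for one bin)."""
    total = 0j
    for m in range(fft_size):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * m / fft_size)
        total += signal[m] * w * cmath.exp(-2j * math.pi * k * m / fft_size)
    return abs(total)


def interpolated_peak_hz(mags, k, sample_rate, fft_size):
    """Fit a parabola through the log magnitudes of bins k-1, k, k+1 (mags is
    any list or mapping indexed by bin number) and return its vertex in Hz."""
    a, b, c = (math.log(mags[i]) for i in (k - 1, k, k + 1))
    offset = 0.5 * (a - c) / (a - 2 * b + c)   # in bins; within [-0.5, 0.5] if bin k is the peak
    return (k + offset) * sample_rate / fft_size


if __name__ == "__main__":
    sr, n = 44100, 1024
    sig = [math.cos(2 * math.pi * 1000.0 * t / sr) for t in range(n)]
    mags = {k: hann_bin_magnitude(sig, k, n) for k in (22, 23, 24)}
    print(interpolated_peak_hz(mags, 23, sr, n))   # near 1000 Hz; bin 23's centre is about 990.5 Hz
```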
Richard Dobson