
Re: Newbie question: window

Date: 1998-06-15 16:18
From: Mike Berry
Subject: Re: Newbie question: window
The tradeoff in an FFT-based analysis system is between temporal accuracy and
frequency accuracy.  An FFT is an "analysis" of a chunk of samples in time.
If your window is 1024 samples, then you get a single set of FFT data which
represents those 1024 samples.  You would get 512 amplitudes and phases, one
pair per frequency band, with the bands equally spaced across the spectrum.  At
44.1kHz, this means a band every 43 Hz (22050 / 512).  However, your temporal
information is completely hidden inside the transform.  If a note were to
start in the middle of a window, you would not know exactly when it started. 
In effect, the temporal data is "quantized" (I say quantized but really it is
a kind of indeterminacy) to 23 ms in this case (1024 samples / 44.1kHz).
	So here's where the tradeoff begins.  Make the window 512 samples.  Then you
quantize to 11.6 ms, but you only get bands every 86 Hz.  With 2048 samples you
get bands every 22 Hz, but you quantize to 46 ms.
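
	The arithmetic above can be checked with a few lines of Python.  This is
only a sketch of the numbers quoted here, not anything taken from Csound: the
function name fft_tradeoff and its parameters are made up for illustration, and
it assumes, as above, that a window of N samples gives N/2 equally spaced bands
between 0 Hz and half the sample rate.

```python
def fft_tradeoff(window_size, sample_rate=44100):
    """Return (band spacing in Hz, window duration in ms) for one FFT window.

    Hypothetical helper for illustration only. Assumes window_size / 2
    equally spaced bands covering 0 Hz up to sample_rate / 2.
    """
    bands = window_size // 2
    band_spacing_hz = (sample_rate / 2) / bands
    window_ms = 1000.0 * window_size / sample_rate
    return band_spacing_hz, window_ms


# Compare the three window sizes discussed in the text.
for n in (512, 1024, 2048):
    spacing, duration = fft_tradeoff(n)
    print(f"{n} samples: a band every {spacing:.1f} Hz, {duration:.1f} ms per window")
```

Each doubling of the window halves the band spacing and doubles the stretch of
time that one set of FFT data has to stand for, which is the whole tradeoff.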
	The Phase Vocoding process assumes that the data in the FFT analysis can be
stretched or compressed temporally.  This assumption is more or less valid
depending on the original material and how it was analyzed.  As a general
rule, continuously pitched material is going to sound better if a large window
is used, because the frequencies are going to be more accurate.  Music with
many "events" will sound better with a small window.  Unfortunately, most
music has both elements, so you need to reach a compromise that suits your
purposes.  That's why the window size is not defined for you.


> I realise that is a simplistic explaination, but.....  the larger the
> window the faster the PV, but the more smearing there is as it does
> not track changes so well.  Small windows are slow, and not so
> accurate as there is insufficient data to get accurate estimates.

-- 
Mike Berry
mikeb@nmol.com
http://www.nmol.com/users/mikeb