[Csnd] determining FFT size
| Date | 2011-11-07 19:52 |
| From | Dennis Raddle |
| Subject | [Csnd] determining FFT size |
| In using the opcode pvsanal, what's a method to determine the FFT size, if I know the fundamental of the note? Should it be such that sr/fftsize is approximately equal to the fundamental? |
| Date | 2011-11-07 23:43 |
| From | Peiman Khosravi |
| Subject | Re: [Csnd] determining FFT size |
Hello, I believe that the FFT size should be larger than the length of one period of the fundamental. In practice it should be 2 or 4 times longer for the best result (as long as you're not too bothered about losing time resolution). So for a fundamental of 100 Hz you'd need sr/100 * 2 (or 4) samples, rounded to the nearest power of two. P |
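The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the helper name `suggest_fft_size` is made up, and it rounds up to the next power of two (or to the next even number) rather than to the nearest one, which is an assumption on top of what the message says.

```python
import math

def suggest_fft_size(sr, f0, periods=2, power_of_two=True):
    """Smallest usable size that spans `periods` periods of f0 at sample rate sr."""
    min_len = math.ceil(periods * sr / f0)      # samples needed
    if power_of_two:
        return 1 << (min_len - 1).bit_length()  # next power of two >= min_len
    return min_len + (min_len % 2)              # pvsanal only needs an even size

print(suggest_fft_size(44100, 100))             # 2 periods of 100 Hz
print(suggest_fft_size(44100, 100, periods=4))  # 4 periods of 100 Hz
```

At sr = 44100 two periods of 100 Hz are 882 samples, so the power-of-two choice is 1024; four periods are 1764 samples, giving 2048.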
| Date | 2011-11-08 00:10 |
| From | Dennis Raddle |
| Subject | Re: [Csnd] determining FFT size |
| Thanks. According to the docs, the FFT size for pvsanal doesn't need to be a power of two, just even. I think technically it's not an FFT if it's not a power of two, but a DFT.
|
| Date | 2011-11-08 00:12 |
| From | Peiman Khosravi |
| Subject | Re: [Csnd] determining FFT size |
Yes, sorry, you're right. I was referring to the documentation of some older opcodes in the Csound book. P
|
| Date | 2011-11-08 09:45 |
| From | Richard Dobson |
| Subject | Re: [Csnd] determining FFT size |
On 08/11/2011 00:10, Dennis Raddle wrote:
> Thanks. For pvsanal, the FFT size doesn't need to be a power of two--
> just even. According to the docs. I think technically it's not an FFT if
> it's not a power of two, but a DFT.
>
The FFT is 'simply' a fast way of computing the DFT, so all FFTs are
also DFTs, including those of other even sizes. While the power of two
size is generally the fastest/most efficient (IIRC some further
advantages accrue to power-of-four sizes), and the easiest to implement,
many other sizes which are highly composite (small prime factors) can be
almost as efficient (remaining of the order of N log N). In
general-purpose (content-agnostic) situations, there is no obvious
reason not to choose the most effective power-of-two size, while
choosing other sizes may have application in special situations, such as
a known fundamental frequency.
However, unless the signal really is exact on that fundamental ~and~
stable (i.e. fits the internal FFT sinusoidal basis functions), such
that you can consider using a rectangular window, there will still be
some degree of spectral leakage and all the other usual artifacts which
fuzzy up the desired clarity of the analysis. They may nevertheless be
relatively less than when using an arbitrary power of two size, which is
why the option is provided in SNDAN, and why on some occasions it may be
useful to use a "tuned" FFT size in pvsanal.
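A small numerical sketch of the point about "tuned" sizes, under deliberately idealised assumptions: a pure, perfectly stable sine, a rectangular window, and a naive DFT instead of a real FFT. The function names and the 8000 Hz sample rate are invented for the illustration and have nothing to do with pvsanal or SNDAN internals.

```python
import cmath
import math

def dft_power(x):
    """Power in bins 0..n/2 of a naive n-point DFT (fine for small n)."""
    n = len(x)
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(x))) ** 2
            for k in range(n // 2 + 1)]

def leakage(sr, f0, n):
    """Fraction of spectral power that falls outside the strongest bin."""
    x = [math.sin(2 * math.pi * f0 * t / sr) for t in range(n)]
    power = dft_power(x)
    return 1.0 - max(power) / sum(power)

tuned = leakage(8000, 100, 160)  # 160 samples = exactly 2 periods of 100 Hz
pow2 = leakage(8000, 100, 128)   # 128 samples = 1.6 periods
print(tuned, pow2)
```

With the tuned size the sine sits exactly on a bin and essentially nothing leaks; with the power-of-two size a large share of the power spreads into neighbouring bins. A real note with a transient or any drift in pitch will not behave this cleanly.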
The 'gotcha' in most cases is the startup transient of a sound, which
may often bear very little relationship to the fundamental that (if
ever) eventually appears. FFT sizes are therefore chosen not only simply
to catch a known fundamental frequency, but also to capture enough of
the (possibly broadband) transient to work with.
The FFTW site has some useful material and references regarding the
design of FFT algorithms:
http://www.fftw.org
Richard Dobson
Send bugs reports to the Sourceforge bug tracker
https://sourceforge.net/tracker/?group_id=81968&atid=564599
Discussions of bugs and features can be posted here
To unsubscribe, send email sympa@lists.bath.ac.uk with body "unsubscribe csound"
|
| Date | 2011-11-08 10:35 |
| From | peiman khosravi |
| Subject | Re: [Csnd] determining FFT size |
Thanks for this explanation, Richard. Best, Peiman |
| Date | 2011-11-08 13:43 |
| From | Andres Cabrera |
| Subject | Re: [Csnd] determining FFT size |
Hi, If you want the speed of the FFT but a smaller window size (e.g. for better time resolution), you can set ifftsize to a power of two and iwinsize to a smaller value. The rest of the frame will be zero-padded, and the effect on the frequency-domain points is equivalent to interpolation. Note that even though the FFT size is larger, you will not really improve the frequency resolution, as that is determined by the window size. Cheers, Andrés |
| Date | 2011-11-08 15:10 |
| From | peiman khosravi |
| Subject | Re: [Csnd] determining FFT size |
Hi Andrés, thanks for this. Could you explain what the benefit is of setting the window size to a smaller value if a larger fftsize doesn't produce better frequency resolution? Thanks, Peiman |
| Date | 2011-11-10 08:46 |
| From | Andres Cabrera |
| Subject | Re: [Csnd] determining FFT size |
Hi, Peiman, The number of points in the output spectrum increases, so you have something equivalent to interpolation, which can help locate peaks in the spectrum better (e.g. when a peak falls between two spectrum bins). Cheers, Andres |
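The peak-location benefit can be checked with a toy example. The sketch below assumes a 64-sample Hann-windowed sine at 1040 Hz with sr = 6400 (all values chosen only for illustration) and compares the strongest bin of a 64-point DFT with that of the same frame zero-padded to 512 points. The helper names are hypothetical, and the DFT is a naive one, not what pvsanal uses.

```python
import cmath
import math

def padded_dft_mag(frame, n):
    """Magnitudes of bins 0..n/2 of the n-point DFT of `frame` zero-padded to n."""
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(frame)))
            for k in range(n // 2 + 1)]

def peak_freq(frame, n, sr):
    """Frequency in Hz of the strongest bin."""
    mags = padded_dft_mag(frame, n)
    k = max(range(len(mags)), key=mags.__getitem__)
    return k * sr / n

sr, m, f = 6400, 64, 1040.0
hann = [0.5 - 0.5 * math.cos(2 * math.pi * t / m) for t in range(m)]
frame = [w * math.sin(2 * math.pi * f * t / sr) for t, w in enumerate(hann)]

plain = peak_freq(frame, 64, sr)    # bins are 100 Hz apart
padded = peak_freq(frame, 512, sr)  # bins are 12.5 Hz apart
print(plain, padded)
```

The padded spectrum places the peak much closer to 1040 Hz, but the main lobe is just as wide as before: two partials closer together than the window can separate would still merge, which is the sense in which resolution is set by the window size.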
| Date | 2011-11-10 10:34 |
| From | luis jure |
| Subject | Re: [Csnd] determining FFT size |
on 2011-11-08 at 13:43 Andres Cabrera wrote:
>If you want the speed of the fft but a smaller window size (e.g. for
>better time resolution), you can set ifftsize to a power of two and
>iwinsize to a smaller value.
BTW, the use of these terms (fft and window size) in the documentation
is confusing. for example, it says that the window size "must be at least
ifftsize, and can usefully be larger", which makes you think that the
terms are reversed. but the context (e. g. the use of the term
"resolution") doesn't help to clear things up.
|
| Date | 2011-11-11 11:37 |
| From | Richard Dobson |
| Subject | Re: [Csnd] determining FFT size |
On 10/11/2011 10:34, luis jure wrote:
>
> on 2011-11-08 at 13:43 Andres Cabrera wrote:
>
>> If you want the speed of the fft but a smaller window size (e.g. for
>> better time resolution), you can set ifftsize to a power of two and
>> iwinsize to a smaller value.
>
> BTW, the use of these terms (fft and window size) in the documentation
> is confusing. for example, it says that the window size "must be at least
> ifftsize, and can usefully be larger", which makes you think that the
> terms are reversed. but the context (e. g. the use of the term
> "resolution") doesn't help to clear things up.
>
This is inherited from the original Mark Dolson pvoc from the CARL
distribution, on which the code is closely based, and as used in my
standalone version "pvocex" on the Bath Uni website**, a direct port of
the original except for the analysis file format.
CARL pvoc has two primary flags, -N for FFT size and -M for window size
(hence iwinsize in pvsanal). These can either be specified directly, or
indirectly via a -W flag for one of four "filter overlap factors". The
default is that M = N*2, corresponding in pvsanal to fftsize = 1024,
winsize = 2048. One of the options uses M = N/2. It is such a long time
since I analysed the original code (not least because the default option
generally works so well), but I assume that in each case one or other
combination of zero-padding is used. The issues are a combination of cpu
cost, fidelity and latency, and having virtually independent control of
both fft size and window size enables you to place yourself as precisely
as possible in that space.
Richard Dobson
**see http://dream.cs.bath.ac.uk/researchdev/pvocex/pvocex.html
NB this page and the provided binaries etc, are >10years old now, and
yes I know it's overdue for an update...
|
| Date | 2011-11-16 11:44 |
| From | luis jure |
| Subject | Re: [Csnd] determining FFT size |
thanks richard for your answer, sorry to return to this after many days
(other affairs were in my way).
on 2011-11-11 at 11:37 Richard Dobson wrote:
>CARL pvoc has two primary flags, -N for FFT size and -M for window size
>(hence iwinsize in pvsanal). These can either be specified directly, or
>indirectly via a -W flag for one of four "filter overlap factors". The
>default is that M = N*2, corresponding in pvsanal to fftsize = 1024,
>winsize = 2048.
this is the part that doesn't make sense to me. IANAE (i am not an
engineer), but after many efforts in trying to understand the basics of
DSP, my idea is that the "window" is the portion of the sound file you
are going to analyse with the DFT, and since it's typically *not* a
rectangular window, you multiply it by a smoothing windowing function
(hence the term). after that it's usual to pad with zeros in order to
perform the DFT with a *bigger* size, and thus obtain a better resolution
by interpolation of the spectrum.
please excuse me if i'm missing something silly, but i really don't
understand the idea of performing a DFT *smaller* than the window size. is
there no windowing function for the DFT? and what would be the sense of it
anyway? i don't know if i'm making myself clear...
best,
lj
|
| Date | 2011-11-17 08:50 |
| From | Richard Dobson |
| Subject | Re: [Csnd] determining FFT size |
On 16/11/2011 11:44, luis jure wrote:
>
> thanks richard for your answer, sorry to return to this after many days
> (other affairs were in my way).
>
> on 2011-11-11 at 11:37 Richard Dobson wrote:
>
>> CARL pvoc has two primary flags, -N for FFT size and -M for window size
>> (hence iwinsize in pvsanal). These can either be specified directly, or
>> indirectly via a -W flag for one of four "filter overlap factors". The
>> default is that M = N*2, corresponding in pvsanal to fftsize = 1024,
>> winsize = 2048.
>
> this is the part that doesn't make sense to me. IANAE (i am not an
> engineer), but after many efforts in trying to understand the basics of
> DSP, my idea is that the "window" is the portion of the sound file you
> are going to analyse with the DFT, and since it's typically *not* a
> rectangular window, you multiply it by a smoothing windowing function
> (hence the term). after that it's usual to pad with zeros in order to
> perform the DFT with a *bigger* size, and thus obtain a better resolution
> by interpolation of the spectrum.
>
This is an extract from the original comments (I assume by Dolson
himself from the CARL days) in the pvoc code, remembering N = FFT size,
M = window size, W = "filter overlap factor" where the available
relationships are:
W    M
0    N*4
1    N*2 (default)
2    N
3    N/2
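The table above amounts to a simple lookup. As a minimal sketch (the function name `window_size` is hypothetical and is not part of CARL pvoc or Csound):

```python
def window_size(n, w=1):
    """Window size M for FFT size n and filter overlap factor w, per the table above."""
    table = {0: n * 4, 1: n * 2, 2: n, 3: n // 2}
    return table[w]

print(window_size(1024))     # default W = 1
print(window_size(1024, 3))  # W = 3
```

The default case reproduces the fftsize = 1024, winsize = 2048 pairing mentioned earlier in the thread.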
[analysis window]
"
The window is assumed to be symmetric with M total points. After the
initial memory allocation, analWindow always points to the midpoint of
the window (or one half sample to the right, if M is even); analWinLen
is half the true window length (rounded down). Any low pass window will
work; a Hamming window is generally fine, but a Kaiser is also
available. If the window duration is longer than the transform (M > N),
then the window is multiplied by a sin(x)/x function to meet the
condition: analWindow[Ni] = 0 for i != 0.
"
[synthesis window]
"
For the minimal mean-square-error formulation (valid for N >= M), the
synthesis window is identical to the analysis window (except for a
scale factor), and both are even in length. If N < M, then an
interpolating synthesis window is used.
"
That is, the same sinc function is applied to the synthesis window
(Hamming, Hann, Kaiser, etc) in the case M > N, and is here called the
"interpolating window".
Now, my maths/dsp chops are too low to explain this technically, but I
have generally assumed that this extra sinc filter stage, which I have
not found in other pvocs, plays at least in part the role of a
symmetrical zero-padding, and is what makes CARL pvoc somewhat better in
audio terms than more conventional vanilla FFT windowing. The practical
benefit (easily demonstrated in the better sound when doing, say, pitch
shifting) is indeed that when M=N*2, say, you get the interpolation
benefit of the longer window filter (M), but the lower computation cost
of N. The cost issue is perhaps not so relevant these days, but on the
Atari ST with software floating point, where it took an hour to process
a second of audio, it really mattered. I suppose I should construct some
gnuplot plots to show what all this looks like and post them somewhere,
for each filter factor W. I will consider that on my todo list, but
can't promise how soon I will get around to it. If anyone wants to take
that task on, they are more than welcome!
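In lieu of plots, here is a rough numerical sketch of the condition in the analysis-window comment quoted above. It is based only on my reading of that comment, not on the CARL source: a Hamming window over offsets -M/2..M/2 from the centre is multiplied by sin(x)/x with x = pi*i/N, and the result is checked for zeros at every non-zero multiple of N. The function name and the small sizes are made up.

```python
import math

def sinc_shaped_window(n, m):
    """Hamming window over offsets -m//2..m//2, times sin(x)/x with x = pi*i/n."""
    half = m // 2
    win = {}
    for i in range(-half, half + 1):
        hamming = 0.54 + 0.46 * math.cos(math.pi * i / half)  # peak at the centre
        x = math.pi * i / n
        sinc = 1.0 if i == 0 else math.sin(x) / x
        win[i] = hamming * sinc
    return win

n, m = 8, 32  # M = 4N, i.e. W = 0 in the table
win = sinc_shaped_window(n, m)
for k in (-2, -1, 0, 1, 2):
    print(k * n, win[k * n])
```

The values at offsets -16, -8, 8 and 16 come out as zero to within rounding error, while the centre value stays at 1, which matches "analWindow[Ni] = 0 for i != 0".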
Richard Dobson
|
| Date | 2011-11-18 11:39 |
| From | luis jure |
| Subject | Re: [Csnd] determining FFT size |
on 2011-11-17 at 08:50 Richard Dobson wrote:
>"The window is assumed to be symmetric with M total points. After the
>initial memory allocation, analWindow always points to the midpoint of
>the window (or one half sample to the right, if M is even); analWinLen
>is half the true window length (rounded down). Any low pass window will
>work; a Hamming window is generally fine, but a Kaiser is also
>available. If the window duration is longer than the transform (M > N),
>then the window is multiplied by a sin(x)/x function to meet the
>condition: analWindow[Ni] = 0 for i != 0."
i see... things are more clear now, although i can't say i fully
understand the rationale behind the technique. definitely a twist compared
with the "plain" phase vocoder techniques i was more or less familiar with.
thanks for the clarifications, richard!
lj
(perhaps a summarized version of this information could make its way into
the manual?)
|
| Date | 2011-11-18 17:08 |
| From | Tito Latini |
| Subject | Re: [Csnd] determining FFT size |
| Attachments | None |
| Date | 2011-11-18 18:14 |
| From | "Dr. Richard Boulanger" |
| Subject | Re: [Csnd] determining FFT size |
More of Richard Dobson's insights on FFT and PVOC and Convolution in the manual would be a real plus... It would be great if the Manual itself included a bit more "teaching". Always grateful for Richard Dobson's posts! Dr.B. |
| Date | 2011-11-18 21:12 |
| From | peiman khosravi |
| Subject | Re: [Csnd] determining FFT size |
+1 |
| Date | 2011-11-18 22:03 |
| From | Rory Walsh |
| Subject | Re: [Csnd] determining FFT size |
On Friday, 18 November 2011, peiman khosravi <peimankhosravi@gmail.com> wrote: > + 1 |
| Date | 2013-02-12 10:56 |
| From | peiman khosravi |
| Subject | Re: [Csnd] determining FFT size |
| Sorry to revive an old thread. I thought I had understood this but I haven't quite! (A friend just asked me and I didn't know the answer.) I know very well how it affects the 'sound' but don't quite get the math. I am not sure what is meant by interpolation in this context, or by the number of points. I'm assuming that doesn't refer to the number of bins. Cheers, Peiman |
| Date | 2013-02-12 11:18 |
| From | peiman khosravi |
| Subject | Re: [Csnd] determining FFT size |
| So I think I get it. Is this correct? If you set the FFT size to 2048 and the window size to 4096, the final analysis has a frequency resolution of sr/4096 but only FFT-size number of bins, so the additional bins that cannot be accommodated by the FFT size are simply discarded. This is not a problem because we only use half (+1) of the bins anyway to avoid aliasing. Is this correct, in layman's terms? P |
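One way to test this reading numerically is sketched below. It assumes that a windowed frame longer than the transform is folded (added onto itself modulo N) before the N-point DFT. That is my understanding of how CARL-style phase vocoders handle M > N, but I have not checked it against the pvsanal source. Under that assumption, the N-point DFT of the folded frame is exactly every second bin of the M-point DFT when M = 2N. The sizes and test signal are arbitrary.

```python
import cmath
import math

def dft(x):
    """Naive DFT, adequate for the tiny sizes used here."""
    n = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
            for k in range(n)]

n, m = 16, 32  # FFT size N and window size M = 2N
frame = [math.sin(0.7 * t) * (0.5 - 0.5 * math.cos(2 * math.pi * t / m))
         for t in range(m)]  # Hann-windowed test signal

folded = [frame[t] + frame[t + n] for t in range(n)]  # fold modulo N
short = dft(folded)  # N bins
full = dft(frame)    # M bins

max_diff = max(abs(short[k] - full[2 * k]) for k in range(n))
print(max_diff)
```

The difference is at rounding-error level, so in this simplified model the bins that are "discarded" are the odd-numbered bins of the longer transform. The bin spacing remains sr/N, while each retained bin carries the narrower frequency response of the longer window.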
| Date | 2013-02-12 18:52 |
| From | Andres Cabrera |
| Subject | Re: [Csnd] determining FFT size |
Hi, Interpolation means increasing the number of points in the spectrum by approximating them from the neighbors. Increasing the number of points in the FFT (with zero padding) is equivalent to upsampling the spectrum so that it has more points. Cheers, |
| Date | 2013-02-13 08:59 |
| From | peiman khosravi |
| Subject | Re: [Csnd] determining FFT size |
| Thanks Andres, Much appreciated. Best, Peiman
|