[Csnd] determining FFT size
Date | 2011-11-07 19:52 |
From | Dennis Raddle |
Subject | [Csnd] determining FFT size |
In using the opcode pvsanal, what's a method to determine the FFT size, if I know the fundamental of the note? Should it be such that sr/fftsize is approximately equal to the fundamental? |
Date | 2011-11-07 23:43 |
From | Peiman Khosravi |
Subject | Re: [Csnd] determining FFT size |
Hello, I believe that the FFT size should be larger than the length of one period of the fundamental. In practice it should be 2 or 4 times longer for the best result (as long as you're not too bothered about losing time resolution). So for a fundamental of 100 Hz you'd need sr/100 * 2 (or 4) samples, rounded to the nearest power of two. P |
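A minimal sketch of the rule of thumb above, assuming a sample rate of 44100 Hz. The function name `suggest_fft_size` and its parameters are hypothetical (they are not part of Csound); the result is only a starting value for the ifftsize argument of pvsanal.

```python
import math

def suggest_fft_size(sr, f0, periods=2, power_of_two=False):
    """Return an even frame size spanning about `periods` periods of f0.

    Hypothetical helper: sr is the sample rate in Hz, f0 the fundamental in Hz.
    """
    raw = periods * sr / f0              # length of `periods` periods, in samples
    if power_of_two:
        # nearest power of two, measured on a logarithmic scale
        return 2 ** round(math.log2(raw))
    n = round(raw)
    return n + (n % 2)                   # bump odd sizes up to the next even size

print(suggest_fft_size(44100, 100))                        # 882
print(suggest_fft_size(44100, 100, 4, power_of_two=True))  # 2048
```

The even-size variant follows the "tuned" idea of matching the frame to the period; the power-of-two variant follows the advice as originally given.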
Date | 2011-11-08 00:10 |
From | Dennis Raddle |
Subject | Re: [Csnd] determining FFT size |
Thanks. According to the docs, the FFT size for pvsanal doesn't need to be a power of two, just even. I think technically it's not an FFT if it's not a power of two, but a DFT.
|
Date | 2011-11-08 00:12 |
From | Peiman Khosravi |
Subject | Re: [Csnd] determining FFT size |
Yes, sorry, you're right. I was referring to my documentation of some older opcodes in the Csound book. P
|
Date | 2011-11-08 09:45 |
From | Richard Dobson |
Subject | Re: [Csnd] determining FFT size |
On 08/11/2011 00:10, Dennis Raddle wrote:
> Thanks. For pvsanal, the FFT size doesn't need to be a power of two--
> just even. According to the docs. I think technically it's not an FFT if
> it's not a power of two, but a DFT.
The FFT is 'simply' a fast way of computing the DFT, so all FFTs are also DFTs, including those of other even sizes. While the power of two size is generally the fastest/most efficient (IIRC some further advantages accrue to power-of-four sizes), and the easiest to implement, many other sizes which are highly composite (small prime factors) can be almost as efficient (remaining of the order of N log N).
In general-purpose (content-agnostic) situations, there is no obvious reason not to choose the most effective power-of-two size, while choosing other sizes may have application in special situations, such as a known fundamental frequency. However, unless the signal really is exact on that fundamental ~and~ stable (i.e. fits the internal FFT sinusoidal basis functions), such that you can consider using a rectangular window, there will still be some degree of spectral leakage and all the other usual artifacts which fuzzy up the desired clarity of the analysis. They may nevertheless be relatively less than when using an arbitrary power of two size, which is why the option is provided in SNDAN, and why on some occasions it may be useful to use a "tuned" FFT size in pvsanal.
The 'gotcha' in most cases is the startup transient of a sound, which may often bear very little relationship to the fundamental that (if ever) eventually appears. FFT sizes are therefore chosen not only simply to catch a known fundamental frequency, but also to capture enough of the (possibly broadband) transient to work with.
The FFTW site has some useful material and references regarding the design of FFT algorithms: http://www.fftw.org Richard Dobson Send bugs reports to the Sourceforge bug tracker https://sourceforge.net/tracker/?group_id=81968&atid=564599 Discussions of bugs and features can be posted here To unsubscribe, send email sympa@lists.bath.ac.uk with body "unsubscribe csound" |
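The leakage point can be checked numerically. The sketch below is an illustration only: it uses a naive O(N^2) DFT in plain Python rather than a real FFT library, a rectangular window, and made-up values (sr = 8000 Hz, a perfectly stable 100 Hz sine). The helper names `dft` and `leakage` are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive DFT, adequate for the small frames used here."""
    n = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
            for k in range(n)]

def leakage(sr, f0, n):
    """Fraction of energy in bins 0..n/2 that lies outside the strongest bin."""
    frame = [math.sin(2 * math.pi * f0 * t / sr) for t in range(n)]
    power = [abs(c) ** 2 for c in dft(frame)[: n // 2 + 1]]
    return 1.0 - max(power) / sum(power)

tuned = leakage(8000, 100, 160)   # 160 samples = exactly 2 periods of 100 Hz
pow2 = leakage(8000, 100, 128)    # 128 samples = 1.6 periods
print(tuned, pow2)
```

With the tuned size the sine falls exactly on a bin and essentially no energy leaks; with 128 points a substantial share spreads into neighbouring bins. As noted above, a real instrument tone is rarely this exact or this stable, so the advantage in practice is smaller.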
Date | 2011-11-08 10:35 |
From | peiman khosravi |
Subject | Re: [Csnd] determining FFT size |
Thanks for this explanation, Richard. Best, Peiman |
Date | 2011-11-08 13:43 |
From | Andres Cabrera |
Subject | Re: [Csnd] determining FFT size |
Hi, If you want the speed of the FFT but a smaller window size (e.g. for better time resolution), you can set ifftsize to a power of two and iwinsize to a smaller value. The rest of the frame will be zero padded, and the effect on the frequency-domain points will be equivalent to interpolation. Notice that even though the FFT size is larger, you will not really improve the frequency resolution, as that is determined by the window size. Cheers, Andrés |
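A small numerical sketch of the zero-padding point, using a naive DFT in plain Python. The sizes (8 samples of signal, 16-point transform) and the padding at the end of the frame are assumptions chosen for illustration; they are not a description of how pvsanal arranges its buffers.

```python
import cmath
import math

def dft(x, n=None):
    """n-point DFT of x; x is zero-padded at the end when n > len(x)."""
    n = n or len(x)
    padded = list(x) + [0.0] * (n - len(x))
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(padded))
            for k in range(n)]

WIN, FFT = 8, 16                      # analogous to iwinsize < ifftsize
frame = [math.sin(2 * math.pi * 1.3 * t / WIN) for t in range(WIN)]

short = dft(frame)                    # 8 bins from 8 samples
long_ = dft(frame, FFT)               # 16 bins from the same 8 samples

# Every second bin of the padded spectrum is one of the original bins;
# the bins in between are new samples of the same underlying spectrum.
same = all(abs(long_[2 * k] - short[k]) < 1e-9 for k in range(WIN))
print(same)                           # True
```

No new information about the signal enters the padded transform, which is why the frequency resolution stays tied to the window length.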
Date | 2011-11-08 15:10 |
From | peiman khosravi |
Subject | Re: [Csnd] determining FFT size |
Hi Andrés, thanks for this. Could you explain what the benefit is of setting the window size to a smaller value if a larger fftsize doesn't produce better frequency resolution? Thanks, Peiman |
Date | 2011-11-10 08:46 |
From | Andres Cabrera |
Subject | Re: [Csnd] determining FFT size |
Hi, Peiman, The number of points in the output spectrum increases, so you have something equivalent to interpolation, which can help locate peaks in the spectrum better (e.g. when peaks fall between two spectrum bins). Cheers, Andres |
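To illustrate the peak-location benefit, here is a hedged sketch with made-up numbers: a 64-sample Hann-windowed sine at 10.4 Hz with sr = 64 Hz, so that unpadded bins fall on whole hertz. The function `peak_frequency` is hypothetical and uses a naive DFT.

```python
import cmath
import math

def peak_frequency(frame, sr, fft_size):
    """Frequency in Hz of the strongest bin of a zero-padded DFT of `frame`."""
    best_bin, best_mag = 0, -1.0
    for k in range(fft_size // 2 + 1):
        c = sum(v * cmath.exp(-2j * math.pi * k * t / fft_size)
                for t, v in enumerate(frame))
        if abs(c) > best_mag:
            best_bin, best_mag = k, abs(c)
    return best_bin * sr / fft_size

SR, M, F_TRUE = 64, 64, 10.4
frame = [(0.5 - 0.5 * math.cos(2 * math.pi * t / M))      # Hann window
         * math.sin(2 * math.pi * F_TRUE * t / SR)
         for t in range(M)]

coarse = peak_frequency(frame, SR, 64)    # no padding: bins 1 Hz apart
fine = peak_frequency(frame, SR, 512)     # 8x padding: bins 0.125 Hz apart
print(coarse, fine)
```

The unpadded estimate can only land on 10 Hz, whereas the padded one lands on the finer grid closer to 10.4 Hz. Two sines closer together than the main lobe of the 64-sample window would still merge into one peak in both cases.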
Date | 2011-11-10 10:34 |
From | luis jure |
Subject | Re: [Csnd] determining FFT size |
on 2011-11-08 at 13:43 Andres Cabrera wrote:
>If you want the speed of the fft but a smaller window size (e.g. for
>better time resolution), you can set ifftsize to a power of two and
>iwinsize to a smaller value.
BTW, the use of these terms (fft and window size) in the documentation is confusing. for example, it says that the window size "must be at least ifftsize, and can usefully be larger", which makes you think that the terms are reversed. but the context (e.g. the use of the term "resolution") doesn't help to clear things up. |
Date | 2011-11-11 11:37 |
From | Richard Dobson |
Subject | Re: [Csnd] determining FFT size |
On 10/11/2011 10:34, luis jure wrote:
> on 2011-11-08 at 13:43 Andres Cabrera wrote:
>> If you want the speed of the fft but a smaller window size (e.g. for
>> better time resolution), you can set ifftsize to a power of two and
>> iwinsize to a smaller value.
> BTW, the use of these terms (fft and window size) in the documentation
> is confusing. for example, it says that the window size "must be at least
> ifftsize, and can usefully be larger", which makes you think that the
> terms are reversed. but the context (e. g. the use of the term
> "resolution") doesn't help to clear things up.
This is inherited from the original Mark Dolson pvoc from the CARL distribution, on which the code is closely based, and as used in my standalone version "pvocex" on the Bath Uni website**, a direct port of the original except for the analysis file format.
CARL pvoc has two primary flags, -N for FFT size and -M for window size (hence iwinsize in pvsanal). These can either be specified directly, or indirectly via a -W flag for one of four "filter overlap factors". The default is that M = N*2, corresponding in pvsanal to fftsize = 1024, winsize = 2048. One of the options uses M = N/2. It is a long time since I analysed the original code (not least because the default option generally works so well), but I assume that in each case one or other combination of zero-padding is used. The issues are a combination of CPU cost, fidelity and latency, and having virtually independent control of both FFT size and window size enables you to place yourself as precisely as possible in that space.
Richard Dobson
**see http://dream.cs.bath.ac.uk/researchdev/pvocex/pvocex.html NB this page and the provided binaries etc. are more than 10 years old now, and yes I know it's overdue for an update... |
Date | 2011-11-16 11:44 |
From | luis jure |
Subject | Re: [Csnd] determining FFT size |
thanks richard for your answer, sorry to return to this after many days (other affairs were in my way).
on 2011-11-11 at 11:37 Richard Dobson wrote:
>CARL pvoc has two primary flags, -N for FFT size and -M for window size
>(hence iwinsize in pvsanal). These can either be specified directly, or
>indirectly via a -W flag for one of four "filter overlap factors". The
>default is that M = N*2, corresponding in pvsanal to fftsize = 1024,
>winsize = 2048.
this is the part that doesn't make sense to me. IANAE (i am not an engineer), but after many efforts in trying to understand the basics of DSP, my idea is that the "window" is the portion of the sound file you're going to analyse with the DFT, and since it's typically *not* a rectangular window, you multiply it by a smoothing windowing function (hence the term). after that it's usual to pad with zeros in order to perform the DFT with a *bigger* size, and thus obtain a better resolution by interpolation of the spectrum.
please excuse me if i'm missing something silly, but i really don't understand the idea of performing a DFT *smaller* than the window size. is there no windowing function for the DFT? and what would be the sense of it anyway? i don't know if i'm making myself clear...
best, lj |
Date | 2011-11-17 08:50 |
From | Richard Dobson |
Subject | Re: [Csnd] determining FFT size |
On 16/11/2011 11:44, luis jure wrote:
> thanks richard for your answer, sorry to return to this after many days
> (other affairs were in my way).
> on 2011-11-11 at 11:37 Richard Dobson wrote:
>> CARL pvoc has two primary flags, -N for FFT size and -M for window size
>> (hence iwinsize in pvsanal). These can either be specified directly, or
>> indirectly via a -W flag for one of four "filter overlap factors". The
>> default is that M = N*2, corresponding in pvsanal to fftsize = 1024,
>> winsize = 2048.
> this is the part that doesn't make sense to me. IANAE (i am not an
> engineer), but after many efforts in trying to understand the basics of
> DSP, my idea is that the "window" is the portion of the sound file you're
> are going to analyse with the DFT, and since it's typically *not* a
> rectangular window, you multiply it by a smoothing windowing function
> (hence the term). after that it's usual to pad with zeros in order to
> perform the DFT with a *bigger* size, and thus obtain a better resolution
> by interpolation of the spectrum.
This is an extract from the original comments (I assume by Dolson himself from the CARL days) in the pvoc code, remembering N = FFT size, M = window size, W = "filter overlap factor", where the available relationships are:
W   M
0   N*4
1   N*2 (default)
2   N
3   N/2
[analysis window]
"The window is assumed to be symmetric with M total points. After the initial memory allocation, analWindow always points to the midpoint of the window (or one half sample to the right, if M is even); analWinLen is half the true window length (rounded down). Any low pass window will work; a Hamming window is generally fine, but a Kaiser is also available. If the window duration is longer than the transform (M > N), then the window is multiplied by a sin(x)/x function to meet the condition: analWindow[Ni] = 0 for i != 0."
[synthesis window]
"For the minimal mean-square-error formulation (valid for N >= M), the synthesis window is identical to the analysis window (except for a scale factor), and both are even in length. If N < M, then an interpolating synthesis window is used."
That is, the same sinc function is applied to the synthesis window (Hamming, Hann, Kaiser, etc.) in the case M > N, and is here called the "interpolating window". Now, my maths/dsp chops are too low to explain this technically, but I have generally assumed that this extra sinc filter stage, which I have not found in other pvocs, plays at least in part the role of a symmetrical zero-padding, and is what makes CARL pvoc somewhat better in audio terms than more conventional vanilla FFT windowing. The practical benefit (easily demonstrated in the better sound when doing, say, pitch shifting) is indeed that when M=N*2, say, you get the interpolation benefit of the longer window filter (M), but the lower computation cost of N. The cost issue is perhaps not so relevant these days, but on the Atari ST with software floating point, where it took an hour to process a second of audio, it really mattered.
I suppose I should construct some gnuplot plots to show what all this looks like and post them somewhere, for each filter factor W. I will consider that on my todo list, but can't promise how soon I will get around to it. If anyone wants to take that task on, they are more than welcome!
Richard Dobson |
Date | 2011-11-18 11:39 |
From | luis jure |
Subject | Re: [Csnd] determining FFT size |
on 2011-11-17 at 08:50 Richard Dobson wrote:
>"The window is assumed to be symmetric with M total points. After the
>initial memory allocation, analWindow always points to the midpoint of
>the window (or one half sample to the right, if M is even); analWinLen
>is half the true window length (rounded down). Any low pass window will
>work; a Hamming window is generally fine, but a Kaiser is also
>available. If the window duration is longer than the transform (M > N),
>then the window is multiplied by a sin(x)/x function to meet the
>condition: analWindow[Ni] = 0 for i != 0."
i see... things are more clear now, although i can't say i fully understand the rationale behind the technique. definitely a twist compared with the "plain" phase vocoder techniques i was more or less familiar with.
thanks for the clarifications, richard!
lj
(perhaps a summarized version of this information could make its way into the manual?) |
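The analysis-window comment quoted above, together with the W-to-M relationships from Richard's message, can be sketched as follows. This is a reconstruction from the quoted comment only, not the pvoc source: the Hamming formula, the sinc argument pi*d/N, and the names `WINDOW_FACTOR` and `analysis_window` are assumptions, chosen so that the stated condition (zero at every nonzero multiple of N from the centre) holds.

```python
import math

# M as a multiple of N for each "filter overlap factor" W (from the thread)
WINDOW_FACTOR = {0: 4.0, 1: 2.0, 2: 1.0, 3: 0.5}

def analysis_window(n, w=1):
    """Return (window, centre_index) for FFT size n and overlap factor w.

    Assumed construction: a Hamming window of M points, multiplied by
    sin(x)/x with x = pi*d/n when M > n, where d is the offset from the centre.
    """
    m = int(n * WINDOW_FACTOR[w])
    centre = m // 2                    # half a sample right of the midpoint for even M
    window = []
    for i in range(m):
        d = i - centre
        value = 0.54 + 0.46 * math.cos(2 * math.pi * d / m)   # Hamming, peak at d = 0
        if m > n and d != 0:
            x = math.pi * d / n
            value *= math.sin(x) / x   # vanishes when d is a nonzero multiple of n
        window.append(value)
    return window, centre

win, c = analysis_window(16, w=1)      # N = 16, M = 32 (the default ratio)
print(len(win), win[c], abs(win[c - 16]) < 1e-12)
```

Plotting `win` for each W would give roughly the kind of picture Richard proposes to make with gnuplot, under the assumptions stated above.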
Date | 2011-11-18 17:08 |
From | Tito Latini |
Subject | Re: [Csnd] determining FFT size |
Attachments | None |
Date | 2011-11-18 18:14 |
From | "Dr. Richard Boulanger" |
Subject | Re: [Csnd] determining FFT size |
More of Richard Dobson's insights on FFT and PVOC and Convolution in the manual would be a real plus... It would be great if the Manual itself included a bit more "teaching". Always grateful for Richard Dobson's posts! Dr.B. |
Date | 2011-11-18 21:12 |
From | peiman khosravi |
Subject | Re: [Csnd] determining FFT size |
+1 |
Date | 2011-11-18 22:03 |
From | Rory Walsh |
Subject | Re: [Csnd] determining FFT size |
On Friday, 18 November 2011, peiman khosravi <peimankhosravi@gmail.com> wrote: > + 1 |
Date | 2013-02-12 10:56 |
From | peiman khosravi |
Subject | Re: [Csnd] determining FFT size |
Sorry to revive an old thread. I thought I had understood this but I haven't quite! (A friend just asked me and I didn't know the answer.) I know very well how it affects the 'sound' but don't quite get the math. I am not sure what is meant by interpolation in this context, or by the number of points. I'm assuming that doesn't refer to the number of bins. Cheers, Peiman |
Date | 2013-02-12 11:18 |
From | peiman khosravi |
Subject | Re: [Csnd] determining FFT size |
So I think I get it. Is this correct? If you set the FFT size to 2048 and the window size to 4096, your final analysis will have a frequency resolution of sr/4096 with FFTsize number of bins. So the additional bins which cannot be accommodated due to the FFT size are just discarded. This is not a problem because we only use half (+1) of the bins anyway to avoid aliasing. Is this correct, in layman's terms? P |
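One way to test this picture numerically is sketched below. It rests on an assumption that is not confirmed anywhere in this thread: that when M > N the M windowed samples are folded (added modulo N) into an N-point buffer before the transform. The sizes, the test signal and the helper names `dft` and `fold` are made up for illustration.

```python
import cmath
import math

def dft(x):
    """Naive DFT, adequate for these tiny frames."""
    n = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
            for k in range(n)]

def fold(x, n):
    """Time-alias x into n points: sample t is added into slot t % n."""
    out = [0.0] * n
    for t, v in enumerate(x):
        out[t % n] += v
    return out

N, M = 8, 16                                   # stands in for fftsize 2048, winsize 4096
segment = [math.sin(0.9 * t) * (0.5 - 0.5 * math.cos(2 * math.pi * t / M))
           for t in range(M)]                  # M windowed samples

short = dft(fold(segment, N))                  # N bins, spaced sr/N apart
full = dft(segment)                            # M bins, spaced sr/M apart

matches = all(abs(short[k] - full[(M // N) * k]) < 1e-9 for k in range(N))
print(matches)                                 # True
```

Under that assumption the N-point result keeps every second bin of the M-point spectrum, so the bins stay sr/N apart while each one is shaped by the longer window. That is a somewhat different picture from discarding only the upper bins.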
Date | 2013-02-12 18:52 |
From | Andres Cabrera |
Subject | Re: [Csnd] determining FFT size |
Hi, Interpolation means increasing the number of points in the spectrum by approximating them from the neighbors. Increasing the number of points in the FFT (with zero padding) is equivalent to upsampling: the spectrum is sampled at more points. Cheers |
Date | 2013-02-13 08:59 |
From | peiman khosravi |
Subject | Re: [Csnd] determining FFT size |
Thanks Andres, much appreciated. Best, Peiman
|