Okay, I'll give this one a shot. Keep in mind that this answer is not rigorous in
any sense; it is based on my general understanding of the algorithms rather than
their implementations in Csound, and it may be apocryphal or flat-out wrong.
LPC works on the assumption that the source sound is basically a filtered buzz. In
the analysis process, the formants (the resonances of the filter) are estimated and
filtered out of the sound. What remains is called the residue. From the residue, the
intensity and frequency of the buzz can be calculated. As with STFT and the streaming
phase vocoder implementations, this process is done on short frames of audio.
Wikipedia indicates 30-50 frames/sec are usually successful for speech. To
resynthesize a signal analyzed with LPC, you then filter a source signal (typically a
mix of buzz and noise) through the estimated filter, which should yield approximately
the same output.
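As a rough illustration of that analysis/resynthesis loop (this is not Csound's
lpanal, just a sketch of the textbook autocorrelation method under my own
assumptions; the function names and the order of 12 are made up), one frame might
look something like this in Python:

    import numpy as np
    from scipy.linalg import solve_toeplitz
    from scipy.signal import lfilter

    def lpc_analyze_frame(frame, order=12):
        """Estimate an all-pole filter and the residue for one windowed frame."""
        frame = frame * np.hanning(len(frame))
        # Autocorrelation method: solve the Toeplitz normal equations
        # for the predictor coefficients.
        r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        a = solve_toeplitz(r[:order], r[1:order + 1])
        inverse_filter = np.concatenate(([1.0], -a))
        residue = lfilter(inverse_filter, [1.0], frame)  # whitened excitation
        gain = np.sqrt(np.mean(residue ** 2))            # intensity of the buzz
        return inverse_filter, gain

    def lpc_resynth_frame(inverse_filter, gain, source):
        """Drive the estimated all-pole filter with a buzz/noise source frame."""
        return lfilter([gain], inverse_filter, source)

Estimating the frequency of the buzz from the residue (e.g. by autocorrelation of
the residue) is left out here, but that is where the pitch information would come
from.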
The streaming phase vocoder is based on the short-time Fourier transform, a
completely different method of analysis. Each frame of audio is transformed into a
series of frequency bins; the number of bins depends on the length of the analysis
frame. The analysis produces an amplitude-phase pair for each bin, and the frame can
then be resynthesized from those pairs using an inverse Fourier transform.
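Again just as a sketch (not pvsanal itself, and ignoring overlap-add and
phase-tracking details), the amplitude-phase pairs for one frame could be produced
and inverted like this:

    import numpy as np

    def stft_analyze_frame(frame):
        """Return an (amplitude, phase) pair for each frequency bin of one frame."""
        spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
        return np.abs(spectrum), np.angle(spectrum)

    def stft_resynth_frame(amps, phases):
        """Rebuild the time-domain frame from its amplitude/phase pairs."""
        return np.fft.irfft(amps * np.exp(1j * phases))

For a frame of N samples this gives N/2 + 1 bins, which is why the bin count is tied
to the analysis frame length.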
Given this information, there's a clear reason why STFT methods are often more
successful in musical contexts. LPC assumes that sound is produced by a filtered
buzz. While this is a reasonable approximation for speech and the voice, it is less
accurate for many musical instruments and other audio sources, and it falls apart
completely in polyphonic contexts. Furthermore, the output of LPC, at least in
Csound's implementation, varies widely depending on the analysis parameters; I
haven't seen as wide a variance with STFT methods. This makes it much easier to get
bad results with LPC. Presumably, if you use the method a lot, it becomes much easier
to choose good parameters at the outset.
I cannot comment on pitch-synchronous overlap-add methods.
There's also a clear reason why sociolinguists would use LPC. It has a long history
of use in speech applications and in publications, so it is well understood within
the field. The same cannot be said for the phase vocoder. Besides that, since LPC
analysis is built on the assumption that the sound source is vocal-like, the analysis
data is directly applicable to vocal models. With an STFT-based analysis, there would
need to be an intermediate step of interpreting the analysis output to match it to a
vocal model.
I doubt any studies exist that you could cite to prove that STFT analysis is superior
to LPC for the purposes of linguists; such studies would almost certainly have been
performed by linguists, and they're probably too busy doing their real work to
compare LPC to some other method they don't know about. I'm not convinced it's true
myself (I prefer LPC to pvsanal et al. when the source is suitable for LPC).
If you want to convince sociolinguists to use pvsanal-like tools, you may need to get
them interested enough in the tool to do such research themselves. I would begin
such a conversation by asking about how LPC data is used, what the known limitations
of the method are, and if there's anything they wish the analysis could provide that
it doesn't.
John W. Lato
Sarah and Ernest Butler School of Music
The University of Texas at Austin
1 University Station E3100
Austin, TX 78712-0435
(512) 232-2090
David Akbari wrote:
> Not yet.
>
> The reason I'm asking is because I know many people involved in
> sociolinguistics who are using the LPC/PSOLA for analysis/resynthesis
> of speech, specifically.
>
> I know from musical experience that the streaming f-sig analysis
> format implemented in CDP and Csound is far superior. I just need some
> resources to cite to prove this to these individuals. Simply producing
> sound for A/B comparison has been OK... but it would be nice to have a
> more pedantic, substantive basis for these claims of superiority. Then
> we might see a wider adoption of this technology beyond the scope of
> computer music circles.
>
>
> -David
>
> On Sat, Jun 21, 2008 at 9:46 AM, Richard Bowers
> wrote:
>> There has been no reply on the list to this. Did anyone reply to David
>> privately? I would be interested in the responses if there were any.
>>
>> --Richard.
>>
>> David Akbari wrote:
>>> Hi List and Dr. Dobson,
>>>
>>> In my recent work I have come across the paradigm of creating a
>>> continuum from endpoint stimuli in experimental procedures using
>>> synthetic sounds as the end points.
>>>
>>> I'm specifically wondering, what are the major differences in the
>>> abstract between the linear predictive coding analysis and
>>> pitch-synchronous-overlap-add resynthesis and the spectral streaming
>>> phase vocoder analysis/resynthesis as it is implemented today in
>>> Csound ?
>>>
>>> Many people are using the LPC/PSOLA but I know from musical experience
>>> that the PVS/PVX format sounds much better. I'm trying to get a better
>>> idea of why this is so... any scholarly papers, websites, or similar
>>> online resources would be greatly appreciated!
>>>
>>>
>>> Thank you for your time and consideration,
>>>
>>> David Akbari
>>>
>
>