[Csnd] PVANAL & PVOC/PVS vs. LPC & PSOLA

Date: 2008-06-12 20:52
From: "David Akbari"
Subject: [Csnd] PVANAL & PVOC/PVS vs. LPC & PSOLA
Attachments: None

Date: 2008-06-21 15:46
From: Richard Bowers
Subject: [Csnd] Re: PVANAL & PVOC/PVS vs. LPC & PSOLA
There has been no reply on the list to this. Did anyone reply to David 
privately? I would be interested in the responses if there were any.

--Richard.

David Akbari wrote:
> Hi List and Dr. Dobson,
>
> In my recent work I have come across the paradigm of creating a
> continuum from endpoint stimuli in experimental procedures using
> synthetic sounds as the end points.
>
> I'm specifically wondering, what are the major differences in the
> abstract between the linear predictive coding analysis and
> pitch-synchronous-overlap-add resynthesis and the spectral streaming
> phase vocoder analysis/resynthesis as it is implemented today in
> Csound?
>
> Many people are using the LPC/PSOLA but I know from musical experience
> that the PVS/PVX format sounds much better. I'm trying to get a better
> idea of why this is so... any scholarly papers, websites, or similar
> online resources would be greatly appreciated!
>
>
> Thank you for your time and consideration,
>
> David Akbari
>
>
> Send bug reports to this list.
> To unsubscribe, send email to sympa@lists.bath.ac.uk with body "unsubscribe csound"


Date: 2008-06-21 20:31
From: "David Akbari"
Subject: [Csnd] Re: Re: PVANAL & PVOC/PVS vs. LPC & PSOLA
Attachments: None

Date: 2008-06-21 23:21
From: Richard Bowers
Subject: [Csnd] Re: Re: Re: PVANAL & PVOC/PVS vs. LPC & PSOLA
That's interesting. I noticed from a project I was involved in at the 
Cardiff School of Psychology that the software of choice seems to be 
Praat for speech analysis. But I heard little mention of phase vocoder 
analysis/resynthesis. I'll ask someone there for some information on 
software choices and methods.

-Richard

David Akbari wrote:
> Not yet.
>
> The reason I'm asking is because I know many people involved in
> sociolinguistics who are using the LPC/PSOLA for analysis/resynthesis
> of speech, specifically.
>
> I know from musical experience that the streaming f-sig analysis
> format implemented in CDP and Csound is far superior. I just need some
> resources to cite to prove this to these individuals. Simply producing
> sound for A/B comparison has been OK.. but it would be nice to have a
> more pedantic substantive basis for these claims of superiority. Then
> we might see a wider adoption of this technology beyond the scope of
> computer music circles.
>
>
> -David


Date: 2008-06-23 17:37
From: John Lato
Subject: [Csnd] Re: Re: Re: PVANAL & PVOC/PVS vs. LPC & PSOLA
Okay, I'll give this one a shot.  Keep in mind that this answer is not rigorous in 
any sense: it is based on my general understanding of the algorithms rather than 
their implementations in Csound, and it may be apocryphal or flat-out wrong.

LPC works on the assumption that the source sound is basically a filtered buzz.  In 
the analysis process, the formants (the spectral envelope) are estimated as an 
all-pole filter and then filtered out of the sound.  What remains is called the 
residual.  From the residual, the intensity and frequency of the buzz can be 
calculated.  As with STFT and the streaming phase vocoder implementations, this 
process is done on short frames of audio.  Wikipedia indicates 30-50 frames/sec are 
usually successful for speech.  To resynthesize a signal analyzed with LPC, you then 
just pass a source signal (typically a mix of buzz and noise) through the estimated 
filter, which should yield approximately the same output.
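The LPC steps described above can be sketched roughly as follows. This is plain Python rather than Csound, and it is only an illustration of the general autocorrelation-method algorithm, not of what Csound's lpanal utility actually does internally. All function names, the 2-pole test resonator, and the frame and period sizes are assumptions made up for the example.

```python
import math

def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns (a, err) with a[0] == 1.
    Assumes r[0] > 0, i.e. the frame is not silent."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # remaining prediction error
    return a, err

def residual(x, a):
    """Inverse filter A(z): e[n] = sum_k a[k] * x[n - k] (formants removed)."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]

def synthesize(excitation, a):
    """All-pole filter 1/A(z): y[n] = e[n] - sum_{k>=1} a[k] * y[n - k]."""
    y = []
    for n, e in enumerate(excitation):
        y.append(e - sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0))
    return y

def estimate_period(e, min_lag, max_lag):
    """Lag (in samples) of the strongest autocorrelation peak of the residual."""
    r = autocorrelation(e, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda k: r[k])

def make_test_frame(length=400, period=40):
    """Hypothetical input: an impulse-train 'buzz' through a 2-pole resonator."""
    buzz = [1.0 if n % period == 0 else 0.0 for n in range(length)]
    return synthesize(buzz, [1.0, -1.2, 0.72])

if __name__ == "__main__":
    frame = make_test_frame()
    a, err = levinson_durbin(autocorrelation(frame, 2), 2)
    e = residual(frame, a)
    print("filter coefficients:", a)
    print("residual RMS:", math.sqrt(sum(v * v for v in e) / len(e)))
    print("buzz period (samples):", estimate_period(e, 20, 100))
```

Feeding the residual back through `synthesize` with the same coefficients reproduces the frame, while feeding in a fresh buzz or noise source instead gives the usual LPC resynthesis. A real speech analyzer would also window each frame and use a much higher filter order; both are left out here to keep the sketch short.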

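The streaming phase vocoder is based on the short-time Fourier transform (STFT), a 
completely different method of analysis.  Each windowed frame of audio is transformed 
into a series of frequency bins.  The number of bins depends on the length of the 
analysis frame (N/2 + 1 bins for an N-sample frame of real audio).  The transform 
itself produces an amplitude-phase pair for each bin; as far as I understand it, the 
phase vocoder then turns the phase change between successive frames into a frequency 
estimate, so the data carried in a Csound f-sig is an amplitude-frequency pair per 
bin.  These pairs can then be resynthesized using an inverse Fourier transform and 
overlap-add.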
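For comparison, here is a rough sketch of the STFT analysis/resynthesis round trip described above, again in plain Python and not taken from Csound's pvsanal/pvsynth code. The function names, the Hann window, the frame size and the hop size are all assumptions chosen for illustration, and the naive DFT is used only to keep the example dependency-free. It stops at amplitude-phase pairs; a phase vocoder would additionally convert phase differences between frames into per-bin frequencies.

```python
import cmath
import math

def hann(size):
    """Periodic Hann window (an assumed choice; other windows work too)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / size) for i in range(size)]

def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a real frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def stft(signal, size, hop):
    """Return a list of frames, each a list of (amplitude, phase) per bin.
    Only bins 0..size/2 are kept, so there are size/2 + 1 bins per frame."""
    win = hann(size)
    frames = []
    for start in range(0, len(signal) - size + 1, hop):
        spec = dft([signal[start + i] * win[i] for i in range(size)])
        frames.append([(abs(c), cmath.phase(c)) for c in spec[:size // 2 + 1]])
    return frames

def istft(frames, size, hop):
    """Inverse transform each frame and overlap-add, normalizing by window energy."""
    win = hann(size)
    out = [0.0] * (hop * (len(frames) - 1) + size)
    norm = [0.0] * len(out)
    for f, frame in enumerate(frames):
        half = [cmath.rect(amp, phase) for amp, phase in frame]
        # Rebuild the upper half of the spectrum by conjugate symmetry.
        full = half + [half[size - k].conjugate() for k in range(size // 2 + 1, size)]
        block = idft(full)
        for i in range(size):
            out[f * hop + i] += block[i] * win[i]
            norm[f * hop + i] += win[i] * win[i]
    # Samples not covered by any window energy (the very edges) are left at zero.
    return [o / w if w > 1e-12 else 0.0 for o, w in zip(out, norm)]

if __name__ == "__main__":
    # Two sine partials that a single "filtered buzz" model would not separate.
    sig = [math.sin(2 * math.pi * 8 * n / 64) + 0.5 * math.sin(2 * math.pi * 20 * n / 64)
           for n in range(512)]
    frames = stft(sig, 64, 16)
    rebuilt = istft(frames, 64, 16)
    print("frames:", len(frames), "bins per frame:", len(frames[0]))
    print("max interior error:",
          max(abs(sig[n] - rebuilt[n]) for n in range(64, len(rebuilt) - 64)))
```

Unlike the LPC sketch, nothing here assumes a source-filter structure: any signal is just a set of bins, which is why unmodified analysis data resynthesizes to essentially the same signal.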

Given this information, there's a clear reason why STFT methods are often more 
successful in musical contexts.  LPC assumes that sound is produced by a filtered 
buzz.  While this is a reasonable model for speech/voice, it is less accurate for 
many musical instruments and other audio sources, and it completely falls apart in 
polyphonic contexts.  Furthermore, the output of LPC, at least in Csound's 
implementation, varies widely depending on the analysis parameters.  I haven't 
witnessed as large a variance with STFT methods.  This makes it much easier to get 
bad results with LPC.  Presumably, once you have used the method a lot, it becomes 
much easier to choose good parameters at the outset.

I cannot comment on pitch-synchronous overlap-add methods.

There's also a clear reason why sociolinguists would use LPC.  It has a long history 
of use in speech applications and publications, so it is well understood within the 
field.  The same cannot be said for the phase vocoder.  Besides that, because LPC 
analysis is built on the assumption that the sound source is vocal-like, the analysis 
data is directly applicable to vocal models.  With an STFT-based analysis, there 
would need to be an intermediate step of analyzing the analysis output to match it to 
a vocal model.

I doubt any studies exist that you could cite to prove that STFT analysis is superior 
to LPC for the purposes of linguists; such studies would almost certainly have been 
performed by linguists, and they're probably too busy doing their real work to 
compare LPC to some other method they don't know about.  I'm not convinced it's true 
myself (I prefer LPC to pvsanal et al. when the source is suitable for LPC).

If you want to convince sociolinguists to use pvsanal-like tools, you may need to get 
them interested enough in the tool to do such research themselves.  I would begin 
such a conversation by asking about how LPC data is used, what the known limitations 
of the method are, and if there's anything they wish the analysis could provide that 
it doesn't.

John W. Lato
Sarah and Ernest Butler School of Music
The University of Texas at Austin
1 University Station E3100
Austin, TX 78712-0435
(512) 232-2090
