[Csnd] Re: Re: Re: Re: PVANAL & PVOC/PVS vs. LPC & PSOLA

Date: 2008-06-23 18:41
From: Michael Gogins
Subject: [Csnd] Re: Re: Re: Re: PVANAL & PVOC/PVS vs. LPC & PSOLA

LPC is a very simple model of the vocal tract: the buzz stands for the vibrating vocal cords, and the filter stands for the resonant cavities of the throat and head. The phase vocoder is not a model of the voice in that sense.

In general, a model is better suited for research than some other way of reproducing the same data, because you can control the model in ways that make sense in the research context.

As for Praat, it is a very complete physical model of the entire human vocal tract that is fed by specialized analysis tools. Way beyond LPC...

It would probably make sense to have Praat opcodes in Csound! Perhaps it could be made to sing in various languages.

Praat has been used by musicologists to analyze and study vocal performance and melody.

Regards,
Mike

-----Original Message-----
>From: John Lato 
>Sent: Jun 23, 2008 12:37 PM
>To: csound@lists.bath.ac.uk
>Subject: [Csnd] Re: Re: Re: PVANAL & PVOC/PVS vs. LPC & PSOLA
>
>Okay, I'll give this one a shot.  Keep in mind that this answer is not rigorous in 
>any sense; it is based on my general understanding of the algorithms rather than 
>their implementations in Csound, and is possibly apocryphal or flat-out wrong.
>
>LPC works on the assumption that the source sound is basically a filtered buzz.  In 
>the analysis process, formants are estimated and filtered out of the sound.  What 
>remains is called the residue.  From the residue, the intensity and frequency of the 
>buzz can be calculated.  As with STFT and the streaming phase vocoder 
>implementations, this process is done on short frames of audio.  Wikipedia indicates 
>30-50 frames/sec are usually successful for speech.  In order to resynthesize a 
>signal analyzed with LPC, you then just filter a source signal (typically a mix of 
>buzz and noise), which should yield approximately the same output.
>
>The streaming phase vocoder is based on the short-time Fourier transform, a 
>completely different method of analysis.  Each frame of audio is transformed into a 
>series of frequency bins.  The number of bins is dependent on the length of the 
>analysis frame.  The analysis produces an amplitude-phase pair for each bin.  These 
>amplitude-phase pairs can then be resynthesized using an inverse Fourier transform.
>
>Given this information, there's a clear reason why STFT methods are often more 
>successful in musical contexts.  LPC assumes that sound is produced by a filtered 
>buzz.  While this is relatively true for speech/voice, it is less accurate for many 
>musical instruments and other audio sources, and completely falls apart in polyphonic 
>contexts.  Furthermore the output of LPC, at least in Csound's implementation, varies 
>widely depending on the analysis parameters.  I haven't witnessed as large a 
>variance in STFT methods.  This makes it much easier to get bad results with LPC. 
>Presumably if you use the method a lot, it's much easier to determine good parameters 
>at the outset.
>
>I cannot comment on pitch-synchronous overlap-add methods.
>
>There's also a clear reason why sociolinguists would use LPC.  It has a long history 
>of being used for speech applications and in publications, therefore it's 
>well-understood within the field.  The same cannot be said for the phase vocoder. 
>Besides that, as LPC analysis is built on the assumption that the sound source is 
>vocal-like, the analysis data is directly applicable to vocal models.  With an 
>STFT-based analysis, there would need to be an intermediate step of analyzing the 
>analysis output to match it to a vocal model.
>
>I doubt any studies exist that you could cite to prove that STFT analysis is superior 
>to LPC for the purposes of linguists; such studies would almost certainly have been 
>performed by linguists, and they're probably too busy doing their real work to 
>compare LPC to some other method they don't know about.  I'm not convinced it's true 
>myself (I prefer LPC to pvsanal et al. when the source is suitable for LPC).
>
>If you want to convince sociolinguists to use pvsanal-like tools, you may need to get 
>them interested enough in the tool to do such research themselves.  I would begin 
>such a conversation by asking about how LPC data is used, what the known limitations 
>of the method are, and if there's anything they wish the analysis could provide that 
>it doesn't.
>
>John W. Lato
>Sarah and Ernest Butler School of Music
>The University of Texas at Austin
>1 University Station E3100
>Austin, TX 78712-0435
>(512) 232-2090
>
>David Akbari wrote:
>> Not yet.
>> 
>> The reason I'm asking is because I know many people involved in
>> sociolinguistics who are using the LPC/PSOLA for analysis/resynthesis
>> of speech, specifically.
>> 
>> I know from musical experience that the streaming f-sig analysis
>> format implemented in CDP and Csound is far superior. I just need some
>> resources to cite to prove this to these individuals. Simply producing
>> sound for A/B comparison has been OK.. but it would be nice to have a
>> more pedantic substantive basis for these claims of superiority. Then
>> we might see a wider adoption of this technology beyond the scope of
>> computer music circles.
>> 
>> 
>> -David
>> 
>> On Sat, Jun 21, 2008 at 9:46 AM, Richard Bowers wrote:
>>> There has been no reply on the list to this. Did anyone reply to David
>>> privately? I would be interested in the responses if there were any.
>>>
>>> --Richard.
>>>
>>> David Akbari wrote:
>>>> Hi List and Dr. Dobson,
>>>>
>>>> In my recent work I have come across the paradigm of creating a
>>>> continuum from endpoint stimuli in experimental procedures using
>>>> synthetic sounds as the end points.
>>>>
>>>> I'm specifically wondering, what are the major differences in the
>>>> abstract between the linear predictive coding analysis and
>>>> pitch-synchronous-overlap-add resynthesis and the spectral streaming
>>>> phase vocoder analysis/resynthesis as it is implemented today in
>>>> Csound ?
>>>>
>>>> Many people are using the LPC/PSOLA but I know from musical experience
>>>> that the PVS/PVX format sounds much better. I'm trying to get a better
>>>> idea of why this is so... any scholarly papers, websites, or similar
>>>> online resources would be greatly appreciated!
>>>>
>>>>
>>>> Thank you for your time and consideration,
>>>>
>>>> David Akbari
>>>>
>> 
>> 
>> Send bugs reports to this list.
>> To unsubscribe, send email sympa@lists.bath.ac.uk with body "unsubscribe csound"

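A rough sketch of the LPC round trip John Lato describes above (estimate the filter, take the residue, re-filter an excitation), in plain Python. This is not Csound's implementation: the autocorrelation method with the Levinson-Durbin recursion is a common textbook choice assumed here, and every function name, the test filter coefficients, and the frame and pulse-period sizes are made up for illustration.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag] for one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] with x[n] ~ sum(a[k] * x[n-k]).

    Returns (coefficients, remaining prediction-error energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc_residual(frame, a):
    """Inverse-filter the frame: what is left once the predicted part is removed."""
    out = []
    for n in range(len(frame)):
        pred = sum(a[k] * frame[n - 1 - k]
                   for k in range(len(a)) if n - 1 - k >= 0)
        out.append(frame[n] - pred)
    return out


def lpc_resynthesize(excitation, a):
    """Run an excitation through the all-pole filter 1 / (1 - sum a[k] z^-k)."""
    out = []
    for n in range(len(excitation)):
        feedback = sum(a[k] * out[n - 1 - k]
                       for k in range(len(a)) if n - 1 - k >= 0)
        out.append(excitation[n] + feedback)
    return out


def make_buzz(n, period):
    """Impulse train: a crude stand-in for the 'buzz' source."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


# Hypothetical test case: a buzz filtered by a known two-pole resonator.
TRUE_A = [1.3, -0.8]
buzz = make_buzz(800, 80)
signal = lpc_resynthesize(buzz, TRUE_A)

# Analysis: estimate the filter, then extract the residue.
r = autocorrelation(signal, 2)
est_a, err = levinson_durbin(r, 2)
residual = lpc_residual(signal, est_a)

# Resynthesis: filtering the residue again gives back the input.
rebuilt = lpc_resynthesize(residual, est_a)
print("estimated coefficients:", est_a)
```

Feeding the exact residue back in reproduces the input; replacing it with a fresh buzz/noise mix, as in the description above, gives only an approximation, which is where the choice of analysis parameters starts to matter.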


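For comparison, a similarly rough sketch of the STFT side as described above: each windowed frame becomes a list of amplitude-phase pairs, one per bin, and the frames are turned back into audio by an inverse transform and overlap-add. It uses a naive DFT for readability and is not how pvsanal is implemented; a real phase vocoder also converts phase differences between frames into frequencies, which is left out. The function names, the 16-sample frame, the hop of 8 and the test sinusoid are all assumptions for illustration.

```python
import cmath
import math


def hann(n_fft):
    """Periodic Hann window; two copies offset by n_fft/2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft)
            for n in range(n_fft)]


def analyze_frame(frame):
    """Naive DFT of one frame -> (amplitude, phase) for bins 0..N/2."""
    n_fft = len(frame)
    pairs = []
    for k in range(n_fft // 2 + 1):
        acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n in range(n_fft))
        pairs.append((abs(acc), cmath.phase(acc)))
    return pairs


def synthesize_frame(pairs, n_fft):
    """Inverse DFT of a real frame, written as a sum of cosines (even n_fft)."""
    out = []
    for n in range(n_fft):
        total = 0.0
        for k, (amp, phase) in enumerate(pairs):
            # DC and Nyquist appear once; other bins have a mirror image.
            weight = 1.0 if k in (0, n_fft // 2) else 2.0
            total += weight * amp * math.cos(
                2.0 * math.pi * k * n / n_fft + phase)
        out.append(total / n_fft)
    return out


def stft(signal, n_fft, hop):
    """Analyze overlapping windowed frames of the signal."""
    win = hann(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        windowed = [signal[start + n] * win[n] for n in range(n_fft)]
        frames.append(analyze_frame(windowed))
    return frames


def istft(frames, n_fft, hop):
    """Resynthesize each frame and overlap-add the results."""
    out = [0.0] * (hop * (len(frames) - 1) + n_fft)
    for i, pairs in enumerate(frames):
        for n, value in enumerate(synthesize_frame(pairs, n_fft)):
            out[i * hop + n] += value
    return out


# Hypothetical test case: a sinusoid sitting exactly on bin 2.
N_FFT = 16
HOP = 8
signal = [math.sin(2.0 * math.pi * 2 * n / N_FFT) for n in range(64)]

frames = stft(signal, N_FFT, HOP)
rebuilt = istft(frames, N_FFT, HOP)
print("bins per frame:", len(frames[0]))
```

The number of bins follows directly from the frame length (N/2 + 1 for N real samples), and nothing in the analysis assumes a buzz-plus-filter source. Only the samples covered by two overlapping frames are reconstructed exactly; the first and last half-frames are attenuated by the window.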

Date: 2008-06-23 19:00
From: Richard Bowers
Subject: [Csnd] Contacting Jim co-editor of CSound Journal

Could someone please give me the email address of Jim, who edits Csound 
Journal?

Many thanks,

Richard.

Date: 2008-06-23 20:34
From: "Oeyvind Brandtsegg"
Subject: [Csnd] Re: Contacting Jim co-editor of CSound Journal

Date: 2008-06-23 21:03
From: DavidW
Subject: [Csnd] Re: Re: Re: Re: Re: PVANAL & PVOC/PVS vs. LPC & PSOLA

An excellent piece of software it is too!
These days it seems one can talk to it directly, at least via Python:

http://razor.occams.info/code/praat-py/

D.
On 24/06/2008, at 3:41 AM, Michael Gogins wrote:
...
>
> As for praat, it is a very complete physical model of the entire  
> human vocal tract that is fed by specialized analysis tools. Way  
> beyond LPC...
>
> It would probably make sense to have praat opcodes in Csound!  
> Perhaps it could be gotten to sing in various languages.
>
> Praat has been used by musicologists to analyze and study vocal  
> performance and melody.
>
> Regards,
> Mike
>

Date: 2008-06-23 21:18
From: Richard Bowers
Subject: [Csnd] Re: Re: Contacting Jim co-editor of CSound Journal

Many thanks, Oeyvind.

Oeyvind Brandtsegg wrote:
> I think that it's
> "James Hearon"
>
> 2008/6/23, Richard Bowers:
>
>     Could someone please give me the email address of Jim, who edits
>     Csound Journal?
>
>     Many thanks,
>
>     Richard.