| I know that the team at Pompeu Fabra in Barcelona worked hard on this,
under the sponsorship of Yamaha, and they developed a vocal synthesis
method, which they have not fully explained anywhere, but it is possibly
based on analysis-synthesis methods. The Yamaha product (Vocaloid?)
is reputed to be very good, but I have neither seen nor heard it, and I really
don't know what it is capable of.
Victor
At 11:46 29/11/2005, you wrote:
>Hello,
>
>Olivier Bélanger, a doctoral student here, and myself have tackled this
>problem over the past few years. Our goal was precisely what you are
>aiming for: intelligible sung text. I can certainly tell you that it is a
>very tough nut to crack. Identifiable consonant signals are carried
>largely by the time-based behaviour of transients. Timed onsets of
>formant shifts and noise components need to be calculated to very very
>precise thresholds in the transitory period between consonant generation
>and the following vowel. Unfortunately, the set of time shift data for
>the same consonant varies considerably depending on the vowel that follows
>it, leading to an exponentially expanding database. We put together a
>modestly successful model consisting of a database of signal analysis data
>for voiced and non-voiced consonants "B", "L", "M", "D", "K" and "S" (I
>may be wrong on the specific consonants, I'll have to check again with
>Olivier), and a source-filter synthesis stage consisting of a glottal simulator
>(gbuzz pulse train), noise generator and a bank of resonant filters. This
>was run from a Max control patch into the csound~ object. We had not made
>it past this (yet!) when we realised that the model falls apart
>spectacularly when applied to male or female voices and/or combined
>consonants such as "PR", "CL" or "SK". Each consonant type seems to need
>a generally adaptive dataset.
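
The source-filter layout described above (pulse-train glottal source, noise generator, bank of resonant filters) can be sketched roughly as follows. This is only an illustration in plain Python, not the Max/csound~ patch mentioned in the message: the function names, the sample rate and the formant values are all assumptions, and the formants are held static, whereas the message stresses that consonants depend on precisely timed formant and noise transitions.

```python
import math
import random

SR = 16000  # sample rate in Hz (assumed for this sketch)


def pulse_train(f0, n_samples, n_harmonics=20):
    """Band-limited pulse train: sum of equal-amplitude cosine harmonics.

    Stands in for the glottal source; it is not the gbuzz opcode itself.
    """
    out = []
    for n in range(n_samples):
        phase = 2.0 * math.pi * f0 * n / SR
        s = sum(math.cos(k * phase) for k in range(1, n_harmonics + 1))
        out.append(s / n_harmonics)
    return out


def white_noise(n_samples, seed=0):
    """Uniform white noise in [-1, 1], seeded so runs are repeatable."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def resonator(signal, freq, bandwidth):
    """Two-pole resonant filter centred near freq (Hz)."""
    r = math.exp(-math.pi * bandwidth / SR)  # pole radius, always < 1
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq / SR)
    a2 = -r * r
    gain = 1.0 - r  # rough level scaling, not an exact normalisation
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = gain * x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def formant_bank(signal, formants):
    """Run the source through parallel resonators and sum the results."""
    mixed = [0.0] * len(signal)
    for freq, bandwidth, amp in formants:
        for i, y in enumerate(resonator(signal, freq, bandwidth)):
            mixed[i] += amp * y
    return mixed


def synthesize(f0, duration, voiced_mix, formants):
    """Mix pulse train and noise (voiced_mix in 0..1), then filter."""
    n = int(duration * SR)
    voiced = pulse_train(f0, n)
    noise = white_noise(n)
    source = [voiced_mix * v + (1.0 - voiced_mix) * z
              for v, z in zip(voiced, noise)]
    return formant_bank(source, formants)


# Illustrative (frequency Hz, bandwidth Hz, linear amplitude) triples;
# placeholder values for the sketch, not measured analysis data.
FORMANTS = [(600.0, 60.0, 1.0), (1040.0, 70.0, 0.45), (2250.0, 110.0, 0.35)]

samples = synthesize(110.0, 0.05, 0.9, FORMANTS)
```

Lowering `voiced_mix` towards 0 gives a noisier, more fricative-like source. A usable consonant would additionally need the formant triples and the mix to change over time, which is exactly the per-vowel timing data the message describes as hard to collect.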
>
>This would explain why all successful language synthesizers are developed
>using a concatenation technique that implies no real signal synthesis but
>rather, a very large bank of short (sampled) audio signals that are
>strung together to form words. For the same voice, this works well for
>language comprehension but it is dreadful for general musical purposes.
>The company East-West (I think that is what it is called) has recently
>released an articulated choir sample library that uses concatenation to
>articulate text. What I have heard sounds quite remarkable, but I suspect
>it is realistic because it is a choir and concatenation artifacts are
>blurred out by the "mass" effect.
>
>If anyone has worked on this problem, I would be delighted to hear from you!
>
>Best
>
>jp
>
>__________________________________________
>http://jeanpiche.com
>
>
>On 05-11-29, at 02:43, Simon Stump wrote:
>
>>Hey,
>>
>> So, I'm trying to write a song right now where I
>>do vocal synthesis with the fof command. I've got
>>plenty of tables and websites for how to generate
>>vowels, but none for consonants. Does anyone know how
>>to generate an "m" sound (or any others for that
>>matter)?
>>
>>Simon
>>
>>
>>
>>--
>>Send bugs reports to this list.
>>To unsubscribe, send email to csound-unsubscribe@lists.bath.ac.uk
>
Victor Lazzarini
Music Technology Laboratory
Music Department
National University of Ireland, Maynooth |