
Speech Processing/Analysis

Date: 1999-03-11 08:40
From: Hans Mikelson
Subject: Speech Processing/Analysis
Hi,

After battling with chronic laryngitis for the past year I have gotten to
know some speech therapists.  I have discussed with them some of the ways in
which they analyze voices.  Some of the parameters they like to track are:
pitch, jitter, and shimmer, and, for normal speaking, pitch average and
variation.  I have been working on implementing some of the analysis
routines in Csound.  If someone cares to give me a reference for some
algorithms for this type of processing I would appreciate it.
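For what it's worth, jitter and shimmer are commonly defined as the cycle-to-cycle relative variation of the pitch period and of the peak amplitude. A minimal Python sketch of those definitions (the function names are mine, and a real analyzer would first need reliable per-cycle pitch marks, e.g. from a pitch tracker):

```python
def jitter(periods):
    """Mean absolute difference of consecutive pitch periods,
    relative to the mean period (cycle-to-cycle pitch perturbation)."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def shimmer(amps):
    """The same cycle-to-cycle measure applied to per-cycle peak amplitudes."""
    diffs = [abs(b - a) for a, b in zip(amps, amps[1:])]
    return (sum(diffs) / len(diffs)) / (sum(amps) / len(amps))
```

A perfectly steady voice gives zero for both; pathological voices show elevated values.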

I notice that Csound's LPC analysis has a pitch tracker which may be
helpful.

As I was experimenting I created a normal speech to whisper converter:

sr=44100
kr=22050
ksmps=2
nchnls=2

; LPC resynthesis whisper: drive the LPC filters with high-passed
; noise instead of a pitched source, so the voiced cue disappears.
        instr   12

idur    =        p3
ktimpnt linseg   0, idur, idur          ; step through the analysis file in real time

; krmsr = rms of residual, krmso = rms of original,
; kerr = voiced/unvoiced error, kcps = tracked pitch
krmsr, krmso, kerr, kcps lpread ktimpnt, "trantr.lpc"

anz1    rand     krmso                  ; noise scaled by the original's rms
anz     butterhp anz1, 1000             ; high-pass to remove low rumble

asig1   =        anz*(kerr-.0002)*2     ; scale noise by the (offset) error signal
aout    lpreson  asig1                  ; resynthesize through the LPC filters

        outs     aout*10, aout*10

        endin

;   Sta  Dur
i12 0    8.69

The LPC was run with default options to create the file trantr.lpc from a
sample of spoken text.

Conversion of normal speech to resynthesized normal speech seems to be a bit
more difficult.  The simulations of glottal pulses I've tried result in an
artificial and grainy tone to the voice.

I tried using buzz without much luck.  I've also tried using oscil with a
simulated glottal pulse as follows:

f5 0 8192 8  0 1024 1 256 .7 256 .9 64 -.8 256 -.65 256 -.75 256 -.5 192 0
1024 0 4608 0

which did not sound too great either.  I am beginning to wonder if it is
perhaps the filter interpolation which introduces the gritty sound quality.
Does anyone know of an example of using LPC to resynthesize a normal voice
or does anyone know of a good waveform for simulating a glottal pulse?
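In case anyone wants to inspect that f5 pulse shape outside Csound, the breakpoints can be rebuilt offline. A rough Python stand-in (note the hedge: GEN08 fits smooth cubic splines through the points, so this piecewise-linear version only approximates the table's overall shape):

```python
# Breakpoints implied by the f5 GEN08 call: (table index, value) pairs.
BREAKPOINTS = [(0, 0.0), (1024, 1.0), (1280, 0.7), (1536, 0.9), (1600, -0.8),
               (1856, -0.65), (2112, -0.75), (2368, -0.5), (2560, 0.0),
               (3584, 0.0), (8192, 0.0)]

def glottal_table(size=8192):
    """Piecewise-linear rebuild of the glottal-pulse table (GEN08 itself
    would smooth these segments with cubic splines)."""
    table = []
    for (x0, y0), (x1, y1) in zip(BREAKPOINTS, BREAKPOINTS[1:]):
        for i in range(x0, x1):
            table.append(y0 + (y1 - y0) * (i - x0) / (x1 - x0))
    return table[:size]
```

Plotting this makes it easy to see that more than half the table is silence, which may contribute to the thin result.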

There was some research at UCSF on training children with language learning
disabilities using modified speech which resulted in a Nature article a
couple of years ago.  The speech was time stretched and certain consonants
like "b" and "d" were amplified.  This later evolved into a program called
"Fast Forward."  I wrote an orc/sco a while ago that attempts to implement
this type of modified speech using pvoc.  If anyone is interested in this it
is in my zip file (mikelson.zip at bath) entitled "dog.orc".  I am not sure
how close this orc comes to reproducing the modified speech used at UCSF and
FastForward.  (Note: Fast Forward is making some pretty sensational claims,
which makes me a bit suspicious of this program.)

Bye,
Hans Mikelson


Date: 1999-03-12 03:07
From: Erik Spjut
Subject: Re: Speech Processing/Analysis
I've actually had quite a bit of success with lpreson. Let me tell you some
of the tradeoffs I've found. First, the order during analysis is critical.
Too low and you lose intelligibility. Too high and you end up buzzy. The
problem is that lpanal begins to fit (non-existent) formants to the noise
at higher orders. The best way is to examine each of the filter responses
(you have to use a separate program) and remove the false formants, but
it's very tedious. Another method is to use two different analysis files
with different orders (from the same sound) and detune the two generated
sounds slightly. You can usually cancel out most of the extraneous formants
that way.

I often use gbuzz and decrease the high partials a little bit to remove the
buzziness too.
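(For reference, gbuzz's kmul parameter scales successive partials geometrically, which is what "decrease the high partials" amounts to. A tiny illustrative sketch, not Csound's actual implementation:)

```python
def gbuzz_partial_amps(npartials, kmul):
    """Relative partial strengths of a gbuzz-style source: each partial is
    kmul times the previous one, so kmul < 1 rolls off the highs."""
    return [kmul ** n for n in range(npartials)]
```

With kmul around 0.5 the upper harmonics drop off quickly and the buzz softens.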

One last item, and no one seems to believe me. The manual says (at least
the last time I checked) that the transition from voiced to unvoiced speech
occurs at a kerr of 0.3. The actual value is 10^-3 = 0.001. You can get
some really nice v's and z's if you logarithmically cross-fade from buzz at
kerr=0.001 to rand at kerr=0.01. I hope these help.
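That logarithmic cross-fade can be written as a weighting function on kerr. A hedged Python sketch (the function name and the clamping outside the two thresholds are my own choices):

```python
import math

def voiced_weight(kerr, lo=0.001, hi=0.01):
    """Log cross-fade weight for the voiced (buzz) source:
    1.0 at kerr <= lo (fully voiced), 0.0 at kerr >= hi (fully rand),
    linear in log10(kerr) in between."""
    kerr = min(max(kerr, lo), hi)
    return (math.log10(hi) - math.log10(kerr)) / (math.log10(hi) - math.log10(lo))
```

In the orchestra you would mix buzz*w + rand*(1-w) with w derived from kerr this way.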

At 2:40 AM -0600 3/11/99, Hans Mikelson wrote:
>Hi,
>
>After battling with chronic laryngitis for the past year I have gotten to
>know some speech therapists.  I have discussed with them some of the ways in
>which they analyze voices.  Some of the parameters they like to track are:
>Pitch, jitter, shimmer, and for normal speaking: pitch average and
>variation.  I have been working on implementing some of the analysis
>routines in Csound.  If someone cares to give me a reference for some
>algorithms for this type of processing I would appreciate it.
>
>I notice that Csound's LPC analysis has a pitch tracker which may be
>helpful.
>


Dr. R. Erik Spjut

Date: 1999-03-12 03:23
From: Charles Starrett
Subject: ExZAKtly how do you use this...
Are there any docs or orc/scos which show how ZAK space can be used for
mixing?  I've put the Csound manual under my pillow for over a month now
and I still can't get it figured out...

Just wonderin'...

--
/----Charles D. Starrett-----\   "I do not feel that
|    / | ____ | |  ____  |   |    my research suffered unduly
|   /\ | |--  |-|   ___| |   |    from the fact that I enjoyed it."
|   |___ |____| |  |_____|   |   *Daniel Miller,
\--starrett@fas.harvard.edu--/    Modernity--an Ethnographic Approach