Speech Processing/Analysis
| Date | 1999-03-11 08:40 |
| From | Hans Mikelson |
| Subject | Speech Processing/Analysis |
Hi,
After battling with chronic laryngitis for the past year I have gotten to
know some speech therapists. I have discussed with them some of the ways in
which they analyze voices. Some of the parameters they like to track are pitch, jitter, and shimmer, and for normal speaking, pitch average and variation. I have been working on implementing some of these analysis routines in Csound. If someone cares to give me a reference for algorithms for this type of processing, I would appreciate it.
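(For reference, one common set of definitions, of which there are several variants, is: jitter is the cycle-to-cycle variation in pitch period and shimmer is the cycle-to-cycle variation in peak amplitude, usually quoted as percentages:

jitter  = mean( |T(i+1) - T(i)| ) / mean( T(i) ) * 100
shimmer = mean( |A(i+1) - A(i)| ) / mean( A(i) ) * 100

where T(i) are the periods of successive glottal cycles and A(i) their peak amplitudes.)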
I notice that Csound's LPC analysis has a pitch tracker which may be
helpful.
As I was experimenting, I created a normal-speech-to-whisper converter:
sr=44100
kr=22050
ksmps=2
nchnls=2
; LPC Resynthesis Whisper
instr 12
idur = p3
ktimpnt linseg 0, idur, idur                          ; step through the analysis file in real time
krmsr, krmso, kerr, kcps lpread ktimpnt, "trantr.lpc" ; frame rms values, prediction error, and pitch
anz1 rand krmso                                       ; noise excitation scaled by the original rms
anz butterhp anz1, 1000                               ; high-pass the noise to brighten the whisper
asig1 = anz*(kerr-.0002)*2                            ; weight the noise by the prediction error
aout lpreson asig1                                    ; drive the all-pole LPC filter with the noise
outs aout*10, aout*10
endin
; Sta Dur
i12 0 8.69
The LPC analysis (lpanal) was run with default options to create the file trantr.lpc from a sample of spoken text.
Conversion of normal speech to resynthesized normal speech seems to be a bit
more difficult. The simulations of glottal pulses I've tried result in an
artificial and grainy tone to the voice.
I tried using buzz without much luck. I've also tried using oscil with a
simulated glottal pulse as follows:
f5 0 8192 8 0 1024 1 256 .7 256 .9 64 -.8 256 -.65 256 -.75 256 -.5 192 0 1024 0 4608 0
which did not sound too great either. I am beginning to wonder whether it is the filter interpolation that introduces the gritty sound quality. Does anyone know of an example of using LPC to resynthesize a normal voice, or of a good waveform for simulating a glottal pulse?
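For what it is worth, the voiced version I have been trying looks roughly like this (a sketch only; the instrument number is arbitrary and f5 is the glottal pulse table above):

; rough sketch of voiced resynthesis with the glottal pulse table
instr 13
idur = p3
ktimpnt linseg 0, idur, idur
krmsr, krmso, kerr, kcps lpread ktimpnt, "trantr.lpc"
apulse oscil krmso, kcps, 5   ; read the glottal pulse table f5 at the tracked pitch
aout lpreson apulse           ; excite the LPC filter with the pulse train
outs aout*10, aout*10
endin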
There was some research at UCSF on training children with language-learning disabilities using modified speech, which resulted in a Nature article a couple of years ago. The speech was time-stretched and certain consonants like "b" and "d" were amplified. This later evolved into a program called "Fast Forward." I wrote an orc/sco a while ago that attempts to implement this type of modified speech using pvoc. If anyone is interested, it is in my zip file (mikelson.zip at bath) under the name "dog.orc". I am not sure how close this orc comes to reproducing the modified speech used at UCSF and in Fast Forward. (Note that Fast Forward is making some pretty sensational claims, which makes me a bit suspicious of the program.)
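The time-stretching part of that approach is roughly the following (the stretch factor and analysis file name are placeholders, and the consonant amplification is a separate step not shown):

; rough sketch of pvoc time stretching (placeholder values)
instr 14
istretch = 2                          ; play back at half speed
ktimpnt line 0, p3, p3/istretch       ; advance through the pvanal file more slowly
asig pvoc ktimpnt, 1, "speech.pvc"    ; resynthesize at the original pitch
outs asig, asig
endin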
Bye,
Hans Mikelson
|
| Date | 1999-03-12 03:07 |
| From | Erik Spjut |
| Subject | Re: Speech Processing/Analysis |
I've actually had quite a bit of success with lpreson. Let me tell you some of the tradeoffs I've found.

First, the order during analysis is critical. Too low and you lose intelligibility. Too high and you end up buzzy. The problem is that lpanal begins to fit (non-existent) formants to the noise at higher orders. The best way is to examine each of the filter responses (you have to use a separate program) and remove the false formants, but it's very tedious. Another method is to use two different analysis files with different orders (from the same sound) and detune the two generated sounds slightly. You can usually cancel out most of the extraneous formants that way.

I often use gbuzz and decrease the high partials a little bit to remove the buzziness too.

One last item, and no one seems to believe me: the manual says (at least the last time I checked) that the transition from voiced to unvoiced speech occurs at a kerr of 0.3. The actual value is 10^-3 = 0.001. You can get some really nice v's and z's if you logarithmically cross-fade from buzz at kerr=0.001 to rand at kerr=0.01.

I hope these help.

Dr. R. Erik Spjut |
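A sketch of the voiced/unvoiced cross-fade Erik describes might look like the following (the kerr thresholds are from his post; the instrument number, analysis file, harmonic count, and scaling are assumptions, and gbuzz with a reduced harmonic multiplier could replace buzz as he suggests):

; sketch: logarithmic cross-fade between buzz and rand excitation based on kerr
; assumes a sine table in the score: f1 0 8192 10 1
instr 15
idur = p3
ktimpnt linseg 0, idur, idur
krmsr, krmso, kerr, kcps lpread ktimpnt, "trantr.lpc"
kmix = log((kerr+0.00001)/0.001) / log(10)        ; 0 at kerr=0.001, 1 at kerr=0.01
kmix = (kmix < 0 ? 0 : (kmix > 1 ? 1 : kmix))     ; clamp to the 0-1 range
abuzz buzz krmso, kcps, 20, 1                     ; voiced excitation
anoise rand krmso                                 ; unvoiced excitation
asig = (1-kmix)*abuzz + kmix*anoise
aout lpreson asig
outs aout*10, aout*10
endin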
| Date | 1999-03-12 03:23 |
| From | Charles Starrett |
| Subject | ExZAKtly how do you use this... |
Are there any docs or orc/scos which show how ZAK space can be used for mixing? I've put the Csound manual under my pillow for over a month now and I still can't get it figured out... Just wonderin'...

--
Charles D. Starrett
starrett@fas.harvard.edu
"I do not feel that my research suffered unduly from the fact that I enjoyed it."
  *Daniel Miller, Modernity--an Ethnographic Approach |
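A minimal sketch of one way the zak opcodes can be used as a mixing bus (instrument numbers, channel counts, and the filter are arbitrary choices):

sr=44100
kr=22050
ksmps=2
nchnls=2

zakinit 2, 1               ; allocate 2 a-rate channels and 1 k-rate channel

instr 1                    ; a source instrument mixes itself onto za channel 1
asig oscil 10000, p4, 1
zawm asig, 1               ; zawm adds to the channel instead of overwriting
endin

instr 99                   ; the mixer runs last: read the bus, process, clear it
amix zar 1
amix butterlp amix, 5000   ; some global processing on the summed signal
outs amix, amix
zacl 0, 1                  ; clear the channels so they do not accumulate
endin

; score:
; f1 0 8192 10 1
; i1  0 2 440
; i1  0 2 660
; i99 0 2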
Are there any docs or orc/scos which show how ZAK space can be used for mixing? I've put the Csound manual under my pillow for over a month now and I still can't get it figured out... Just wonderin'... -- /----Charles D. Starrett-----\ "I do not feel that | / | ____ | | ____ | | my research suffered unduly | /\ | |-- |-| ___| | | from the fact that I enjoyed it." | |___ |____| | |_____| | *Daniel Miller, \--starrett@fas.harvard.edu--/ Modernity--an Ethnographic Approach |