Speech Processing/Analysis
Date | 1999-03-11 08:40 |
From | Hans Mikelson |
Subject | Speech Processing/Analysis |
Hi,

After battling with chronic laryngitis for the past year, I have gotten to know some speech therapists and have discussed with them some of the ways in which they analyze voices. Some of the parameters they like to track are pitch, jitter, and shimmer, and, for normal speaking, pitch average and variation. I have been working on implementing some of these analysis routines in Csound. If someone could give me a reference for algorithms for this type of processing, I would appreciate it.

I notice that Csound's LPC analysis has a pitch tracker, which may be helpful.

While experimenting, I created a normal-speech-to-whisper converter:

sr     = 44100
kr     = 22050
ksmps  = 2
nchnls = 2

; LPC Resynthesis Whisper
        instr 12
idur    =        p3
ktimpnt linseg   0, idur, idur                ; step through the analysis at the original speed
krmsr, krmso, kerr, kcps lpread ktimpnt, "trantr.lpc"
anz1    rand     krmso                        ; noise with amplitude krmso (RMS of the original)
anz     butterhp anz1, 1000                   ; keep only the noise above 1 kHz
asig1   =        anz*(kerr-.0002)*2           ; scale the excitation by the analysis error
aout    lpreson  asig1                        ; filter it through the current LPC poles
        outs     aout*10, aout*10
        endin

; Sta  Dur
i12    0    8.69

The LPC analysis was run with default options to create the file trantr.lpc from a sample of spoken text.

Converting normal speech to resynthesized normal speech seems to be more difficult. The glottal-pulse simulations I've tried give the voice an artificial, grainy tone. I tried using buzz without much luck. I've also tried using oscil with a simulated glottal pulse:

f5 0 8192 8 0 1024 1 256 .7 256 .9 64 -.8 256 -.65 256 -.75 256 -.5 192 0 1024 0 4608 0

which did not sound too great either. I am beginning to wonder whether the filter interpolation is what introduces the gritty sound quality. Does anyone know of an example of using LPC to resynthesize a normal voice, or of a good waveform for simulating a glottal pulse?

There was some research at UCSF on training children with language-learning disabilities using modified speech, which resulted in a Nature article a couple of years ago. The speech was time-stretched, and certain consonants like "b" and "d" were amplified.
This later evolved into a program called "Fast Forward." A while ago I wrote an orc/sco that attempts to implement this type of modified speech using pvoc. If anyone is interested, it is in my zip file (mikelson.zip at Bath) as "dog.orc". I am not sure how closely this orc reproduces the modified speech used at UCSF and in Fast Forward. (Note: Fast Forward is making some pretty sensational claims, which makes me a bit suspicious of the program.)

Bye,
Hans Mikelson |
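The jitter and shimmer measures asked about above have widely used "local" definitions (the ones reported, for example, in Praat's voice report): jitter is the mean absolute difference between consecutive glottal periods divided by the mean period, and shimmer is the same ratio computed on the per-cycle peak amplitudes. The following is a minimal Python sketch, not a Csound implementation, assuming the periods and amplitudes have already been extracted by some pitch tracker (for instance from lpread's kcps output). The function names and example values are hypothetical.

```python
"""Sketch: local jitter, local shimmer, and pitch statistics.

Assumes glottal cycles have already been segmented; the example data in
the __main__ block is hypothetical, not a measurement from this thread.
"""
from statistics import mean, pstdev


def local_jitter(periods):
    """Mean |T[i+1] - T[i]| divided by the mean period (periods in seconds)."""
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return mean(diffs) / mean(periods)


def local_shimmer(amplitudes):
    """Mean |A[i+1] - A[i]| divided by the mean peak amplitude."""
    if len(amplitudes) < 2:
        raise ValueError("need at least two amplitudes")
    diffs = [abs(b - a) for a, b in zip(amplitudes, amplitudes[1:])]
    return mean(diffs) / mean(amplitudes)


def pitch_stats(periods):
    """Average pitch in Hz and its (population) standard deviation."""
    f0 = [1.0 / p for p in periods]
    return mean(f0), pstdev(f0)


if __name__ == "__main__":
    periods = [0.0100, 0.0102, 0.0099, 0.0101]  # hypothetical periods, seconds
    amps = [1.0, 0.9, 1.0, 0.95]                # hypothetical peak amplitudes
    print("jitter (local): ", local_jitter(periods))
    print("shimmer (local):", local_shimmer(amps))
    print("pitch mean/std: ", pitch_stats(periods))
```

Both ratios are dimensionless, so they are often quoted as percentages; "pitch average and variation" map directly onto the mean and standard deviation returned by pitch_stats.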
Date | 1999-03-12 03:07 |
From | Erik Spjut |
Subject | Re: Speech Processing/Analysis |
I've actually had quite a bit of success with lpreson. Let me tell you about some of the tradeoffs I've found.

First, the order used during analysis is critical. Too low and you lose intelligibility; too high and you end up buzzy. The problem is that at higher orders lpanal begins to fit (non-existent) formants to the noise. The best fix is to examine each of the filter responses (you have to use a separate program) and remove the false formants, but it's very tedious. Another method is to make two analysis files of the same sound with different orders and detune the two resynthesized sounds slightly; you can usually cancel out most of the extraneous formants that way. I often use gbuzz and decrease the high partials a little to remove the buzziness, too.

One last item, and no one seems to believe me. The manual says (at least it did the last time I checked) that the transition from voiced to unvoiced speech occurs at a kerr of 0.3. The actual value is 10^-3 = 0.001. You can get some really nice v's and z's if you logarithmically cross-fade from buzz at kerr = 0.001 to rand at kerr = 0.01.

I hope these help.

At 2:40 AM -0600 3/11/99, Hans Mikelson wrote:
>Hi,
>
>After battling with chronic laryngitis for the past year I have gotten to
>know some speech therapists. I have discussed with them some of the ways in
>which they analyze voices. Some of the parameters they like to track are:
>Pitch, jitter, shimmer, and for normal speaking: pitch average and
>variation. I have been working on implementing some of the analysis
>routines in Csound. If someone cares to give me a reference for some
>algorithms for this type of processing I would appreciate it.
>
>I notice that Csound's LPC analysis has a pitch tracker which may be
>helpful.
>

Dr. R. Erik Spjut |
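The logarithmic cross-fade Erik describes can be sketched as a pair of gains for the buzz and rand sources. This is an assumption about what "logarithmically cross-fade" means: here the fade position is taken to be linear in log10(kerr), all buzz at or below kerr = 0.001 and all noise at or above kerr = 0.01 (his two thresholds). The function and constant names are hypothetical, and the sketch is written in Python only to show the arithmetic.

```python
"""Sketch: buzz/noise gains from lpread's kerr value.

Assumption: the cross-fade position is linear in log10(kerr) between
the voiced threshold (0.001) and the unvoiced threshold (0.01).
"""
import math

KERR_VOICED = 0.001    # at or below: fully voiced (buzz only)
KERR_UNVOICED = 0.01   # at or above: fully unvoiced (rand only)


def voicing_mix(kerr, lo=KERR_VOICED, hi=KERR_UNVOICED):
    """Return (buzz_gain, noise_gain); the two gains always sum to 1."""
    if kerr <= lo:          # also covers kerr <= 0, where log10 is undefined
        return 1.0, 0.0
    if kerr >= hi:
        return 0.0, 1.0
    pos = (math.log10(kerr) - math.log10(lo)) / (math.log10(hi) - math.log10(lo))
    return 1.0 - pos, pos


if __name__ == "__main__":
    for k in (0.0005, 0.001, 0.002, 0.005, 0.01, 0.05):
        buzz, noise = voicing_mix(k)
        print(f"kerr={k:<7} buzz={buzz:.3f} noise={noise:.3f}")
```

With this reading, kerr = 0.00316 (the geometric midpoint of the two thresholds) gives an equal mix. In an orchestra, the same arithmetic could presumably be done at k-rate on kerr and used to scale the buzz (or gbuzz) and rand signals before summing them into lpreson.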
Date | 1999-03-12 03:23 |
From | Charles Starrett |
Subject | ExZAKtly how do you use this... |
Are there any docs or orc/scos that show how ZAK space can be used for mixing? I've put the Csound manual under my pillow for over a month now and I still can't get it figured out...

Just wonderin'...

--
Charles D. Starrett, starrett@fas.harvard.edu
"I do not feel that my research suffered unduly from the fact that I enjoyed it."
(Daniel Miller, Modernity--an Ethnographic Approach) |