comparison of different pitch scale possibilities for voices
Date | 2016-04-04 13:23 |
From | Karin Daum |
Subject | comparison of different pitch scale possibilities for voices |
Hi, for the intonation of spoken text in my current projects, which use a pseudo-language, I need to modify volume, pitch and duration for the individual syllables. The pvstanal opcode is the right choice for varying both time and pitch at the same time. In my first project I had problems varying the pitch by the amount I had measured when speaking in real life. I often got the "helium voice effect" when scaling either directly in pvstanal or with pvscale. I could vary the scale factor by only about 15% to get tolerable results, while in real life I measured up to ~50%. When using pvswarp the voice sounds unnatural since the formants are kept fixed. Only their relative amplitudes are changed, which has the effect that e.g. the German "a" (as in the English "hard") changes to "o" (as in "boat"). I investigated a bit what happens when raising the pitch, using Praat. See the first attachment for the German "a". You can notice two effects for the higher pitch (red): the formants ARE shifted, and above about 1000-1500 Hz the spectra agree statistically. I have implemented this by splitting the sound into an LF and an HF part. Pitch scale factors are applied to the LF part while the HF part is not modified. To my taste the results sound more natural. (For the splitting, pvsbandp would be the right opcode to use, but it may lead to crashes when used at different times in a performance.) To hear the differences between the methods I've written a small program, which is the second attachment. I guess this may be interesting for others too. The 3rd attachment is an input file with some meaningless spoken text, just to have some input. Maybe this can give some hints to others working on pitch modification of voices. cheers, Karin Csound mailing list Csound@listserv.heanet.ie https://listserv.heanet.ie/cgi-bin/wa?A0=CSOUND Send bugs reports to https://github.com/csound/csound/issues Discussions of bugs and features can be posted here |
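The LF/HF split described in this message can be illustrated on a single analysis frame. The following is only a toy sketch in Python, not the attached Csound program: the function name `scale_low_band`, the bin layout, and the rule for recombining the two parts (the louder component wins where a shifted LF bin lands on an HF bin) are all assumptions made for illustration, since the thread does not say how the parts are recombined.

```python
def scale_low_band(amps, freqs, scale, cutoff_hz):
    """Pitch-scale only the bins below cutoff_hz; pass higher bins through.

    amps[k] and freqs[k] are the amplitude and frequency (Hz) of bin k of
    one analysis frame. Returns new (amps, freqs) lists of the same length.
    """
    n = len(amps)
    out_amps = [0.0] * n
    out_freqs = [0.0] * n
    for k in range(n):
        if freqs[k] < cutoff_hz:
            # LF part: move the bin to its scaled position.
            target = int(k * scale + 0.5)
            new_freq = freqs[k] * scale
        else:
            # HF part: left where it is, unmodified.
            target = k
            new_freq = freqs[k]
        # Where a shifted LF bin and an HF bin collide, keep the louder one
        # (an assumption of this sketch, not something stated in the thread).
        if target < n and amps[k] > out_amps[target]:
            out_amps[target] = amps[k]
            out_freqs[target] = new_freq
    return out_amps, out_freqs


# Toy frame: 32 bins spaced 100 Hz apart, harmonics of a 200 Hz fundamental.
freqs = [100.0 * k for k in range(32)]
amps = [1.0 if k > 0 and k % 2 == 0 else 0.0 for k in range(32)]
new_amps, new_freqs = scale_low_band(amps, freqs, scale=1.5, cutoff_hz=1500.0)
```

In this toy frame the 200 Hz fundamental moves to 300 Hz, while the harmonic at 1600 Hz stays where it was, which mirrors the observation that the spectra above roughly 1000-1500 Hz agree for the two pitches.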
Date | 2016-04-04 21:10 |
From | joachim heintz |
Subject | Re: comparison of different pitch scale possibilities for voices |
thanks, karin, this shows up interesting differences. just two remarks: 1) i had to change p4 in the score (calling i 10) to 1.5 instead of 1. 2) did you try the optional arguments in pvscale which try to preserve the formants? best - joachim |
Date | 2016-04-05 07:51 |
From | Karin Daum |
Subject | Re: comparison of different pitch scale possibilities for voices |
Hi Joachim, 1) concerning p4=1: the last thing I checked before posting was that splitting into two parts (LF+HF) with the two methods (instruments 5 and 7) gives the same result as the original, and that there is no problem with the phase shift I posted about before. I changed this to a value larger than 1 in CsoundQt before posting, but I just saw that I did not save it. 2) I did not check them, because preserving formants did not make much sense to me, since this is not what happens in reality. I expected them to give the same results as pvswarp. But this is not the case. They sound natural and give results very similar to what I did in instruments 5 and 7. They shift the fundamental and the lower harmonics and produce, for f>1.5 kHz, a spectrum similar to that observed for the original sound. I find the naming of the parameter as ‘kkeepform’ and the description in the manual misleading. Perhaps this could be improved. I simply misunderstood it. This solves a lot of problems I had: I don’t need to wait until the bug with pvsbandp is fixed or look further for workarounds. best karin |
Date | 2016-04-05 07:57 |
From | Victor Lazzarini |
Subject | Re: comparison of different pitch scale possibilities for voices |
Maybe you could help us improve the manual? I wrote that page, and what I meant by preserving formants is to keep the original spectral envelope. What did you understand by "preserving formants" that was different to that? Victor Lazzarini Dean of Arts, Celtic Studies, and Philosophy Maynooth University Ireland |
Date | 2016-04-05 08:17 |
From | Karin Daum |
Subject | Re: comparison of different pitch scale possibilities for voices |
just as in pvswarp, which keeps the formants fixed and only damps the amplitude of the fundamental with increasing scale factor. This is also the reason why it may change the vowel, because it's just the relative amplitudes of the fundamental and the harmonics that make the difference. Since I had tried pvswarp before, I simply did not try these options for pvscale. For very large scale factors (2-3) the method I proposed in my example sounds more natural. Concerning the text, I think something like: the formants’ frequencies are scaled by the factor kscale but the amplitudes are adjusted to what is expected at these frequencies from the input signal. Certainly not optimal, but this is what I suppose happens, judging from the original and the modified spectra for kkeepform=1,2 (which I would simply name ‘kmethod’) cheers, Karin |
Date | 2016-04-05 09:02 |
From | Andrea Crespi <4ndr34cr35p1@GMAIL.COM> |
Subject | Re: comparison of different pitch scale possibilities for voices |
Hello. I just wanted to say that I have recently used pvscale and I did not find the manual misleading. Modes 1 and 2 do what I expected and the kkeepform parameter makes sense to me. Also, the C code in pvsbasic.c covers all the steps I would expect to see in a pitch scaling operation with formant preservation. Yet, there is a tricky point in which the original amplitudes are “equalised” through the inverse of the spectral envelope before the scaling operation is performed.
|
Date | 2016-04-05 12:05 |
From | Victor Lazzarini |
Subject | Re: comparison of different pitch scale possibilities for voices |
That equalisation is crucial, otherwise when the preserved spectral envelope is applied, it will be distorted by the scaled one. But I agree that part is not completely intuitive. ======================== Dr Victor Lazzarini Dean of Arts, Celtic Studies and Philosophy, Maynooth University, Maynooth, Co Kildare, Ireland Tel: 00 353 7086936 Fax: 00 353 1 7086952 |
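The role of the equalisation step can be seen in a small numeric sketch. This is a toy Python illustration under stated assumptions, not the code in pvsbasic.c: the function `scale_keep_envelope` is hypothetical, the spectral envelope `env` is simply handed in as a list (how pvscale estimates it is not covered here), and bins are moved by plain rounding.

```python
def scale_keep_envelope(amps, env, scale, equalise=True):
    """Scale bin positions by `scale`, then apply the original envelope.

    amps[k] is the amplitude of bin k, env[k] > 0 the spectral envelope
    at bin k (assumed to be known already). With equalise=False the
    division by the envelope is skipped, to show what goes wrong.
    """
    n = len(amps)
    out = [0.0] * n
    for k in range(n):
        target = int(k * scale + 0.5)
        if target >= n:
            continue  # scaled bin falls outside the frame
        # Step 1: "equalise" by dividing out the envelope at the source bin.
        flat = amps[k] / env[k] if equalise else amps[k]
        # Steps 2 and 3: move the bin, then re-apply the original envelope
        # at the position where the bin now sits.
        out[target] = max(out[target], flat * env[target])
    return out


# Toy data: a steadily falling envelope and three harmonics lying on it.
env = [2.0 ** -k for k in range(16)]
amps = [env[k] if k in (2, 4, 6) else 0.0 for k in range(16)]
kept = scale_keep_envelope(amps, env, 1.5)
naive = scale_keep_envelope(amps, env, 1.5, equalise=False)
```

With equalisation, the harmonic moved from bin 2 to bin 3 ends up exactly on the original envelope (`kept[3] == env[3]`). Without it, the result is `env[2] * env[3]`: the envelope is applied on top of the amplitude the bin already carried, which is the distortion by the scaled envelope mentioned above.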
Date | 2016-04-05 12:44 |
From | Andrea Crespi <4ndr34cr35p1@GMAIL.COM> |
Subject | Re: comparison of different pitch scale possibilities for voices |
Yes, I mentioned this step because I found it tricky the first time I saw it, and I thought that maybe this is what Karin was not expecting from pvscale. It occurred to me when I read that Karin noticed that the output of pvscale has a spectrum similar to that of the original sound for f>1.5 kHz. I am not sure these things are related, though. |