Re: 3D Sound Opcode?

Date	1997-04-16 05:11
From	Robin Whittle
Subject	Re: 3D Sound Opcode?
Hello Hans and Csounders interested in binaural sound,

Your email pretty much describes a ugen I wrote around a year ago. I have not released it publicly since I don't consider it finished, and I have not known of anyone who was really interested in exploring this field. However, you do sound interested and I can email you the source code. Are you set up for re-compiling Csound and integrating new ugens? What system are you using? I am using Linux a little, but mainly still using DJGPP for an MSDOS executable which runs under MSDOS, Win3.1 and Win95.

There are two ugens. One establishes some global variables and moves the head at k rate. Head position is in x, y and z, with distance measured in milliseconds of sound travel (one millisecond is around one foot or 30 cm). The head axis is assumed to be vertical, but it would not be out of the question to extend this to other forms of rotation. The rotation of the head is also specified at k rate. This ugen establishes a global pair of mix variables to which the second ugen accumulates a binaural mix. It also sets up a global variable to determine the length of the delay lines used by the second ugen. Another global variable determines the finesse to which the binaural processing is done. In high quality mode, filtering, volume, proximity effects (low frequency response drops off when small sound sources are not close to the ear) and a double a rate over-sampled interpolated delay line are used. Other values reduce quality, but enable faster cooking of the piece for draft purposes. There are no HRTF samples - that would be slow and raise many problems with interpolating between the samples for other angles.

The second ugen is given an audio signal, the "size" of the sound (eg a 1" tweeter, a 15" woofer or something larger), a factor to control to what degree this "size" affects frequency response, and x, y and z locations at k rate. There is no "direction" for the sound source - that would be possible but would add a lot of processing.
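
The coordinate convention described above (positions in milliseconds of sound travel, head rotating about a vertical axis) can be illustrated with a small sketch. This is not the UGRW3 source. The function names, the 343 m/s speed of sound, and the axis and sign conventions (z up, head facing +y at zero rotation, rotation counter-clockwise seen from above, azimuth positive to the right) are all assumptions made for the example.

```python
import math

# Assumption: speed of sound in dry air at roughly 20 C.
SPEED_OF_SOUND_M_PER_S = 343.0


def metres_to_ms(distance_m):
    """Convert metres to milliseconds of sound travel."""
    return distance_m * 1000.0 / SPEED_OF_SOUND_M_PER_S


def source_relative_to_head(head_xyz, head_rotation_deg, source_xyz):
    """Return (azimuth_deg, elevation_deg, distance_ms) of a source.

    Positions are (x, y, z) tuples in milliseconds of sound travel.
    Hypothetical convention: z is up, the head faces +y when the
    rotation is 0, and rotation is counter-clockwise seen from above.
    """
    dx = source_xyz[0] - head_xyz[0]
    dy = source_xyz[1] - head_xyz[1]
    dz = source_xyz[2] - head_xyz[2]

    # Express the horizontal offset in the rotated head's own frame.
    theta = math.radians(head_rotation_deg)
    right = dx * math.cos(theta) + dy * math.sin(theta)
    front = -dx * math.sin(theta) + dy * math.cos(theta)

    azimuth = math.degrees(math.atan2(right, front))
    elevation = math.degrees(math.atan2(dz, math.hypot(right, front)))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    return azimuth, elevation, distance
```

Evaluating this once per k period for each source gives the angle and distance from which per-ear delay, gain and filtering could then be derived.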
The sound source is assumed to have spherical radiation. The gain when the sound source is at the ear location is 1.0. Further away the gain drops off in a rough relation to the distance and the "size" of the sound source. A simple 1/(distance squared) approach leads to infinite volumes when the source is at the ear - this would only be realistic for a sound source of infinitely small size.

All processing is done on the basis of finely calculated delay times (hence full Doppler effects are a natural occurrence), volume, frequency response (IIR) in terms of proximity and "size" and some crude but effective IIR filtering based on azimuth and a little on elevation. Thus it is quite clear whether the sound is going around the head clockwise or counter-clockwise. I wouldn't make such claims for up/down response - that is fairly weak in the human brain anyway, and would be very complex to research and implement. One day perhaps. The left to right response is excellent. The front to back response is valuable, but probably not as good as with HRTF or a more sophisticated approach to filtering.

Overall it produces a very aesthetic sound space, with a few anomalies as the sound passes very close or between the ears. It sounds dramatically real with headphones and quite spacious on speakers. With ksmps = 3 and a rate = 44100, it can do smooth localisation of rapidly moving sources - and you can easily and safely do things moving at a moderate fraction of the speed of sound, in 20 metre radius orbits, whizzing just in front of your nose! Although the elevation cues are limited, the z dimension (up/down) is very important since it enables a sound to pass above or below the head at a distance which does not make it excessively loud or close.

The system is set up for one head and any number of sound sources, however you need to set your distance limit carefully and make sure you can fit all the delay lines in RAM.
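
Two of the points above lend themselves to a small sketch: a gain law that stays finite at the ear, and the memory needed for the delay lines at a given distance limit. Neither function is taken from the ugen. The `size / (size + distance)` law is just one simple curve that equals 1.0 at zero distance and falls off roughly as 1/distance far away. The delay-line layout (one oversampled line of 4-byte samples per source) is likewise an assumption.

```python
import math


def distance_gain(distance_ms, size_ms):
    """Gain that is 1.0 at the ear and drops off with distance.

    Hypothetical law, not the ugen's actual formula: a larger "size"
    makes the level fall off more slowly. Both arguments are in
    milliseconds of sound travel.
    """
    if size_ms <= 0.0:
        raise ValueError("size_ms must be positive")
    return size_ms / (size_ms + max(distance_ms, 0.0))


def delay_line_bytes(max_distance_ms, sample_rate=44100, oversample=2,
                     n_sources=1, bytes_per_sample=4):
    """Estimate RAM for delay lines long enough for the distance limit.

    Assumes one delay line per source, running at `oversample` times
    the sample rate, plus one guard sample for interpolation.
    """
    samples = math.ceil(
        max_distance_ms * sample_rate * oversample / 1000.0) + 1
    return samples * bytes_per_sample * n_sources
```

Under these assumptions a 100 ms distance limit (roughly 34 metres) at 44100 Hz with double oversampling needs 8821 samples per source, so the distance limit and the number of sources together set the memory budget.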
I can send you the source, or an MSDOS binary, and some orc/sco pieces which use these ugens, on condition that you keep them to yourself. At some stage I will probably refine and release them. If you want to work on the source code and if you like to comment and document your code, then it would be great for you to contribute to the ugen. The .h and .c files for UGRW3 total 184 k bytes. They include extensive comments and a sample rate conversion ugen as well. (This enables Csound to calculate sound at 88.2 kHz, but write a 44.1 kHz stereo output file.)

Even if you are not interested in how I achieve this binaural processing, I would suggest that the following parameters are a valuable basis for any binaural system:

Head
- k rate x, y and z location
- k rate rotation
- Perhaps other things if the head axis is not vertical.

Source
- "Size" of sound
- To what degree "size" affects low frequency response: 0 = none, like a sealed speaker box. 1 = a lot, like a loudspeaker without an enclosure.
- a rate audio signal
- k rate x, y and z

General
- some parameter to globally control the CPU speed vs quality tradeoff. You can grow old waiting for lots of high quality binaural sound to be calculated.

Regards

- Robin

Robin Whittle
http://www.ozemail.com.au/~firstpr   firstpr@ozemail.com.au
11 Miller St. Heidelberg Heights 3081 Melbourne Australia
Ph +61-3-9459-2889   Fax +61-3-9458-1736
Consumer advocacy in telecommunications, especially privacy

First Principles - Research and expression - music,
                   music industry, telecommunications
                   human factors in technology adoption.

Real World Interfaces - Hardware and software, especially
                        for music