Re: [Csnd] 31 bit pseudo-random number gen in C, C++ & dsPIC assembly code

Re: [Csnd] 31 bit pseudo-random number gen in C, C++ & dsPIC assembly code

Date	2005-09-22 15:17
From	Michael Gogins
Subject	Re: [Csnd] 31 bit pseudo-random number gen in C, C++ & dsPIC assembly code
	How efficient is the Whittle algorithm compared to the Mersenne Twister? And, how random is it? The MT has become the standard high-performance pseudo-random noise generator in many software packages. It is the default PRNG in boost, for example. Regards, Mike -----Original Message----- From: Istvan Varga Sent: Sep 22, 2005 4:48 AM To: csound@lists.bath.ac.uk Subject: Re: [Csnd] 31 bit pseudo-random number gen in C, C++ & dsPIC assembly code Robin Whittle wrote: > I haven't been doing anything with Csound in recent years, but I lurk on > this list. Nearly 10 years ago I contributed a 31 bit pseudo-random > number generator to Csound for noise etc. As far as I know, it is still > there. Do you mean the x = (x * 16807) % 0x7FFFFFFF algorithm ? It is actually used in some newer opcodes as well. In fact, I think using rand() is a bad idea for a few reasons: it may be 16 bit on some systems, it may be 32 bit or more but slow on others, different implementations generate different pseudo-random sequences with the same seed, and it is not thread safe (even though Csound 5 would require that). -- Send bugs reports to this list. To unsubscribe, send email to csound-unsubscribe@lists.bath.ac.uk -- Send bugs reports to this list. To unsubscribe, send email to csound-unsubscribe@lists.bath.ac.uk

Date	2005-09-22 16:43
From	jlato@mail.utexas.edu
Subject	Re: [Csnd] 31 bit pseudo-random number gen in C, C++ & dsPIC assembly code
	It is not as random as the Mersenne Twister. The period is much smaller, and there is serial correlation between successive outputs. There are some other flaws as well. The current algorithm should be faster, though, assuming it's implemented properly. Although I personally would advocate a move to the MT, I wouldn't consider it a high priority. I don't think that super high-quality randomness is necessary for audio work; specifically I think people are much more likely to impose patterns on the audio signal than to actually hear them (assuming the 31-bit PRNG; the period of a 16-bit PSRG is clearly audible), and I don't know that anyone is doing any algorithms that depend on good randomness. Also, the smaller footprint and faster speed of the current algo. is nice for people with less optimal computers, perhaps allowing more real-time DSP than otherwise possible. john lato Quoting Michael Gogins : > How efficient is the Whittle algorithm compared to the Mersenne Twister? And, > how random is it? The MT has become the standard high-performance > pseudo-random noise generator in many software packages. It is the default > PRNG in boost, for example. > > Regards, > Mike > > -----Original Message----- > From: Istvan Varga > Sent: Sep 22, 2005 4:48 AM > To: csound@lists.bath.ac.uk > Subject: Re: [Csnd] 31 bit pseudo-random number gen in C, C++ & dsPIC > assembly code > > Robin Whittle wrote: > > > I haven't been doing anything with Csound in recent years, but I lurk on > > this list. Nearly 10 years ago I contributed a 31 bit pseudo-random > > number generator to Csound for noise etc. As far as I know, it is still > > there. > > Do you mean the x = (x * 16807) % 0x7FFFFFFF algorithm ? It is > actually used in some newer opcodes as well. In fact, I think using > rand() is a bad idea for a few reasons: it may be 16 bit on some systems, > it may be 32 bit or more but slow on others, different implementations > generate different pseudo-random sequences with the same seed, and > it is not thread safe (even though Csound 5 would require that). > -- > Send bugs reports to this list. > To unsubscribe, send email to csound-unsubscribe@lists.bath.ac.uk > > > > -- > Send bugs reports to this list. > To unsubscribe, send email to csound-unsubscribe@lists.bath.ac.uk > -- Send bugs reports to this list. To unsubscribe, send email to csound-unsubscribe@lists.bath.ac.uk

Date	2005-09-22 16:50
From	jlato@mail.utexas.edu
Subject	Re: [Csnd] 31 bit pseudo-random number gen in C, C++ & dsPIC assembly code
	Argh. I fell victim to the same error as Michel, and I said in my post that it's 31-bit. Thanks, Istvan. Anyway, if it were mod(2^32), it would definately be faster than the MT. As mod(2^31), I'm not sure that's true. I might have a chance to run a comparison later this week. jwl Quoting jlato@mail.utexas.edu: > It is not as random as the Mersenne Twister. The period is much smaller, > and > there is serial correlation between successive outputs. There are some > other > flaws as well. The current algorithm should be faster, though, assuming > it's > implemented properly. > > Although I personally would advocate a move to the MT, I wouldn't consider it > a > high priority. I don't think that super high-quality randomness is > necessary > for audio work; specifically I think people are much more likely to impose > patterns on the audio signal than to actually hear them (assuming the 31-bit > PRNG; the period of a 16-bit PSRG is clearly audible), and I don't know that > anyone is doing any algorithms that depend on good randomness. Also, the > smaller footprint and faster speed of the current algo. is nice for people > with > less optimal computers, perhaps allowing more real-time DSP than otherwise > possible. > > john lato > > Quoting Michael Gogins : > > > How efficient is the Whittle algorithm compared to the Mersenne Twister? > And, > > how random is it? The MT has become the standard high-performance > > pseudo-random noise generator in many software packages. It is the default > > PRNG in boost, for example. > > > > Regards, > > Mike > > > > -----Original Message----- > > From: Istvan Varga > > Sent: Sep 22, 2005 4:48 AM > > To: csound@lists.bath.ac.uk > > Subject: Re: [Csnd] 31 bit pseudo-random number gen in C, C++ & dsPIC > > assembly code > > > > Robin Whittle wrote: > > > > > I haven't been doing anything with Csound in recent years, but I lurk on > > > this list. Nearly 10 years ago I contributed a 31 bit pseudo-random > > > number generator to Csound for noise etc. As far as I know, it is still > > > there. > > > > Do you mean the x = (x * 16807) % 0x7FFFFFFF algorithm ? It is > > actually used in some newer opcodes as well. In fact, I think using > > rand() is a bad idea for a few reasons: it may be 16 bit on some systems, > > it may be 32 bit or more but slow on others, different implementations > > generate different pseudo-random sequences with the same seed, and > > it is not thread safe (even though Csound 5 would require that). > > -- > > Send bugs reports to this list. > > To unsubscribe, send email to csound-unsubscribe@lists.bath.ac.uk > > > > > > > > -- > > Send bugs reports to this list. > > To unsubscribe, send email to csound-unsubscribe@lists.bath.ac.uk > > > > > > -- > Send bugs reports to this list. > To unsubscribe, send email to csound-unsubscribe@lists.bath.ac.uk > -- Send bugs reports to this list. To unsubscribe, send email to csound-unsubscribe@lists.bath.ac.uk

Date	2005-09-23 10:47
From	Robin Whittle
Subject	Re: [Csnd] 31 bit pseudo-random number gen in C, C++ & dsPIC assembly code
	Michael Gogins wrote: > How efficient is the Whittle algorithm compared to the Mersenne Twister? Its not my algorithm - its David G. Carta's. The properly commented code is here: http://www.firstpr.com.au/dsp/rand31/rand31-park-miller-carta-int.c.txt and the active lines are: long unsigned int hi, lo; lo = constapmc * (seed31pmc & 0xFFFF); hi = constapmc * (seed31pmc >> 16); lo += (hi & 0x7FFF) << 16; lo += hi >> 15; if (lo > 0x7FFFFFFF) lo -= 0x7FFFFFFF; return ( seed31pmc = (long)lo ); The "oscbnk_rand31(long seed)" code posted by Istvan is more complex than my code and probably somewhat slower, because it involves a 64 bit variable, and I think a 32 x 32 = 64 bit multiply. Carta's algorithm uses 16 x 16 = 32 bit multiplies and 32 bit variables. I don't know how fast the Mersenne Twister is. Probably, for Csound, where there is no shortage of RAM, the Mersenne Twister would be a good choice. Its easier to use a known good (to all contemporary critics) solution rather than waste human time in debate about the quality of a lesser PRNG, just to save a few CPU cycles. On the other hand, I think that there is no audio problem with the Park-Miller-Carta algorithm in terms of using it for noise, or in terms of its cycle length. For some probabilistic purposes there might be subtle problems, but it is hard for me to imagine how this would occur. Please consider: http://www.firstpr.com.au/dsp/rand31/#Quality and try to imagine what problems may be caused. Just because we can't imagine problems doesn't mean there won't be any, but I would want to see some pretty concrete arguments before adopting a slower algorithm, since speed is so important for real-time, and for increasing the ability to work on large pieces without excessively old waiting for the file to render. My main interest with the Park-Miller-Carta algorithm was embedded microcontrollers where CPU cycles means battery drain. - Robin -- Send bugs reports to this list. To unsubscribe, send email to csound-unsubscribe@lists.bath.ac.uk

Date	2005-09-23 13:08
From	Istvan Varga
Subject	Re: [Csnd] 31 bit pseudo-random number gen in C, C++ & dsPIC assembly code
	Robin Whittle wrote: > The "oscbnk_rand31(long seed)" code posted by Istvan is more complex > than my code and probably somewhat slower, because it involves a 64 bit > variable, and I think a 32 x 32 = 64 bit multiply. Carta's algorithm > uses 16 x 16 = 32 bit multiplies and 32 bit variables. Actually, it is faster, at least on x86 with GCC, because the compiler can take advantage of the fact that 32 bit multiplication always produces a 64 bit result (storing the most significant bits in edx). So, the function below from the current CVS sources static CS_PURE long oscbnk_rand31(long seed) { uint64_t tmp1; uint32_t tmp2; /* x = (16807 * x) % 0x7FFFFFFF / tmp1 = (uint64_t) ((int32_t) seed (int64_t) 16807); tmp2 = (uint32_t) tmp1 & (uint32_t) 0x7FFFFFFF; tmp2 += (uint32_t) (tmp1 >> 31); if ((int32_t) tmp2 < (int32_t) 0) tmp2 = (tmp2 + (uint32_t) 1) & (uint32_t) 0x7FFFFFFF; return (long) tmp2; } compiles to (GCC 4.1 beta, -O2 -march=pentium3 -mfpmath=387 -mno-sse; better code would be generated with -fomit-frame-pointer, eliminating the lines marked with FP): 00000000 : 0: b9 a7 41 00 00 mov ecx,0x41a7 5: f7 e9 imul ecx 7: 55 push ebp ; FP 8: 89 e5 mov ebp,esp ; FP a: 89 c1 mov ecx,eax c: 81 e1 ff ff ff 7f and ecx,0x7fffffff 12: 0f ac d0 1f shrd eax,edx,0x1f 16: c1 ea 1f shr edx,0x1f ; junk ??? 19: 01 c8 add eax,ecx 1b: 78 03 js 20 1d: 5d pop ebp ; FP 1e: c3 ret 1f: 90 nop 20: 5d pop ebp ; FP 21: 40 inc eax 22: 25 ff ff ff 7f and eax,0x7fffffff 27: c3 ret (note: for some reason, if the tmp2 = (tmp2 + 1) & 0x7FFFFFFF is replaced with tmp2 -= 0x7FFFFFFF which would do the same, the compiler generates much worse code, but only if the use of cmov instructions (-march=i686 and above) is enabled) Now, here is a slightly modified (to make it easier to compare with the above function) version of the code you posted: #define constapmc 16807 static __attribute__((__pure__, __noinline__)) long rand31(long seed31pmc) { long unsigned int hi, lo; lo = constapmc * (seed31pmc & 0xFFFF); hi = constapmc * (seed31pmc >> 16); lo += (hi & 0x7FFF) << 16; lo += hi >> 15; if (lo > 0x7FFFFFFF) lo -= 0x7FFFFFFF; return (long) lo; } this compiles to (using the same compiler and flags): 00000000 : 0: 0f b7 c8 movzx ecx,ax 3: c1 f8 10 sar eax,0x10 6: 69 c0 a7 41 00 00 imul eax,eax,0x41a7 c: 69 c9 a7 41 00 00 imul ecx,ecx,0x41a7 12: 55 push ebp ; FP 13: 89 e5 mov ebp,esp ; FP 15: 89 c2 mov edx,eax 17: 25 ff 7f 00 00 and eax,0x7fff 1c: c1 ea 0f shr edx,0xf 1f: 01 d1 add ecx,edx 21: c1 e0 10 shl eax,0x10 24: 01 c8 add eax,ecx 26: 5d pop ebp ; FP 27: 83 f8 ff cmp eax,0xffffffff 2a: 8d 90 01 00 00 80 lea edx,[eax-2147483647] 30: 0f 4e c2 cmovle eax,edx 33: c3 ret Now there are two multiplications instead of just one. However, it does have the advantage of depending less on the compiler and CPU type; while versions of GCC I tested (3.3.4, 4.0.0, 4.1 beta) can all optimize 32 to 64 bit multiplies and produce similar assembly code to the above, the Intel compiler is apparently not as good this time and generates a full 64 bit multiply. -- Send bugs reports to this list. To unsubscribe, send email to csound-unsubscribe@lists.bath.ac.uk

Date	2005-09-23 13:44
From	Robin Whittle
Subject	Re: [Csnd] 31 bit pseudo-random number gen in C, C++ & dsPIC assembly code
	Hi Istvan, I have never tried looking at the assembly code generated from C-code like this. I imagine a 32 x 32 multiply takes more CPU cycles than a 16 x 16 - maybe twice as many? It would depend a great deal on the CPU, I expect. I understand it can be hard to test the performance of code, due to cache issues for code and data and probably other things I can't think of now. You wrote: > (note: for some reason, if the tmp2 = (tmp2 + 1) & 0x7FFFFFFF is replaced > with tmp2 -= 0x7FFFFFFF which would do the same, the compiler generates > much worse code, but only if the use of cmov instructions (-march=i686 and > above) is enabled) Hmmm . . . I think it is debatable whether code should be made more complex to generate faster code on just some CPUs with some compilers and some options set for those compilers - but I don't feel like getting into such a debate! - Robin -- Send bugs reports to this list. To unsubscribe, send email to csound-unsubscribe@lists.bath.ac.uk

Date	2005-09-23 14:11
From	Istvan Varga
Subject	Re: [Csnd] 31 bit pseudo-random number gen in C, C++ & dsPIC assembly code
	Robin Whittle wrote: > I have never tried looking at the assembly code generated from C-code > like this. I imagine a 32 x 32 multiply takes more CPU cycles than a 16 > x 16 - maybe twice as many? It would depend a great deal on the CPU, I > expect. I understand it can be hard to test the performance of code, > due to cache issues for code and data and probably other things I can't > think of now. No, your code compiles to 32 bit multiplies even if I add explicit casts to 16 bit types (perhaps because a 32 bit multiplication is actually faster than a 16 bit one on a modern CPU running in 32 bit mode ?). > Hmmm . . . I think it is debatable whether code should be made more > complex to generate faster code on just some CPUs with some compilers > and some options set for those compilers - but I don't feel like getting > into such a debate! It is not more complex (in fact it has less statements), and only looks complicated because of the many (and perhaps often redundant) explicit casts. The only bit that is made more complex to work around inefficient code generation by GCC is the use of "add one, then AND by 0x7FFFFFFF" instead of just "subtract 0x7FFFFFFF". However, on the other hand, more complexity is removed by not implementing workarounds for legacy compilers and CPUs by software emulation of multiplying 32 bit numbers to get a 64 bit result. It is left up to the compiler to solve this in whatever way is the most efficient, but the problem is that the compiler is not always very smart. Nevertheless, I can replace the random generator with your code to be on the safe side, and use the original only if __GNUC__ and __i386__ are defined (but then, for that single platform and compiler, I may just as well use inline assembly). -- Send bugs reports to this list. To unsubscribe, send email to csound-unsubscribe@lists.bath.ac.uk

Date	2005-09-28 00:11
From	Istvan Varga
Subject	Re: [Csnd] 31 bit pseudo-random number gen in C, C++ & dsPIC assembly code
	Here is a comparison of the time it takes to generate 100,000,000 random numbers with three different generators, compiled with various compilers on Linux. rand31: linear congruential generator (m = 0x7FFFFFFF, a = 742938285) rand: ANSI C rand() (note that it is much slower if not compiled with -static) MT: Mersenne Twister (MT19937) ------------------------------------------------------------------------ GCC 4.1 beta (-O2 -march=pentium3 -static) rand31: 1.703, rand: 4.559, MT: 2.501 GCC 4.1 beta (-O2 -march=pentium3 -fomit-frame-pointer -static) rand31: 1.731, rand: 4.561, MT: 2.321 GCC 4.1 beta (-O2 -march=pentium3) rand31: 1.702, rand: 7.004, MT: 2.500 GCC 4.0.0 (-O2 -march=pentium3 -static) rand31: 1.702, rand: 4.561, MT: 2.405 GCC 4.0.0 (-O2 -march=pentium3 -fomit-frame-pointer -static) rand31: 1.732, rand: 4.562, MT: 2.339 GCC 3.3.4 (-O2 -march=pentium3 -static) rand31: 1.863, rand: 4.172, MT: 2.579 GCC 3.3.4 (-O2 -march=pentium3 -fomit-frame-pointer -static) rand31: 1.730, rand: 4.172, MT: 2.385 ICC 9.0 (-O2 -xK -static) rand31: 1.841, rand: 4.410, MT: 2.474 ------------------------------------------------------------------------ #include #include #include #include #define CS_PURE __attribute__ ((__pure__)) #define CS_NOINLINE __attribute__ ((__noinline__)) #define PUBLIC typedef struct CsoundRandMTState_ { int mti; uint32_t mt[624]; } CsoundRandMTState; PUBLIC CS_PURE int csoundRand31(int seedVal) { uint64_t tmp1; uint32_t tmp2; /* x = (742938285 * x) % 0x7FFFFFFF / tmp1 = (uint64_t) ((int32_t) seedVal (int64_t) 742938285); tmp2 = (uint32_t) tmp1 & (uint32_t) 0x7FFFFFFF; tmp2 += (uint32_t) (tmp1 >> 31); if ((int32_t) tmp2 < (int32_t) 0) tmp2 = (tmp2 + (uint32_t) 1) & (uint32_t) 0x7FFFFFFF; return (int) tmp2; } /* Period parameters / #define N 624 #define M 397 #define MATRIX_A 0x9908B0DFU / constant vector a / #define UPPER_MASK 0x80000000U / most significant w-r bits / #define LOWER_MASK 0x7FFFFFFFU / least significant r bits / / initializes mt[N] with a seed / PUBLIC void csoundSeedRandMT(CsoundRandMTState p, uint32_t seedVal) { int i; uint32_t x; p->mt[0] = x = seedVal; for (i = 1; i < N; i++) { /* See Knuth TAOCP Vol2. 3rd Ed. P.106 for multiplier. / / In the previous versions, MSBs of the seed affect / / only MSBs of the array mt[]. / / 2002/01/09 modified by Makoto Matsumoto / x = ((uint32_t) 1812433253 (x ^ (x >> 30)) + (uint32_t) i); p->mt[i] = x; } p->mti = N; } static CS_NOINLINE void MT_update_state(CsoundRandMTState p) { / mag01[x] = x * MATRIX_A for x=0,1 / const uint32_t mag01[2] = { (uint32_t) 0, (uint32_t) MATRIX_A }; int i; uint32_t y; for (i = 0; i < (N - M); i++) { y = (p->mt[i] & UPPER_MASK) \| (p->mt[i + 1] & LOWER_MASK); p->mt[i] = p->mt[i + M] ^ (y >> 1) ^ mag01[y & (uint32_t) 1]; } for ( ; i < (N - 1); i++) { y = (p->mt[i] & UPPER_MASK) \| (p->mt[i + 1] & LOWER_MASK); p->mt[i] = p->mt[i + (M - N)] ^ (y >> 1) ^ mag01[y & (uint32_t) 1]; } y = (p->mt[N - 1] & UPPER_MASK) \| (p->mt[0] & LOWER_MASK); p->mt[N - 1] = p->mt[M - 1] ^ (y >> 1) ^ mag01[y & (uint32_t) 1]; } / generates a random number on [0,0xffffffff]-interval / PUBLIC uint32_t csoundRandMT(CsoundRandMTState p) { int i = p->mti; uint32_t y; if (i >= N) { /* generate N words at one time / MT_update_state(p); i = 0; } y = p->mt[i]; p->mti = i + 1; / Tempering / y ^= (y >> 11); y ^= (y << 7) & (uint32_t) 0x9D2C5680U; y ^= (y << 15) & (uint32_t) 0xEFC60000U; y ^= (y >> 18); return y; } volatile uint32_t seed_; int main(void) { CsoundRandMTState st; double t0, t1, t2, t3; struct timeval tv; int i; gettimeofday(&tv, NULL); t0 = (double) tv.tv_usec 0.000001 + (double) tv.tv_sec; seed_ = 12345; for (i = 0; i < 100000000; i++) seed_ = csoundRand31(seed_); gettimeofday(&tv, NULL); t1 = (double) tv.tv_usec * 0.000001 + (double) tv.tv_sec; srand(314159); for (i = 0; i < 100000000; i++) seed_ = rand(); gettimeofday(&tv, NULL); t2 = (double) tv.tv_usec * 0.000001 + (double) tv.tv_sec; csoundSeedRandMT(&st, 5489); for (i = 0; i < 100000000; i++) { seed_ = csoundRandMT(&st); /* if (i < 500) { printf("%10u ", seed_); if ((i % 5) == 4) printf("\n"); } / } gettimeofday(&tv, NULL); t3 = (double) tv.tv_usec 0.000001 + (double) tv.tv_sec; printf("rand31: %.3f, rand: %.3f, MT: %.3f\n", (t1 - t0), (t2 - t1), (t3 - t2)); return 0; } -- Send bugs reports to this list. To unsubscribe, send email to csound-unsubscribe@lists.bath.ac.uk

Date	2005-09-29 03:06
From	Robin Whittle
Subject	Re: 31 bit pseudo-random number gen in C, C++ & dsPIC assembly code
	Thanks Istvan for these timing results. What CPU and clock rate were you using? Andrew Simper, on the Music-DSP mailing list ( http://shoko.calarts.edu/musicdsp ), suggested an exceedingly nifty optimisation to the mod(0x7FFFFFFF) for my code: > Change > if (lo > 0x7FFFFFFF) lo -= 0x7FFFFFFF; > to: > lo = (lo & 0x7FFFFFFF) + (lo >> 31); I think this could also be applied to the 32 x 32 = 64 bit multiplication approach you use in csoundRand31(). I think it is highly likely that an add of two math operations which can be done in parallel will be faster than any kind of test and conditional branch. This should be applicable to my dsPIC assembly code too. Andrew Simper also plans to consider using XOR on the output of the basic PRNG to minimise any value correlations. I wrote the following about XORing: On XORing, I imagine that choosing a suitable fixed 31 bit value would do a pretty good job of messing up whatever patterns might exist in the value dimension of the output stream. (Shuffling the order of the whole output words would do a good job of messing up the time dimension, which would also reduce or eliminate short-term value dimension correlations.) To choose a fixed XOR value, maybe we could choose a binary representation of "pi" or "e", or scoop out some bits from a .wav file of Jimi Hendrix playing Purple Haze. Any such XOR factor would ensure that the resulting output stream included one instance of 0x7FFFFFFF and one instance of 0, whilst removing instances of the XOR factor and its complement. I was thinking of something more fiendish, but which would be more expensive than a fixed XOR value. Suppose we maintained in RAM a 32 bit XOR factor, we could make it a left to right shift-register, feeding it bit 0 of the raw PRNG result (or bit 31 of the intermediate result before the mod() operation). This changing XOR value would surely break up any value patterns, but it would introduce a number of other "features" which may or may not be desirable. Alternatively, the XOR factor could be a counter, incrementing every cycle, or being incremented (or added to by some factor) only when particular bits were set in the raw PRNG result. That would probably extend the cycle length by some significant factor. It seems the Mersenne Twister is not at all slow, so in many situations there's no point trying to do anything too complex. However, the Mersenne Twister: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html requires 624 x 32 bit words of RAM, which rules out its use in some embedded applications. - Robin

Date	2005-09-29 12:40
From	Istvan Varga
Subject	Re: 31 bit pseudo-random number gen in C, C++ & dsPIC assembly code
	Robin Whittle wrote: > Thanks Istvan for these timing results. What CPU and clock rate were > you using? A rather old Pentium III (Celeron) 1300 MHz. >>Change >> if (lo > 0x7FFFFFFF) lo -= 0x7FFFFFFF; >>to: >> lo = (lo & 0x7FFFFFFF) + (lo >> 31); Thanks for this tip, it seems to work well, so I will use it. > I think this could also be applied to the 32 x 32 = 64 bit > multiplication approach you use in csoundRand31(). Yes, as far as the trick to avoid using % is concerned, it is identical to the code you posted. The only difference is in the multiplication; by splitting the multiplication in the suggested way, the performance depends less on the compiler and CPU type. However, when 32 to 64 bit multiply is indeed available, using that is faster, and it also removes the limitation of having to use small multipliers. Trying to compile with the optimization to remove the conditional, and using a large multiplier: /* simple linear congruential generator / PUBLIC int csoundRand31(int seedVal) { uint64_t tmp1; uint32_t tmp2; /* x = (742938285 * x) % 0x7FFFFFFF / tmp1 = (uint64_t) ((int32_t) (seedVal) * (int64_t) 742938285); tmp2 = (uint32_t) tmp1 & (uint32_t) 0x7FFFFFFF; tmp2 += (uint32_t) (tmp1 >> 31); tmp2 = (tmp2 & (uint32_t) 0x7FFFFFFF) + (tmp2 >> 31); (*seedVal) = (int) tmp2; return (int) tmp2; } I get this with GCC 4.0 and 4.1: 00000000 : 0: 55 push ebp 1: b8 ad 56 48 2c mov eax,0x2c4856ad 6: 89 e5 mov ebp,esp 8: 53 push ebx 9: 8b 5d 08 mov ebx,DWORD PTR [ebp+8] c: f7 2b imul DWORD PTR [ebx] e: 89 c1 mov ecx,eax 10: 81 e1 ff ff ff 7f and ecx,0x7fffffff 16: 0f ac d0 1f shrd eax,edx,0x1f 1a: 01 c1 add ecx,eax 1c: 89 c8 mov eax,ecx 1e: 25 ff ff ff 7f and eax,0x7fffffff 23: c1 e9 1f shr ecx,0x1f 26: 01 c8 add eax,ecx 28: 89 03 mov DWORD PTR [ebx],eax 2a: c1 ea 1f shr edx,0x1f 2d: 5b pop ebx 2e: 5d pop ebp 2f: c3 ret So, even with a frame pointer, and taking a pointer to the seed instead of the seed itself (this makes the function slightly more complex, but easier to use), the code fits into 48 bytes. Using the previously posted benchmark program, I get about the same times when compiled with frame pointer, but the new version is apparently faster when -fomit-frame-pointer is used. > Andrew Simper also plans to consider using XOR on the output of the > basic PRNG to minimise any value correlations. I wrote the following > about XORing: Does using a large multiplier have an effect on this issue ? > It seems the Mersenne Twister is not at all slow, so in many situations > there's no point trying to do anything too complex. However, the > Mersenne Twister: > > http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html > > requires 624 x 32 bit words of RAM, which rules out its use in some > embedded applications. The average CPU usage is indeed quite good (I found it to be better than the rand() of glibc on Linux, which is not bad given that rand() is also not of particularly high quality), the problem is the memory usage, and that some calls are much slower than the average, because on every 624th call the whole table needs to be updated. So, this generator may be good for use as a single global PRNG, but having a separate instance per random opcode is likely to be too expensive. I think I will use the Mersenne Twister in the "x-class" random opcodes (unirand, trirand, gauss, etc.) and GEN21 which is based on the same code, using a single shared generator for an instance of Csound, and use a 31 bit LCG in other opcodes where rand() is to be replaced.