| I have reviewed best practices for OpenMP programming. They also apply
to other frameworks for concurrent programming:
Check whether GCC has a multi-threaded allocator; if it does, use it.
Check whether GCC supports processor affinity; if it does, use it.
Test static vs. dynamic scheduling in parallel regions. I think static
scheduling should be better.
Test active (spinning) vs. passive (sleeping) thread waiting, i.e. at
the end of the instrument layer parallel region. I think active should
be better.
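A minimal sketch of how the scheduling and waiting tests above could be
timed without recompiling. Everything named here (renderInstance,
timeParallelRegion, TimedRun) is hypothetical and only stands in for the
instrument layer; the one real OpenMP feature used is schedule(runtime),
which defers the choice of schedule to the OMP_SCHEDULE environment
variable. Built without -fopenmp, the pragma is ignored and the loop runs
serially.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for rendering one instrument instance.
// The cost grows with the index, so the load is deliberately uneven.
static double renderInstance(std::size_t index) {
    double acc = 0.0;
    const std::size_t steps = 1000 + 50 * index;
    for (std::size_t i = 0; i < steps; ++i) {
        acc += std::sin(static_cast<double>(i + index));
    }
    return acc;
}

struct TimedRun {
    double seconds;   // wall-clock time for all repeats
    double checksum;  // sum of results, to confirm the work was done
};

// Runs the parallel region `repeats` times and reports the elapsed time.
TimedRun timeParallelRegion(std::size_t instances, int repeats) {
    assert(repeats > 0);
    std::vector<double> results(instances, 0.0);
    const long n = static_cast<long>(instances);

    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        // schedule(runtime): the schedule is read from OMP_SCHEDULE.
        #pragma omp parallel for schedule(runtime)
        for (long i = 0; i < n; ++i) {
            results[static_cast<std::size_t>(i)] =
                renderInstance(static_cast<std::size_t>(i));
        }
    }
    const auto stop = std::chrono::steady_clock::now();

    double checksum = 0.0;
    for (double v : results) {
        checksum += v;
    }
    return {std::chrono::duration<double>(stop - start).count(), checksum};
}
```

The same binary can then be run with OMP_SCHEDULE=static and with
OMP_SCHEDULE=dynamic, and again with OMP_WAIT_POLICY=active and
OMP_WAIT_POLICY=passive, comparing the reported seconds. For the affinity
test, OMP_PROC_BIND and libgomp's GOMP_CPU_AFFINITY can be varied the
same way. The numbers only mean something if the stand-in workload
resembles a real orchestra.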
We probably have an issue with false sharing in output opcodes,
busses, etc. False sharing occurs when updates from different threads
within the same cache line, even if to different locations within that
line, invalidate the line and cause a reload. On Intel Core processors
the cache line size is 64 bytes; that is 16 float mono sample frames
or 4 double stereo sample frames. False sharing is common and hard to
deal with. One possible solution is to pad the opcode buffers so
each element is sure to span an entire cache line, i.e. each sample in
the buffer array is actually 128 bytes in size. Then the driver
routines can collect data from the padded buffers. But it would be
nice to know whether we really do have this problem before trying to
solve it. I am searching for ways to find out.
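One rough way to find out whether false sharing is measurable on a given
machine is a micro-benchmark like the sketch below. It is not Csound
code: PackedSlot, PaddedSlot, hammerSlots and the 64-byte line size are
all assumptions made for illustration. Each thread repeatedly updates its
own slot; in the packed layout neighbouring slots share a cache line, in
the padded layout they do not.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Assumed cache line size; adjust for the target processor.
constexpr std::size_t kAssumedCacheLine = 64;

// Neighbouring slots sit in the same cache line.
struct PackedSlot {
    double value;
};

// Each slot occupies a full line, so slot values never share one.
struct PaddedSlot {
    double value;
    char padding[kAssumedCacheLine - sizeof(double)];
};
static_assert(sizeof(PaddedSlot) == kAssumedCacheLine,
              "padded slot should span exactly one assumed cache line");

struct SlotTiming {
    double seconds;  // wall-clock time for the update loop
    double total;    // sum over all slots, to check correctness
};

// Every slot is incremented `iterations` times, one slot per thread
// when the number of slots matches the number of OpenMP threads.
template <typename Slot>
SlotTiming hammerSlots(int slots, long iterations) {
    assert(slots > 0 && iterations >= 0);
    std::vector<Slot> data(static_cast<std::size_t>(slots));  // zeroed

    const auto start = std::chrono::steady_clock::now();
    #pragma omp parallel for schedule(static, 1)
    for (int s = 0; s < slots; ++s) {
        // volatile forces a real load and store on every iteration.
        volatile double* cell = &data[static_cast<std::size_t>(s)].value;
        for (long i = 0; i < iterations; ++i) {
            *cell = *cell + 1.0;
        }
    }
    const auto stop = std::chrono::steady_clock::now();

    double total = 0.0;
    for (const Slot& slot : data) {
        total += slot.value;
    }
    return {std::chrono::duration<double>(stop - start).count(), total};
}
```

Calling hammerSlots<PackedSlot> and hammerSlots<PaddedSlot> with the slot
count equal to the thread count, built with -fopenmp, and comparing the
seconds gives a first indication. A clearly slower packed run suggests
false sharing is real on that hardware, although it does not show that
the output opcodes themselves suffer from it.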
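In C++, a buffer that avoids false sharing could look like this:
#include <vector>
// Each element is padded to 128 bytes so no two samples share a cache line.
union FSSample
{
    char padding[128];
    MYFLT sample;
};
std::vector<FSSample> audioBuffer;
audioBuffer.resize(ksmps * nchnls);
// The driver routine collects the samples from the padded buffer.
for (size_t i = 0, n = ksmps * nchnls; i < n; ++i) {
    driverBuffer[i] = audioBuffer[i].sample;
}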
--
Michael Gogins
Irreducible Productions
http://www.michael-gogins.com
Michael dot Gogins at gmail dot com
_______________________________________________
Csound-devel mailing list
Csound-devel@lists.sourceforge.net |