
[Cs-dev] Multi-threading spinlocks

Date: 2008-08-10 19:03
From: Michael Gogins
Subject: [Cs-dev] Multi-threading spinlocks
I have committed spinlock implementations to Csound CVS, in csound.h, based on compiler intrinsics for MSVC and GCC. The spinlock macros compile to no-ops if the compiler lacks the required intrinsics. Note that GCC 4.1 and later have these intrinsics, but I believe GCC 3.4 does not.
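For readers unfamiliar with the technique, here is a minimal sketch of what intrinsic-based spinlock macros with a no-op fallback can look like. The macro and type names (SPINLOCK_LOCK, SPINLOCK_UNLOCK, spinlock_t, SPINLOCKS_ENABLED) are hypothetical and are not necessarily those in csound.h. The sketch uses a test-and-set intrinsic (GCC's __sync_lock_test_and_set, MSVC's _InterlockedExchange); a compare-and-swap loop would work equally well.

```c
#include <assert.h>

/* Hypothetical spinlock macros; not the actual csound.h definitions. */
#if defined(_MSC_VER)
#include <intrin.h>
typedef volatile long spinlock_t;
/* _InterlockedExchange atomically stores 1 and returns the previous value. */
#define SPINLOCK_LOCK(l)   do { while (_InterlockedExchange((l), 1L) != 0L) { } } while (0)
#define SPINLOCK_UNLOCK(l) _InterlockedExchange((l), 0L)
#define SPINLOCKS_ENABLED  1
#elif defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 1))
typedef volatile int spinlock_t;
/* __sync_lock_test_and_set stores 1 and returns the old value (acquire);
   __sync_lock_release stores 0 (release). Available from GCC 4.1. */
#define SPINLOCK_LOCK(l)   do { while (__sync_lock_test_and_set((l), 1) != 0) { } } while (0)
#define SPINLOCK_UNLOCK(l) __sync_lock_release(l)
#define SPINLOCKS_ENABLED  1
#else
/* No suitable intrinsics: the macros compile to no-ops. */
typedef int spinlock_t;
#define SPINLOCK_LOCK(l)   ((void)(l))
#define SPINLOCK_UNLOCK(l) ((void)(l))
#define SPINLOCKS_ENABLED  0
#endif

/* Single-threaded round trip: returns the lock word while held
   (1 with intrinsics, 0 when the macros are no-ops). */
static int lock_roundtrip(void)
{
    spinlock_t lock = 0;
    SPINLOCK_LOCK(&lock);
    int held = (int)lock;
    SPINLOCK_UNLOCK(&lock);
    assert(lock == 0);   /* released again */
    return held;
}
```

A production version might spin on a plain read of the lock word before retrying the atomic operation (test-and-test-and-set), which reduces cache-line traffic while another thread holds the lock.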

(Note: a compiler intrinsic is a function for which code is emitted directly by the compiler, instead of being linked in from a library. Current compilers use intrinsics for atomic operations and sometimes for elementary math functions, bit twiddling, memory operations, and other things.)

I have used these spinlocks to protect the spin and spout buffers in all in and out opcodes. I have not yet done anything for other shared global data, although the channel, mixer, and table opcodes would be the obvious next steps.
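As an illustration only, the sketch below shows the shape of an out opcode's k-cycle work when the shared spout buffer is protected by a spinlock: the lock covers just the short accumulation loop. The engine_t struct, KSMPS, NCHNLS and the function names are hypothetical stand-ins, not Csound's real CSOUND structure or opcode code, and the lock uses the GCC/Clang __sync builtins directly.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for engine state; not the real CSOUND struct. */
#define KSMPS  4
#define NCHNLS 2

typedef struct {
    double spout[KSMPS * NCHNLS];  /* interleaved output for one k-cycle */
    volatile int spout_lock;       /* 0 = free, 1 = held */
} engine_t;

static void spin_lock(volatile int *l)
{
    /* Atomically set to 1; keep trying while the old value was 1. */
    while (__sync_lock_test_and_set(l, 1) != 0) {
        /* busy-wait: the critical section is only a few hundred adds */
    }
}

static void spin_unlock(volatile int *l)
{
    __sync_lock_release(l);  /* store 0 with release semantics */
}

/* Called once per k-cycle before any instrument runs (single-threaded). */
static void engine_clear_spout(engine_t *e)
{
    memset(e->spout, 0, sizeof e->spout);
}

/* What an out opcode does each k-cycle: mix this instance's interleaved
   frames into the shared spout buffer while holding the lock. */
static void out_opcode_perf(engine_t *e, const double *frames, int nframes)
{
    assert(nframes <= KSMPS);
    spin_lock(&e->spout_lock);
    for (int i = 0; i < nframes * NCHNLS; i++)
        e->spout[i] += frames[i];
    spin_unlock(&e->spout_lock);
}
```

Because out_opcode_perf can be called concurrently from several worker threads, the lock is what keeps two instances from losing each other's additions to the same spout sample.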

Steven Yi raised the possibility of giving out opcode instances private spout buffers, one per thread, which would be reduced (summed into spout) after the opcodes mapped to the different threads have all run. However, I decided to go ahead and commit spinlocks, for the following reasons:
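For comparison, a minimal sketch of the private-buffer alternative, under the assumption that each worker thread has a fixed index and that all threads finish the k-cycle (e.g. at a barrier) before the reduction runs. NTHREADS, private_spout, out_private and reduce_into_spout are hypothetical names for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sizes; not Csound's actual configuration. */
#define NTHREADS  4
#define KSMPS     4
#define NCHNLS    2
#define SPOUT_LEN (KSMPS * NCHNLS)

/* One private output buffer per worker thread: no lock needed to write. */
static double private_spout[NTHREADS][SPOUT_LEN];

/* Called by worker thread `tid`; touches only that thread's buffer. */
static void out_private(int tid, const double *frames, int nframes)
{
    assert(tid >= 0 && tid < NTHREADS && nframes <= KSMPS);
    for (int i = 0; i < nframes * NCHNLS; i++)
        private_spout[tid][i] += frames[i];
}

/* Called once per k-cycle by one thread after all workers are done.
   Costs NTHREADS * SPOUT_LEN additions no matter how many out opcodes ran. */
static void reduce_into_spout(double *spout)
{
    memset(spout, 0, SPOUT_LEN * sizeof *spout);
    for (int t = 0; t < NTHREADS; t++) {
        for (int i = 0; i < SPOUT_LEN; i++)
            spout[i] += private_spout[t][i];
        /* clear the private buffer for the next k-cycle */
        memset(private_spout[t], 0, sizeof private_spout[t]);
    }
}
```

The trade-off is that writes become lock-free, but the reduction adds a fixed amount of work per k-cycle that scales with both the thread count and ksmps.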

(1) Spinlocks have proved useful and scalable in scenarios where the protected code typically executes for only a short time. That is exactly the case here. Typically, the out opcodes add anywhere from a few to a few hundred samples and are done. This is usually a rather small part of the total processing time for an instrument instance.

(2) Because out opcodes take so little time compared to the other computation in a thread, the likelihood of actually having to wait in a spinlock is also rather small. This is supported by the almost complete absence of clicks or pops in the multithreaded testing we have done to date, even without any protection for spout.

(3) By contrast, private spout buffers would require adding each thread's buffer into spout once per k-cycle. I do not know the exact overhead of the atomic compare-and-swap intrinsic used in the spinlock, but I suspect it is comparable to the cost of adding a few sample frames. More information on this question would be appreciated! But if I am right, then as soon as ksmps is much greater than 1, spinlocks are more efficient than private spout buffers.

(4) My tests show no appreciable performance impact from spinlocks with 1 or 2 threads.

(5) As the number of cores increases, the advantage of spinlocks grows even faster, because they involve no operating-system locks at all. The only remaining overhead is memory synchronization between cores, which OS locks would also incur.
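The argument in (3) can be made concrete with a back-of-envelope cost model. This is a sketch under stated assumptions, not a measurement: it counts only the extra work per k-cycle beyond the sample additions both schemes perform anyway, lumps one lock/unlock pair into a single cas_cost expressed in "sample-add equivalents" (an assumed, unmeasured figure), and ignores clearing the private buffers. The function names are hypothetical.

```c
#include <assert.h>

/* Spinlocks: one lock/unlock pair per out-opcode call per k-cycle.
   cas_cost is an ASSUMED cost in sample-add equivalents. */
static double spinlock_overhead(int out_calls, double cas_cost)
{
    assert(out_calls >= 0 && cas_cost >= 0.0);
    return out_calls * cas_cost;
}

/* Private buffers: each thread's ksmps * nchnls buffer is summed into
   spout once per k-cycle (buffer clearing is ignored here). */
static double private_overhead(int nthreads, int ksmps, int nchnls)
{
    assert(nthreads > 0 && ksmps > 0 && nchnls > 0);
    return (double)nthreads * ksmps * nchnls;
}
```

Under this model the spinlock cost does not depend on ksmps, while the reduction cost grows linearly with it. With purely hypothetical inputs of 8 out calls, cas_cost = 20, 4 threads, and stereo output, spinlocks win at ksmps = 64 (160 versus 512) but lose at ksmps = 1 (160 versus 8).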

If anyone has compile-time, run-time, or engineering problems with the spinlocks, please let us know.

If anyone has an alternative implementation of thread-safety for the out and in opcodes, please benchmark it against the spinlock implementation before committing it.

Regards,
Mike



_______________________________________________
Csound-devel mailing list
Csound-devel@lists.sourceforge.net