
[Csnd] Multi-threading multiple Csound instances with the API

Date: 2021-04-03 02:51
From: Richard Knight
Subject: [Csnd] Multi-threading multiple Csound instances with the API
Hi

As far as I understand and have tested/heard, using globals and channels 
can undermine any multi-threading when using -j (as opposed to having 
self-isolated instruments).

Ideally I'd like to use multi-threading, but also globals/channels, so I 
thought an interesting way to get around this could be to use the API to 
run multiple instances of Csound and try to handle threading a bit more 
manually. The more I looked into this, the more it seemed like it might 
create other problems though.

The general idea to test is to have one Csound instance running on the 
main thread which uses -odac and then two other threads using -n which 
would just send to/receive from the instance on the main thread, and the 
API brokering audio between them with channels.
In each thread it roughly does something like:

do {
     csoundWaitThreadLockNoTimeout(userdata->lock);
} while (csoundPerformKsmps(userdata->csound) == 0);

.. and in the main thread:
while (csoundPerformKsmps(main) == 0) {
     csoundNotifyThreadLock(userdata1->lock);
     csoundNotifyThreadLock(userdata2->lock);
     /*
         calls to csoundGetAudioChannel and csoundSetAudioChannel
         using previously allocated buffers
     */
}


This does basically work, but at sr=48000 it only runs without audio 
dropouts on Windows at kr=12, and on Linux on the same machine at a kr 
about ten times that.
I'm new to multi-threading with audio, but it seems that the thread lock 
might not be able to wake quickly enough to keep up with a higher kr 
(which I would like to try and achieve).
The more I've read about audio multithreading, the more I think this may 
be a dead end, so I'd be interested if anyone has any views and 
opinions.
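
To put those kr figures in terms of time, here is a small worked sketch in C. The helper names are hypothetical and not part of the Csound API, and kr=120 is only a reading of "about ten times" kr=12. Each csoundPerformKsmps call covers ksmps = sr / kr samples, so the workers have to be woken, run and hand back their audio once per control period.

```c
#include <assert.h>

/* Samples per control period: ksmps = sr / kr (hypothetical helper). */
static int ksmps_for(int sr, int kr)
{
    assert(sr > 0 && kr > 0);
    return sr / kr;
}

/* Wall-clock time covered by one control period, in milliseconds. */
static double k_period_ms(int sr, int ksmps)
{
    assert(sr > 0 && ksmps > 0);
    return 1000.0 * (double)ksmps / (double)sr;
}
```

At sr=48000, kr=12 gives ksmps=4000 and a period of about 83.3 ms, while kr=120 gives ksmps=400 and about 8.3 ms. A kr of 4800 (ksmps=10) would leave roughly 0.2 ms for every wake-up, perform and channel exchange.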

I also considered (but haven't tried) using csoundPerformBuffer, but 
then while the threads may be synchronised, as the get/set audio works 
with ksmps, I couldn't think of any way that would help to exchange 
channels between the threads (perhaps audio could work with spin/spout, 
but k-rate channels would face the same thing).

thanks
RK

Csound mailing list
Csound@listserv.heanet.ie
https://listserv.heanet.ie/cgi-bin/wa?A0=CSOUND
Send bug reports to
        https://github.com/csound/csound/issues
Discussions of bugs and features can be posted here

Date: 2021-04-03 03:28
From: Michael Gogins
Subject: Re: [Csnd] Multi-threading multiple Csound instances with the API
Your design is similar to the original design for multi-threading within Csound, which I did some work on.

It was never efficient enough.

The current design is more efficient, but it is still not really efficient enough.

You could modify your design by using lock-free FIFOs. A master thread would receive events, channels, and audio and enqueue them in lock-free FIFOs. A number of worker threads would dequeue events, channels, and audio from these FIFOs, process this data, and enqueue the results in other lock-free FIFOs that the master thread would dequeue for output.

Possibly, the Csound channels could also be implemented as lock-free queues.
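
For illustration only, a lock-free FIFO of the kind described above could look roughly like the following single-producer, single-consumer ring buffer in C11. This is a sketch under assumptions: it is not taken from csound_threaded.hpp or any Csound source, the names (spsc_fifo, fifo_push, fifo_pop) and the capacity are hypothetical, and it is only safe with exactly one pushing thread and one popping thread per queue. It carries single floats to stay short; a real design would more likely pass whole ksmps-sized blocks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define FIFO_CAPACITY 8u  /* hypothetical size; must be a power of two */
static_assert((FIFO_CAPACITY & (FIFO_CAPACITY - 1u)) == 0,
              "FIFO_CAPACITY must be a power of two");

typedef struct {
    float slots[FIFO_CAPACITY];
    atomic_size_t head;  /* next slot to read; written only by the consumer */
    atomic_size_t tail;  /* next slot to write; written only by the producer */
} spsc_fifo;

static void fifo_init(spsc_fifo *q)
{
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side: returns false instead of blocking when the queue is full. */
static bool fifo_push(spsc_fifo *q, float value)
{
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == FIFO_CAPACITY)
        return false;
    q->slots[tail & (FIFO_CAPACITY - 1u)] = value;
    /* Release: the slot write above becomes visible before the new tail. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false instead of blocking when the queue is empty. */
static bool fifo_pop(spsc_fifo *q, float *out)
{
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *out = q->slots[head & (FIFO_CAPACITY - 1u)];
    /* Release: the slot read above completes before the slot is reusable. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}
```

Neither side ever waits on a lock: a full or empty queue is reported to the caller, which decides whether to retry, skip or pad with silence. In the master/worker layout above, each worker would need one such queue per direction.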

The csound_threaded.hpp class in the Csound source code enables a single instance of Csound to run in a separate thread. It uses a FIFO to receive events from the host. It might be a starting point for such a design.

I am dubious about the ultimate efficiency of this approach. Csound is a very challenging case for multi-threading. There are all kinds of overhead that end up causing swapping in and out of cache. The FIFOs themselves would almost certainly be efficient enough; the problem would be swapping Csound code and data in and out of cache. But in my experience, you never know until you actually code and test it.

I would advise consulting with John ffitch and working with the existing design before trying to come up with a new one.

Regards,
Mike
-----------------------------------------------------
Michael Gogins
Irreducible Productions
http://michaelgogins.tumblr.com
Michael dot Gogins at gmail dot com



Date: 2021-04-03 18:26
From: Richard Knight
Subject: Re: [Csnd] Multi-threading multiple Csound instances with the API

Thank you, some really useful insights there.

Threaded audio is certainly more challenging than I anticipated. I thought there might be a 'quick win' for some simple specific cases with the design I was trying, but maybe not. However, it is worth saying that the basic test I've done with three threads exchanging channels with the API does seem to perform better across cores than the equivalent in Csound with -j3 (albeit also using channels) at the same (low) kr, but it is an extremely specific case.

I'll have a look at the lock-free options, csound_threaded.hpp and revisit John ffitch's papers on multicore/parallel processing, which will likely make more sense to me now having dipped my toe in a bit.

 
