
[Csnd-dev] Parallel Csound performance

Date: 2025-10-29 22:08
From: Victor Lazzarini <000010b17ddd988e-dmarc-request@LISTSERV.HEANET.IE>
Subject: [Csnd-dev] Parallel Csound performance
I’m back investigating parallel Csound performance and it was interesting to see that
Csound 7 is working well with it, in fact far better than 6.18. 

Running 1000 oscillator instruments for 60 secs, I got these stats:

ksmps = 100
-----------
single thread: 3.9s
8 threads: 3.9s

ksmps = 150
----------
single thread: 3.5s
8 threads: 2.7s

ksmps = 300
----------
single thread: 3.2s
8 threads: 1.6s

ksmps = 500
----------
single thread: 3.2s
8 threads: 1.2s

ksmps = 1000
----------
single thread: 3.3s
8 threads: 0.8s
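
A sweep like the one above could be scripted roughly as follows. This is only a sketch of a possible harness, not the procedure actually used for these figures: the file name `oscils.csd` is hypothetical, rendering with `-n` (no audio output) is an assumption, and it relies on the `csound` command-line flags `-j` (number of threads) and `--ksmps=` (override of the orchestra's ksmps) being available in the build under test.

```python
import subprocess
import time


def build_command(csd_path, ksmps, threads):
    """Return the argument list for one Csound render (assumed flags)."""
    return [
        "csound",
        "-n",                # assumption: skip audio output, time computation only
        f"--ksmps={ksmps}",  # override the ksmps set in the orchestra header
        "-j", str(threads),  # number of threads for parallel performance
        csd_path,
    ]


def time_render(csd_path, ksmps, threads):
    """Run one render and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(
        build_command(csd_path, ksmps, threads),
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


def sweep(csd_path, ksmps_values=(100, 150, 300, 500, 1000), threads=8):
    """Map each ksmps value to (single-thread seconds, multi-thread seconds)."""
    return {
        ksmps: (time_render(csd_path, ksmps, 1),
                time_render(csd_path, ksmps, threads))
        for ksmps in ksmps_values
    }
```

Calling `sweep("oscils.csd")` would then return one pair of timings per ksmps value. Wall-clock timing of the whole process includes start-up and compilation, so short renders are better repeated a few times.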

By comparison, 6.18 ran the last test (ksmps = 1000) in 11 secs with 8 threads. In fact, 6.18 was always slower in parallel
performance; something was very broken there. Its single-threaded performance is on a par with Csound 7.

I can only think that the new parser is working better with the parallel dispatch, or that something else was wrong
in 6.x and got fixed in 7. Anyway, I am pleased with these results. In general, ksmps=100 seems to be the break-even
point; below that, multi-threaded performance always loses. Also, ksmps=1000 gives more or less the highest gain,
with diminishing returns after that. On my computer, 8 threads seems to be optimal.
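
To make the break-even and gain figures explicit, the timings reported above can be turned into speedup (single-thread time divided by 8-thread time) and parallel efficiency (speedup divided by the thread count). The numbers in the sketch below are copied from the table in this message; the function and variable names are only illustrative.

```python
THREADS = 8

# ksmps -> (single-thread seconds, 8-thread seconds), as reported above
TIMINGS = {
    100: (3.9, 3.9),
    150: (3.5, 2.7),
    300: (3.2, 1.6),
    500: (3.2, 1.2),
    1000: (3.3, 0.8),
}


def speedup_table(timings, threads):
    """Return a list of (ksmps, speedup, efficiency) rows sorted by ksmps."""
    rows = []
    for ksmps in sorted(timings):
        single, multi = timings[ksmps]
        speedup = single / multi
        rows.append((ksmps, speedup, speedup / threads))
    return rows


for ksmps, speedup, efficiency in speedup_table(TIMINGS, THREADS):
    print(f"ksmps={ksmps:5d}  speedup={speedup:.2f}x  efficiency={efficiency:.1%}")
```

On these figures the speedup goes from 1x at ksmps=100 to a little over 4x at ksmps=1000, so even the best case uses roughly half of the ideal 8x.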

best
========================
Prof. Victor Lazzarini
Maynooth University
Ireland







Date: 2025-10-29 22:13
From: Steven Yi
Subject: Re: [Csnd-dev] Parallel Csound performance
This is odd, as I thought cspar was disabled in Csound?


Date: 2025-10-29 22:22
From: Victor Lazzarini <000010b17ddd988e-dmarc-request@LISTSERV.HEANET.IE>
Subject: Re: [Csnd-dev] [EXTERNAL] Re: [Csnd-dev] Parallel Csound performance
No, I don't think we ever disabled it. There were some questions about re-enabling some analysis in the parser (which existed in parser 2), but beyond that we never touched it.

I continued to build with it but had not been actively testing the code. I am surprised at how well it runs without any changes.

best
Prof. Victor Lazzarini
Maynooth University
Ireland


Date: 2025-10-29 22:48
From: Michael Gogins
Subject: Re: [Csnd-dev] [EXTERNAL] Re: [Csnd-dev] Parallel Csound performance
Thanks for tracking this. 

Regards, 
Mike


-----------------------------------------------------
Michael Gogins
Irreducible Productions
http://michaelgogins.tumblr.com
Michael dot Gogins at gmail dot com


Date: 2025-10-29 23:12
From: Steven Yi
Subject: Re: [Csnd-dev] [EXTERNAL] Re: [Csnd-dev] Parallel Csound performance
I'd be careful about it running fast but incorrectly. My concern
with ParCS has always been correctness.


Date: 2025-10-29 23:36
From: Victor Lazzarini <000010b17ddd988e-dmarc-request@LISTSERV.HEANET.IE>
Subject: Re: [Csnd-dev] [EXTERNAL] Re: [Csnd-dev] Parallel Csound performance
Yes, it is running correctly. The output waveforms are the same.
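
A check of this kind can be automated by comparing the single-threaded and multi-threaded renders sample for sample. The sketch below is one possible way to do it, not necessarily how the comparison was done here: it assumes both renders were written as integer PCM WAV files (which Python's standard `wave` module can read), and the function names are illustrative.

```python
import wave


def read_pcm(path):
    """Return (format parameters, raw sample bytes) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        params = (
            wav.getnchannels(),
            wav.getsampwidth(),
            wav.getframerate(),
            wav.getnframes(),
        )
        return params, wav.readframes(wav.getnframes())


def waveforms_identical(path_a, path_b):
    """True only if both files have the same format and identical samples."""
    return read_pcm(path_a) == read_pcm(path_b)
```

Requiring bit-exact equality is deliberately strict: any difference in the order in which instrument outputs are summed would show up as a mismatch, which is the kind of error a parallel dispatcher could introduce.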
========================
Prof. Victor Lazzarini
Maynooth University
Ireland



Date: 2025-10-30 16:44
From: Victor Lazzarini <000010b17ddd988e-dmarc-request@LISTSERV.HEANET.IE>
Subject: Re: [Csnd-dev] [EXTERNAL] Re: [Csnd-dev] Parallel Csound performance
I did a little work on this and replaced the lock-based barrier (which seemed heavy to me) with a lightweight lock-free one.
Now I am getting some interesting results. Xanadu, which seems to be a good test case, runs like this:

single-threaded: 0.78s
8 threads: 0.37s

out of the box. I checked the outputs (as usual) and the waveforms are identical.
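
The message does not say which lock-free design was used. One common choice is a sense-reversing barrier, in which waiting threads spin on a shared flag instead of sleeping on a mutex and condition variable. The sketch below models that structure only and is not the Csound code: in C the arrival counter would be decremented with an atomic fetch-and-subtract, whereas Python has no user-level atomic integers, so a small lock stands in for that single instruction. All names are hypothetical.

```python
import threading
import time


class SenseReversingBarrier:
    """Model of a sense-reversing spin barrier (illustrative only)."""

    def __init__(self, parties):
        self.parties = parties
        self._remaining = parties
        self._sense = False                # shared flag the waiters spin on
        self._guard = threading.Lock()     # stand-in for an atomic decrement
        self._local = threading.local()    # per-thread private sense

    def wait(self):
        # Each thread flips its private sense on every barrier phase.
        my_sense = not getattr(self._local, "sense", False)
        self._local.sense = my_sense
        with self._guard:                  # would be atomic fetch-and-sub in C
            self._remaining -= 1
            last = self._remaining == 0
        if last:
            # Reset the counter first, then release everyone by flipping the flag.
            self._remaining = self.parties
            self._sense = my_sense
        else:
            while self._sense != my_sense:
                time.sleep(0)              # yield instead of blocking on a lock


def run_phases(n_threads, n_phases):
    """Run workers through n_phases barriers; return the order of phase entries."""
    barrier = SenseReversingBarrier(n_threads)
    log = []

    def worker():
        for phase in range(n_phases):
            log.append(phase)              # list.append is thread-safe in CPython
            barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

If the barrier works, the log returned by `run_phases` is non-decreasing: no thread can record phase k+1 until every thread has recorded phase k. Because the flag alternates between phases, the same barrier object can be reused every k-cycle without re-initialisation.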
========================
Prof. Victor Lazzarini
Maynooth University
Ireland



Date: 2025-10-30 19:57
From: Michael Gogins
Subject: Re: [Csnd-dev] [EXTERNAL] Re: [Csnd-dev] Parallel Csound performance
Very nice!

-----------------------------------------------------
Michael Gogins
Irreducible Productions
http://michaelgogins.tumblr.com
Michael dot Gogins at gmail dot com

