The cost of out
Date | 2017-02-03 10:02 |
From | Eduardo Moguillansky |
Subject | The cost of out |
Hi,

While porting a project from SuperCollider to Csound, I noticed that Csound uses a considerable amount of CPU just for signal output. When run in real time, the CSD below uses ~45% CPU just for copying to spout. If, instead of using "out" in instr 1, the signals are accumulated in a global variable and output once in another instrument, CPU usage drops to ~30%. Is this something known that should be avoided? The same processing in SuperCollider (outputting 1000 silent signals) results in ~15% CPU.

<CsInstruments>
sr = 48000
nchnls = 2
ksmps = 128
0dbfs = 1

instr 1
  a1 init 0
  out a1
endin

instr 2
  iN = 1000
  idx = 0
  while idx < iN do
    event_i "i", 1, 0, 100, 1/iN
    idx += 1
  od
  turnoff
endin
</CsInstruments>
<CsScore>
i2 0 100
</CsScore>

SuperCollider version:

(
1000.do { { Out.ar(0, Silent.ar(1)) }.play }
)

Csound mailing list
Csound@listserv.heanet.ie
https://listserv.heanet.ie/cgi-bin/wa?A0=CSOUND
Send bugs reports to
https://github.com/csound/csound/issues
Discussions of bugs and features can be posted here
|
Date | 2017-02-03 11:38 |
From | Victor Lazzarini |
Subject | Re: The cost of out |
Is your build optimised? (-O3)
========================
Prof. Victor Lazzarini
Dean of Arts, Celtic Studies, and Philosophy,
Maynooth University,
Maynooth, Co Kildare, Ireland
Tel: 00 353 7086936
Fax: 00 353 1 7086952 |
Date | 2017-02-03 12:01 |
From | Victor Lazzarini |
Subject | Re: The cost of out |
But I also noted this here. Your code runs for about 9 s for 100 s of output; with a single global out in another instrument, it runs for 5 s. But if I replace out with outch 1, a1, it runs for 4 s. So it’s out that seems to be slow. It needs a check. |
Date | 2017-02-03 12:02 |
From | Eduardo Moguillansky |
Subject | Re: The cost of out |
Yes, I think so. I do:

mkdir build
cd build
cmake-gui ..
make
sudo make install

After having done that, I see this in build/CMakeCache.txt:

CMakeCache.txt:CMAKE_CXX_FLAGS_RELEASE:STRING=-O3 -DNDEBUG
CMakeCache.txt:CMAKE_C_FLAGS_RELEASE:STRING=-O3 -DNDEBUG
|
Date | 2017-02-03 12:05 |
From | Eduardo Moguillansky |
Subject | Re: The cost of out |
I tried the different out opcodes and all have the same impact. Can it be the spinlocks?
|
Date | 2017-02-03 12:35 |
From | John ff |
Subject | Re: The cost of out |
Could be. All the code does is protect the adding in of the signal. On a uniprocessor the protection should cost little.
|
Date | 2017-02-03 13:14 |
From | Victor Lazzarini |
Subject | Re: The cost of out |
Here on OSX, the cost of outch is half that of out.
|
Date | 2017-02-03 13:27 |
From | Victor Lazzarini |
Subject | Re: The cost of out |
But the case here pushes it: 1000 lock/unlock pairs. I wonder if we should go back to the idea of each instrument owning its own spin/spout, which is then reduced at the end of each k-cycle. That should require a single lock/unlock regardless of the number of instances.
|
Date | 2017-02-03 13:57 |
From | Victor Lazzarini |
Subject | Re: The cost of out |
I have checked this, and you are right: the spinlocks cost nothing. The cost lies solely in copying the data to an interleaved format with a loop. Replacing the loop with a memcpy brings the cost down to less than 1/3 of the original. Of course, that only works for mono. |
Date | 2017-02-03 14:39 |
From | jpff |
Subject | Re: The cost of out |
Sorry, I don't understand. I would expect about 100% CPU, as it is doing nothing else. What are you measuring? I am seeing it run in 9 s at 99.% |
Date | 2017-02-03 14:52 |
From | Victor Lazzarini |
Subject | Re: The cost of out |
I suspect he is running in realtime. My measurement is running it with -n and checking the processing time. |
Date | 2017-02-03 14:59 |
From | Eduardo Moguillansky |
Subject | Re: The cost of out |
I was running in realtime in order to compare the performance with SuperCollider. The percentages given are not significant in themselves, but I would have expected "out" to perform similarly to accumulating in an a-variable and doing one "out" at the end of each k-cycle. |
Date | 2017-02-03 15:03 |
From | Victor Lazzarini |
Subject | Re: The cost of out |
No, because the difference is that ga1 += a1 is cheap, and then we only have to interleave the data to spout once. When the instrument writes to out, the loop that interleaves the data runs 1000 times. So it’s more expensive. |
Date | 2017-02-03 15:12 |
From | Eduardo Moguillansky |
Subject | Re: The cost of out |
And couldn't this be done so that the out opcodes accumulate all writes into a contiguous array and interleave to spout at the end of the cycle? |
Date | 2017-02-03 15:14 |
From | jpff |
Subject | Re: The cost of out |
The only fix I can think of is the per-instance spout, which we have discussed before, or to make spout non-interleaved and fix it only once. |
Date | 2017-02-03 15:15 |
From | Victor Lazzarini |
Subject | Re: The cost of out |
We could try to see what exactly the slowdown is, but yes, that could be done all right. |
Date | 2017-02-03 15:16 |
From | Victor Lazzarini |
Subject | Re: The cost of out |
I think we could try the latter, which would be simpler, then see if that makes a difference. |
Date | 2017-02-03 15:38 |
From | Steven Yi |
Subject | Re: The cost of out |
Firstly, thanks, Eduardo, for identifying this! Secondly, and perhaps obviously, if we modify output to use a non-interleaved layout, we will, at least for Csound 6, still need spin/spout to be interleaved so that API clients that use spin/spout continue to work. (We could use internal spin/spout buffers so that the external spin/spout can remain compatible.) |