| I take it all back! I was comparing kperf in Xanadu (ksmps = 1) with kperf
in Trapped (ksmps = 10), by mistake.
The change I discuss below actually made only a 1%-2% improvement in
efficiency! That would be down to removing unnecessary assignments; the
compiler may have optimized some out already anyway.
All the more reason to focus on parallelizing...
Embarrassedly,
Mike
----- Original Message -----
From: "Michael Gogins"
To: "Developer discussions"
Sent: Thursday, April 17, 2008 9:17 PM
Subject: Re: [Cs-dev] Vectorization
> Silly me... xanadu.csd runs at ksmps = 1. So, for ksmps greater than one,
> the below does not quite apply.
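A rough way to see why the ksmps setting matters when comparing these percentages: kperf's own bookkeeping runs once per control period, while the a-rate work inside the opcodes grows with ksmps. The sketch below is a toy cost model, not a measurement. The function name and both cost parameters are made up for illustration, and real opcodes also do k-rate work that does not scale with ksmps.

```c
#include <assert.h>

/*
 * Toy model (an assumption, not Csound code): each control period costs a
 * fixed dispatch_cost inside kperf itself, plus per_sample_cost for every
 * one of the ksmps samples computed by the opcodes it calls.
 * Returns the fraction of total time spent in kperf's own code.
 */
static double kperf_own_fraction(double dispatch_cost,
                                 double per_sample_cost,
                                 int ksmps)
{
    assert(ksmps > 0);
    assert(dispatch_cost >= 0.0 && per_sample_cost > 0.0);
    double opcode_cost = per_sample_cost * (double) ksmps;
    return dispatch_cost / (dispatch_cost + opcode_cost);
}
```

With dispatch_cost = 1 and per_sample_cost = 3 (arbitrary units), the model gives 1/4 at ksmps = 1 and 1/31 at ksmps = 10, so under these assumptions kperf's share shrinks as ksmps grows even when the loop code itself is unchanged.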
>
> Still, I was able to reduce the % of time in kperf code in this case (i.e.
> xanadu.csd with ksmps = 1) from 23.37% to 12.37%; in short, I virtually
> doubled the efficiency of Csound's inner loop. Don't get all excited: this
> doesn't count the functions called by kperf, i.e., the opcodes themselves,
> which consume the vast majority of time in Csound.
>
> I did this by removing the threading code, removing what appeared to be
> unnecessary temporary variables and assignments, and recasting the while
> loops as for loops. The innermost loop changed from
>
>   csound->spoutactive = 0;              /* make spout inactive */
>   barrier1 = csound->multiThreadedBarrier1;
>   barrier2 = csound->multiThreadedBarrier2;
>   ip = csound->actanchor.nxtact;
>   if (ip != NULL) {
>     csound->multiThreadedStart = ip;
>     if (csound->multiThreadedThreadInfo != NULL) {
>       while (csound->multiThreadedStart != NULL) {
>         INSDS *current = csound->multiThreadedStart;
>         while (current != NULL &&
>                (current->insno == csound->multiThreadedStart->insno)) {
>           current = current->nxtact;
>         }
>         csound->multiThreadedEnd = current;
>         csound->WaitBarrier(barrier1);
>         csound->WaitBarrier(barrier2);
>         csound->multiThreadedStart = current;
>       }
>     }
>     else {
>       while (ip != NULL) {                /* for each instr active: */
>         INSDS *nxt = ip->nxtact;
>         csound->pds = (OPDS*) ip;
>         while ((csound->pds = csound->pds->nxtp) != NULL) {
>           (*csound->pds->opadr)(csound, csound->pds); /* run each opcode */
>         }
>         ip = nxt; /* but this does not allow for all deletions */
>       }
>     }
>   }
>
> to
>
>   csound->spoutactive = 0;
>   for (activeInstrument = csound->actanchor.nxtact;
>        activeInstrument;
>        activeInstrument = activeInstrument->nxtact) {
>     for (csound->pds = activeInstrument->nxtp;
>          csound->pds;
>          csound->pds = csound->pds->nxtp) {
>       (*csound->pds->opadr)(csound, csound->pds);
>     }
>   }
>
> I haven't booked this in, mainly because I don't understand the comment
> "but this does not allow for all deletions." Are some of these two-step
> assignments necessary in some cases that I didn't test, or is it just
> unexamined code?
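One plausible reading of that comment, offered as a guess rather than as the original author's intent: saving nxt before running the opcodes protects the loop when an instance deactivates and frees itself during the pass, but not when it frees the instance that nxt points to. The sketch below uses a hypothetical Inst struct and made-up helper names in place of the real INSDS machinery, and it shows only the self-deletion case that the saved pointer does handle.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for INSDS: an id plus the link to the next
   active instance. */
typedef struct Inst {
    int id;
    struct Inst *nxtact;
} Inst;

/* Append a new instance after tail and return it (aborts if out of memory). */
static Inst *append_instance(Inst *tail, int id)
{
    Inst *inst = malloc(sizeof *inst);
    if (inst == NULL)
        abort();
    inst->id = id;
    inst->nxtact = NULL;
    tail->nxtact = inst;
    return inst;
}

/* Unlink victim from the list headed by anchor and free it. */
static void deactivate(Inst *anchor, Inst *victim)
{
    Inst *prev = anchor;
    while (prev->nxtact != NULL && prev->nxtact != victim)
        prev = prev->nxtact;
    if (prev->nxtact == victim) {
        prev->nxtact = victim->nxtact;
        free(victim);
    }
}

/*
 * One performance pass. The instance whose id equals kill_id turns itself
 * off while it is being performed. Returns how many instances were run.
 */
static int run_pass(Inst *anchor, int kill_id)
{
    assert(anchor != NULL);
    int performed = 0;
    Inst *ip = anchor->nxtact;
    while (ip != NULL) {
        Inst *nxt = ip->nxtact;   /* saved first: ip may be freed below */
        performed++;
        if (ip->id == kill_id)
            deactivate(anchor, ip);
        ip = nxt;                 /* never reads the possibly freed ip */
    }
    return performed;
}

/* Count the instances still linked after anchor. */
static int count_active(const Inst *anchor)
{
    int n = 0;
    for (const Inst *ip = anchor->nxtact; ip != NULL; ip = ip->nxtact)
        n++;
    return n;
}
```

In the for-loop version, the increment step reads activeInstrument->nxtact after the opcodes have run, which is only safe if no opcode can free the current instance during the pass.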
>
> I don't have the stomach to go through the opcodes as a whole, but I am
> going to take a look at the oscillator indexing and interpolation, and at
> the krate/arate arithmetic opcodes. Although of course these are very
> straightforward macros and should be extremely efficient, I am interested
> in trying ATLAS, which is a version of BLAS that compiles to take every
> possible advantage of the processor. These arithmetic opcodes are used all
> over the place and even a small speedup would be helpful.
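For reference, an a-rate arithmetic opcode boils down to a loop over the ksmps samples of one control period. The following is a minimal sketch with hypothetical function names and plain double buffers, not Csound's actual macros. The restrict qualifiers encode the assumption that the output buffer never overlaps an input, which is what allows a compiler such as gcc to consider auto-vectorizing the loop at higher optimization levels.

```c
#include <assert.h>
#include <stddef.h>

/* a-rate + a-rate: one output sample per input sample pair.
   Hypothetical name; assumes out does not alias a or b. */
static void add_aa(double *restrict out,
                   const double *restrict a,
                   const double *restrict b,
                   size_t ksmps)
{
    assert(out != NULL && a != NULL && b != NULL);
    for (size_t n = 0; n < ksmps; n++)
        out[n] = a[n] + b[n];
}

/* a-rate * k-rate: the k-rate operand is constant across the period.
   Hypothetical name; assumes out does not alias a. */
static void mul_ak(double *restrict out,
                   const double *restrict a,
                   double k,
                   size_t ksmps)
{
    assert(out != NULL && a != NULL);
    for (size_t n = 0; n < ksmps; n++)
        out[n] = a[n] * k;
}
```

At ksmps = 1 each call touches a single sample, so there is nothing for vector instructions or a BLAS-style routine to work on; any gain from this kind of change would presumably show up only at larger ksmps.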
>
> However, even if ATLAS produces some sort of speedup, it doesn't look like
> Csound is going to get 2 x faster this way. It's already pretty well
> optimized.
>
> And that, in turn, reinforces the conclusion that parallelizing synthesis is
> the only way to get real speedups.
>
> Regards,
> Mike
>
>
>
> ----- Original Message -----
> From: "Michael Gogins"
> To: "Developer discussions"
> Sent: Wednesday, April 16, 2008 11:22 PM
> Subject: Re: [Cs-dev] Vectorization
>
>
>> I've done a bit of profiling; gprof is not completely satisfactory, since I
>> am used to sunstudio on Solaris, which is so far superior that it makes one
>> shake one's head. On Solaris, the Sun compiler automatically builds code
>> for profiling, and samples code by line from the kernel to give complete
>> timing for each line of source code, as well as the call graphs supported
>> by gprof. This is very useful...
>>
>> At any rate, it's clear enough that almost all time in csound
>> examples/xanadu.csd is going into kperf and opcodes called by kperf, and
>> quite negligible amounts into reading orc and sco, printing messages, or
>> writing the output soundfile.
>>
>> kperf in itself takes about 23% of the time not counting its callees, and
>> in turn fairly evenly divides its callees' time between the opcodes, which
>> include not only pluck, klinseg, oscillators, and delay lines, but also
>> krate-arate, arate-krate, and arate-arate arithmetic.
>>
>> Since the kperf loop looks pretty minimal, I have no idea why its own code
>> should be eating 23% of the performance time. That is my major question at
>> this point. I may need to rewrite kperf as a set of functions to obtain a
>> more detailed breakdown of this time. Or I can develop some insight in the
>> debugger.
>>
>> On Linux, you can rebuild the kernel and use oprofile, which can do what
>> sunstudio can do, and give line by line timings inside functions.
>>
>> I will also profile Trapped in Convert since it performs many more short
>> events, in comparison with Xanadu which performs a few long guitar-like
>> chords.
>>
>> Regards,
>> Mike
>>
>> ----- Original Message -----
>> From: "Steven Yi"
>> To: "Developer discussions"
>> Sent: Wednesday, April 16, 2008 8:21 PM
>> Subject: Re: [Cs-dev] Vectorization
>>
>>
>>> Hi Victor,
>>>
>>> I haven't done it in a long time, but I found an email I posted to the
>>> dev list a year and a half ago that has some instructions:
>>>
>>> http://www.nabble.com/Call-Graph-%28Postscript-file%29-td5730328.html#a5730328
>>>
>>> I haven't run it in a long while but those commands that were run
>>> there should do the trick.
>>>
>>> steven
>>>
>>> On Wed, Apr 16, 2008 at 12:33 PM, victor wrote:
>>>> I built csound with useGprof=1, ran a simple one-oscillator test, then
>>>> tried gprof and got an empty profile. Why is that?
>>>>
>>>>
>>>> ----- Original Message -----
>>>> From: "Steven Yi"
>>>> To: "Michael Gogins" ; "Developer discussions"
>>>> Sent: Wednesday, April 16, 2008 7:42 PM
>>>> Subject: Re: [Cs-dev] Vectorization
>>>>
>>>>
>>>> > Hi Michael,
>>>> >
>>>> > I'm interested in the profiling results; what profiler are you using,
>>>> > by the way? I remember doing profiling a while back using debug
>>>> > builds and gprof (the options for that are still in SConstruct), but
>>>> > when I did profiling it was more to try to figure out why the
>>>> > Pinkston FM model was really slow at the time, not so much for
>>>> > general Csound performance.
>>>> >
>>>> > I've poked around most of Csound's engine and feel like I know it
>>>> > pretty well, especially when I was more actively working on the new
>>>> > parser. If there's any questions that arise about Csound internals
>>>> > and how things are allocated, I'd suggest looking at the new parser
>>>> > (well, the compile part that translates the AST to the INSTRTXT data
>>>> > structs that Csound uses at performance time) to see how things are
>>>> > built, as to me it's a bit clearer than reading the old parser code.
>>>> >
>>>> > steven
>>>> >
>>>> >
>>>> > On Wed, Apr 16, 2008 at 11:27 AM, Michael Gogins wrote:
>>>> >> Yes, this is interesting. I have not yet tried unsafe math
>>>> >> optimizations, but will.
>>>> >>
>>>> >> I have tried inlining as much code as possible, which in practice
>>>> >> means defining all C++ member functions in the header file. That
>>>> >> consistently produces somewhere between 5% and 15% speedups, right
>>>> >> there.
>>>> >>
>>>> >> This is in the context of intermittently continuing development of
>>>> >> Silence, an algorithmic composition/software synthesis library of my
>>>> >> own design. Currently, Silence renders audio just slightly faster than
>>>> >> Csound, but this is with hard-coded STK Rhodey C++ instruments. (But
>>>> >> then, my comparison Csound instrument uses the STK Rhodey opcode also,
>>>> >> so the comparison is more fair than it might seem: it mostly compares
>>>> >> instrument allocation and event dispatching.) Dynamically defined
>>>> >> instruments will be slower in Silence, perhaps also in Csound (i.e.,
>>>> >> using more opcodes in the instr block instead of just calling one
>>>> >> opcode that does all the work). I shall soon know more, as I have
>>>> >> finalized my design of dynamically defined instruments and unit
>>>> >> generators in Silence.
>>>> >>
>>>> >> It is the usefulness of the profiler in getting this performance that
>>>> >> has decided me to profile Csound. I am very curious to see how much
>>>> >> "slack" there is, how much scope for performance improvements. Most of
>>>> >> Csound's opcode code looks to be quite efficient, and I am certainly
>>>> >> not going to monkey with the fundamental design of the engine, so I am
>>>> >> most curious about the efficiency of the engine implementation,
>>>> >> especially the kperf loop, event initializers, output drivers, and so
>>>> >> on. I don't, actually, expect much slack but the profiler will,
>>>> >> perhaps, point out a few areas that have not been completely thought
>>>> >> through.
>>>> >>
>>>> >> Regards,
>>>> >> Mike
>>>> >>
>>>> >>
>>>> >>
>>>> >> -----Original Message-----
>>>> >> >From: Steven Yi
>>>> >> >Sent: Apr 16, 2008 12:00 PM
>>>> >> >To: Developer discussions
>>>> >> >Subject: [Cs-dev] Vectorization
>>>> >> >
>>>> >> >Hi All,
>>>> >> >
>>>> >> >There's an interesting thread going on on linux-audio-dev about the
>>>> >> >performance of gcc vectorization code:
>>>> >> >
>>>> >> >http://www.nabble.com/vectorization-td15339532.html#a16720581
>>>> >> >
>>>> >> >The thread started in February but resumed a day or two ago, with
>>>> >> >someone reporting better results with gcc than using assembly.
>>>> >> >
>>>> >> >steven
>>>> >> >
>>>> >>
>>>> >>
>>>> >> >-------------------------------------------------------------------------
>>>> >> >This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
>>>> >> >Don't miss this year's exciting event. There's still time to save $100.
>>>> >> >Use priority code J8TL2D2.
>>>> >> >http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
>>>> >> >_______________________________________________
>>>> >> >Csound-devel mailing list
>>>> >> >Csound-devel@lists.sourceforge.net
>>>> >> >https://lists.sourceforge.net/lists/listinfo/csound-devel
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>>
>>>>
>>>>
>>>> >>
>>>> >
>>>>
>>>>
>>>>
>>>
>>
>>
>
>
-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
Don't miss this year's exciting event. There's still time to save $100.
Use priority code J8TL2D2.
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
Csound-devel mailing list
Csound-devel@lists.sourceforge.net |