
Re: [Cs-dev] Vectorization

Date: 2008-04-18 02:17
From: "Michael Gogins"
Subject: Re: [Cs-dev] Vectorization
Silly me... xanadu.csd runs at ksmps = 1. So, for ksmps greater than one, 
the below does not quite apply.

Still, in this case (i.e., xanadu.csd with ksmps = 1) I was able to reduce
the share of time spent in kperf's own code from 23.37% to 12.37%; in short,
I nearly doubled the efficiency of Csound's inner loop. Don't get all
excited: this doesn't count the functions called by kperf, i.e., the opcodes
themselves, which consume the vast majority of time in Csound.

I did this by removing the threading code, removing what appeared to be
unnecessary temporary variables and assignments, and recasting the while
loops as for loops. The innermost loop changed from

    csound->spoutactive = 0;            /*   make spout inactive   */
    barrier1 = csound->multiThreadedBarrier1;
    barrier2 = csound->multiThreadedBarrier2;
    ip = csound->actanchor.nxtact;
    if (ip != NULL) {
      csound->multiThreadedStart = ip;
      if (csound->multiThreadedThreadInfo != NULL) {
        while (csound->multiThreadedStart != NULL) {
          INSDS *current = csound->multiThreadedStart;
          while (current != NULL &&
                 (current->insno == csound->multiThreadedStart->insno)) {
            current = current->nxtact;
          }
          csound->multiThreadedEnd = current;
          csound->WaitBarrier(barrier1);
          csound->WaitBarrier(barrier2);
          csound->multiThreadedStart = current;
        }
      }
      else {
        while (ip != NULL) {                /* for each instr active:  */
          INSDS *nxt = ip->nxtact;
          csound->pds = (OPDS*) ip;
          while ((csound->pds = csound->pds->nxtp) != NULL) {
            (*csound->pds->opadr)(csound, csound->pds); /* run each opcode */
          }
          ip = nxt; /* but this does not allow for all deletions */
        }
      }
    }

to

    csound->spoutactive = 0;
    for (activeInstrument = csound->actanchor.nxtact;
         activeInstrument;
         activeInstrument = activeInstrument->nxtact) {
      for (csound->pds = activeInstrument->nxtp;
           csound->pds;
           csound->pds = csound->pds->nxtp) {
        (*csound->pds->opadr)(csound, csound->pds);
      }
    }

I haven't booked this in, mainly because I don't understand the comment "but 
this does not allow for all deletions." Are some of these two-step 
assignments necessary in some cases that I didn't test, or is it just 
unexamined code?
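One plausible reading of the comment, offered as an assumption and not confirmed anywhere in this thread: an opcode can remove an instrument from the active list while the loop is running (a turnoff, for instance), so the point at which `nxtact` is read matters. The sketch below models this with hypothetical stand-in types and helpers (`Instr`, `Op`, `deactivate`, `kill_self`, `kill_next`) rather than Csound's real `INSDS`/`OPDS` and deactivation code. In this model, reading `nxtact` after the opcodes run (the rewritten form) stops early when an instrument removes itself, while caching it first (the original form) survives that case but still visits a successor that was removed during the same pass. That second case may be what "does not allow for all deletions" refers to.

```c
#include <stddef.h>

/* Hypothetical stand-ins for Csound's INSDS (active instrument) and
   OPDS (opcode instance); only the list links are modelled here. */
typedef struct Instr Instr;
typedef struct Op Op;

struct Op {
    Op *nxtp;                               /* next opcode in the chain */
    void (*opadr)(Instr *owner, Op *self);  /* opcode perf function */
};

struct Instr {
    Instr *nxtact, *prvact;                 /* active-list links */
    Op *nxtp;                               /* first opcode */
    int runs;                               /* opcode calls, for testing */
};

/* Model of a deletion: unlink ip and clear its links.  Whether Csound
   clears the links of a deactivated INSDS is an assumption here. */
static void deactivate(Instr *ip)
{
    if (ip->prvact) ip->prvact->nxtact = ip->nxtact;
    if (ip->nxtact) ip->nxtact->prvact = ip->prvact;
    ip->nxtact = ip->prvact = NULL;
}

static void count_op(Instr *owner, Op *self) { (void)self; owner->runs++; }
static void kill_self(Instr *owner, Op *self) { (void)self; deactivate(owner); }
static void kill_next(Instr *owner, Op *self)
{
    (void)self;
    if (owner->nxtact) deactivate(owner->nxtact);
}

static void run_ops(Instr *ip)
{
    for (Op *op = ip->nxtp; op != NULL; op = op->nxtp)
        op->opadr(ip, op);
}

/* Rewritten form: reads nxtact only after the opcodes have run. */
static void perf_direct(Instr *anchor)
{
    for (Instr *ip = anchor->nxtact; ip != NULL; ip = ip->nxtact)
        run_ops(ip);
}

/* Original form: caches the successor before any opcode runs. */
static void perf_cached(Instr *anchor)
{
    Instr *ip = anchor->nxtact;
    while (ip != NULL) {
        Instr *nxt = ip->nxtact;
        run_ops(ip);
        ip = nxt;
    }
}

/* Build anchor -> in[0] -> in[1] -> in[2]; each runs count_op, and
   in[0] additionally runs first_extra after it. */
static void build(Instr *anchor, Instr in[3], Op ops[4],
                  void (*first_extra)(Instr *, Op *))
{
    Instr *prev = anchor;
    anchor->prvact = NULL;
    anchor->nxtp = NULL;
    anchor->runs = 0;
    for (int i = 0; i < 3; i++) {
        ops[i].opadr = count_op;
        ops[i].nxtp = NULL;
        in[i].nxtp = &ops[i];
        in[i].runs = 0;
        in[i].prvact = prev;
        prev->nxtact = &in[i];
        prev = &in[i];
    }
    prev->nxtact = NULL;
    ops[3].opadr = first_extra;
    ops[3].nxtp = NULL;
    ops[0].nxtp = &ops[3];
}
```

With `kill_self` on the first instrument, `perf_direct` runs only that instrument while `perf_cached` runs all three; with `kill_next`, `perf_direct` correctly skips the removed second instrument and runs the third, whereas `perf_cached` runs the removed one and never reaches the third.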

I don't have the stomach to go through the opcodes as a whole, but I am
going to take a look at the oscillator indexing and interpolation, and at
the krate/arate arithmetic opcodes. Although of course these are very
straightforward macros and should be extremely efficient, I am interested in
trying ATLAS, an implementation of BLAS that tunes itself at build time to
take full advantage of the processor. These arithmetic opcodes are used all
over the place, and even a small speedup would be helpful.
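For reference, an arithmetic opcode such as a-rate plus a-rate boils down to a loop over one k-period of samples. The sketch below is not Csound's macro-generated code; `add_aa`, `mul_ak`, and the choice of `double` for `MYFLT` are assumptions for illustration. Marking the buffers `restrict` tells the compiler they do not overlap, which lets gcc vectorize the loop (for example at `-O3`) without guarding it with runtime overlap checks; gcc vectorization is the subject of the linux-audio-dev thread quoted below.

```c
#include <stddef.h>

typedef double MYFLT;   /* Csound builds use float or double; double assumed */

/* Hypothetical a-rate + a-rate addition over one k-period.
   restrict promises that out, a and b do not overlap, so gcc can
   vectorize the loop (e.g. at -O3) without runtime overlap checks. */
static void add_aa(MYFLT *restrict out, const MYFLT *restrict a,
                   const MYFLT *restrict b, size_t ksmps)
{
    for (size_t n = 0; n < ksmps; n++)
        out[n] = a[n] + b[n];
}

/* Hypothetical a-rate * k-rate scaling: the k-rate value is constant
   for the whole block, so it is passed by value and read once. */
static void mul_ak(MYFLT *restrict out, const MYFLT *restrict a,
                   MYFLT k, size_t ksmps)
{
    for (size_t n = 0; n < ksmps; n++)
        out[n] = a[n] * k;
}
```

One caution: an orchestra line such as `a1 = a1 + a2` makes the output alias an input, so a real opcode would have to handle that case rather than rely on the `restrict` promise, which would then be false.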

However, even if ATLAS produces some sort of speedup, it doesn't look like
Csound is going to get 2x faster this way. It's already pretty well
optimized.

And that, in turn, reinforces the conclusion that parallelizing synthesis is
the only way to get real speedups.
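The threading code removed from kperf above already hints at one way to parallelize: it walks the active list and splits it into runs of consecutive instances with the same `insno`, and each run (`multiThreadedStart` up to `multiThreadedEnd`) appears to be the unit processed by the worker threads between the two barrier waits. The sketch below reproduces only that partitioning step, single-threaded, with a hypothetical `Instr` type and hypothetical helpers `run_end`, `count_runs`, and `run_length`. It does not show the threads or barriers, and it assumes, as the original loop appears to, that instances of the same instrument sit next to each other in the active list.

```c
#include <stddef.h>

/* Hypothetical stand-in for INSDS: only the fields the split needs. */
typedef struct Instr {
    struct Instr *nxtact;   /* next active instance */
    int insno;              /* instrument number */
} Instr;

/* Return the first instance after start whose insno differs, i.e. the
   exclusive end of the run [start, end), computed the same way the
   removed loop computed multiThreadedEnd. */
static Instr *run_end(Instr *start)
{
    Instr *current = start;
    while (current != NULL && current->insno == start->insno)
        current = current->nxtact;
    return current;
}

/* Count the runs in the list; in the removed code each run seems to
   be one batch handed to the worker threads. */
static int count_runs(Instr *first)
{
    int runs = 0;
    for (Instr *start = first; start != NULL; start = run_end(start))
        runs++;
    return runs;
}

/* Count the instances in the run that begins at start. */
static int run_length(Instr *start)
{
    int n = 0;
    for (Instr *ip = start, *end = run_end(start); ip != end; ip = ip->nxtact)
        n++;
    return n;
}
```

For a list whose instrument numbers are 1, 1, 2, 3, 3, 3, `count_runs` returns 3, `run_end` on the first node returns the third node, and `run_length` on the fourth node returns 3.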

Regards,
Mike



----- Original Message ----- 
From: "Michael Gogins" 
To: "Developer discussions" 
Sent: Wednesday, April 16, 2008 11:22 PM
Subject: Re: [Cs-dev] Vectorization


> I've done a bit of profiling. gprof is not completely satisfactory, since I
> am used to Sun Studio on Solaris, which is so far superior it makes one
> shake one's head. On Solaris, the Sun compiler automatically builds code
> for profiling, and samples code by line from the kernel to give complete
> timing for each line of source code, as well as the call graphs supported
> by gprof. This is very useful...
>
> At any rate, it's clear enough that almost all time in csound
> examples/xanadu.csd is going into kperf and opcodes called by kperf, and
> quite negligible amounts into reading orc and sco, printing messages, or
> writing the output soundfile.
>
> kperf in itself takes about 23% of the time not counting its callees, and
> in turn fairly evenly divides its callees' time among the opcodes, which
> include not only pluck, klinseg, oscillators, and delay lines, but also
> krate-arate, arate-krate, and arate-arate arithmetic.
>
> Since the kperf loop looks pretty minimal, I have no idea why its own code
> should be eating 23% of the performance time. That is my major question at
> this point. I may need to rewrite kperf as a set of functions to obtain a
> more detailed breakdown of this time. Or I can develop some insight in the
> debugger.
>
> On Linux, you can rebuild the kernel and use oprofile, which can do what
> Sun Studio can do and give line-by-line timings inside functions.
>
> I will also profile Trapped in Convert since it performs many more short
> events, in comparison with Xanadu which performs a few long guitar-like
> chords.
>
> Regards,
> Mike
>
> ----- Original Message ----- 
> From: "Steven Yi" 
> To: "Developer discussions" 
> Sent: Wednesday, April 16, 2008 8:21 PM
> Subject: Re: [Cs-dev] Vectorization
>
>
>> Hi Victor,
>>
>> I haven't done it in a long time, but I found an email I posted to the
>> dev list a year and a half ago that has some instructions:
>>
>> http://www.nabble.com/Call-Graph-%28Postscript-file%29-td5730328.html#a5730328
>>
>> I haven't run it in a long while but those commands that were run
>> there should do the trick.
>>
>> steven
>>
>> On Wed, Apr 16, 2008 at 12:33 PM, victor wrote:
>>> I built csound with useGprof=1, ran a simple one-oscillator test, then
>>> tried gprof and got an empty profile. Why is that?
>>>
>>>
>>>  ----- Original Message -----
>>>  From: "Steven Yi"
>>>  To: "Michael Gogins" ; "Developer discussions"
>>>  Sent: Wednesday, April 16, 2008 7:42 PM
>>>  Subject: Re: [Cs-dev] Vectorization
>>>
>>>
>>>  > Hi Michael,
>>>  >
>>>  > I'm interested in the profiling results; what profiler are you using,
>>>  > by the way? I remember doing profiling a while back using debug
>>>  > builds and gprof (the options for that are still in SConstruct), but
>>>  > when I did profiling it was more to try to figure out why the
>>>  > Pinkston FM model was really slow at the time, not so much for general
>>>  > Csound performance.
>>>  >
>>>  > I've poked around most of Csound's engine and feel like I know it
>>>  > pretty well, especially when I was more actively working on the new
>>>  > parser. If any questions arise about Csound internals
>>>  > and how things are allocated, I'd suggest looking at the new parser
>>>  > (well, the compile part that translates the AST to the INSTRTXT data
>>>  > structs that Csound uses at performance time) to see how things are
>>>  > built, as to me it's a bit clearer than reading the old parser code.
>>>  >
>>>  > steven
>>>  >
>>>  >
>>>  > On Wed, Apr 16, 2008 at 11:27 AM, Michael Gogins wrote:
>>>  >> Yes, this is interesting. I have not yet tried unsafe math
>>>  >> optimizations, but will.
>>>  >>
>>>  >> I have tried inlining as much code as possible, which in practice
>>>  >> means defining all C++ member functions in the header file. That
>>>  >> consistently produces speedups of somewhere between 5% and 15%,
>>>  >> right there.
>>>  >>
>>>  >> This is in the context of intermittently continuing development of
>>>  >> Silence, an algorithmic composition/software synthesis library of my
>>>  >> own design. Currently, Silence renders audio just slightly faster than
>>>  >> Csound, but this is with hard-coded STK Rhodey C++ instruments. (But
>>>  >> then, my comparison Csound instrument also uses the STK Rhodey opcode,
>>>  >> so the comparison is fairer than it might seem: it mostly compares
>>>  >> instrument allocation and event dispatching.) Dynamically defined
>>>  >> instruments will be slower in Silence, and perhaps also in Csound
>>>  >> (i.e., using more opcodes in the instr block instead of just calling
>>>  >> one opcode that does all the work). I shall soon know more, as I have
>>>  >> finalized my design of dynamically defined instruments and unit
>>>  >> generators in Silence.
>>>  >>
>>>  >> It is the usefulness of the profiler in getting this performance that
>>>  >> has decided me to profile Csound. I am very curious to see how much
>>>  >> "slack" there is, how much scope for performance improvements. Most
>>>  >> of Csound's opcode code looks to be quite efficient, and I am
>>>  >> certainly not going to monkey with the fundamental design of the
>>>  >> engine, so I am most curious about the efficiency of the engine
>>>  >> implementation, especially the kperf loop, event initializers, output
>>>  >> drivers, and so on. I don't actually expect much slack, but the
>>>  >> profiler will perhaps point out a few areas that have not been
>>>  >> completely thought through.
>>>  >>
>>>  >> Regards,
>>>  >> Mike
>>>  >>
>>>  >>
>>>  >>
>>>  >>  -----Original Message-----
>>>  >>  >From: Steven Yi 
>>>  >>  >Sent: Apr 16, 2008 12:00 PM
>>>  >>  >To: Developer discussions 
>>>  >>  >Subject: [Cs-dev] Vectorization
>>>  >>  >
>>>  >>  >Hi All,
>>>  >>  >
>>>  >>  >There's an interesting thread on linux-audio-dev about the
>>>  >>  >performance of gcc vectorization code:
>>>  >>  >
>>>  >>  >http://www.nabble.com/vectorization-td15339532.html#a16720581
>>>  >>  >
>>>  >>  >The thread started in February but resumed a day or two ago, with
>>>  >>  >someone reporting better results with gcc than using assembly.
>>>  >>  >
>>>  >>  >steven
>>>  >>  >
>>>  >>
>>>  >>
>>> 
>>>  >-------------------------------------------------------------------------
>>>  >>  >This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
>>>  >>  >Don't miss this year's exciting event. There's still time to save
>>> $100.
>>>  >>  >Use priority code J8TL2D2.
>>>  >>
>>>  >>
>>> 
>>>  >http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
>>>  >>  >_______________________________________________
>>>  >>  >Csound-devel mailing list
>>>  >>  >Csound-devel@lists.sourceforge.net
>>>  >>  >https://lists.sourceforge.net/lists/listinfo/csound-devel