
Re: [Cs-dev] Vectorization

Date: 2008-04-18 02:28
From: "Michael Gogins"
Subject: Re: [Cs-dev] Vectorization
I take it all back! I was comparing kperf in Xanadu (ksmps = 1) with kperf 
in Trapped (ksmps = 10), by mistake.

The change I discuss below actually made only a 1%-2% improvement in 
efficiency! That would be down to removing unnecessary assignments; the 
compiler may have optimized some out already anyway.

All the more reason to focus on parallelizing...

Embarrassedly,
Mike

----- Original Message ----- 
From: "Michael Gogins" 
To: "Developer discussions" 
Sent: Thursday, April 17, 2008 9:17 PM
Subject: Re: [Cs-dev] Vectorization


> Silly me... xanadu.csd runs at ksmps = 1. So, for ksmps greater than one,
> the below does not quite apply.
>
> Still, I was able to reduce the % of time in kperf code in this case (i.e.
> xanadu.csd with ksmps = 1) from 23.37% to 12.37%; in short, I virtually
> doubled the efficiency of Csound's inner loop. Don't get all excited: this
> doesn't count the functions called by kperf, i.e., the opcodes themselves,
> which consume the vast majority of time in Csound.
>
> I did this by removing the threading code, removing what appeared to be
> unnecessary temporary variables and assignments, and recasting the while
> loops as for loops. The innermost loop changed from
>
>    csound->spoutactive = 0;            /*   make spout inactive   */
>    barrier1 = csound->multiThreadedBarrier1;
>    barrier2 = csound->multiThreadedBarrier2;
>    ip = csound->actanchor.nxtact;
>    if (ip != NULL) {
>      csound->multiThreadedStart = ip;
>      if (csound->multiThreadedThreadInfo != NULL) {
>        while (csound->multiThreadedStart != NULL) {
>          INSDS *current = csound->multiThreadedStart;
>          while(current != NULL &&
>                (current->insno == csound->multiThreadedStart->insno)) {
>            current = current->nxtact;
>          }
>          csound->multiThreadedEnd = current;
>          csound->WaitBarrier(barrier1);
>          csound->WaitBarrier(barrier2);
>          csound->multiThreadedStart = current;
>        }
>      }
>      else {
>        while (ip != NULL) {                /* for each instr active:  */
>          INSDS *nxt = ip->nxtact;
>          csound->pds = (OPDS*) ip;
>          while ((csound->pds = csound->pds->nxtp) != NULL) {
>            (*csound->pds->opadr)(csound, csound->pds); /* run each opcode */
>          }
>          ip = nxt; /* but this does not allow for all deletions */
>        }
>      }
>    }
>
> to
>
>    csound->spoutactive = 0;
>    for (activeInstrument = csound->actanchor.nxtact;
>         activeInstrument;
>         activeInstrument = activeInstrument->nxtact) {
>      for (csound->pds = activeInstrument->nxtp;
>           csound->pds;
>           csound->pds = csound->pds->nxtp) {
>         (*csound->pds->opadr)(csound, csound->pds);
>      }
>    }
>
> I haven't booked this in, mainly because I don't understand the comment
> "but this does not allow for all deletions." Are some of these two-step
> assignments necessary in some cases that I didn't test, or is it just
> unexamined code?
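
One possible reading of that comment, offered as a guess and not as the original author's intent: the old loop copies `ip->nxtact` into `nxt` before running the opcode chain, so the walk survives an opcode that unlinks the current instance, although it would still not survive removal of the saved successor itself. The sketch below uses a hypothetical `Node` list and a hypothetical `perform` function (neither is Csound's `INSDS` or its real deactivation code) to show how the two loop shapes can differ.

```c
#include <stddef.h>

/* Hypothetical stand-in for an active-instance list; not Csound's INSDS. */
typedef struct Node {
    struct Node *next;
    int id;
    int turnoff;   /* nonzero: this node unlinks itself when performed */
} Node;

/* Stand-in for running one instance's opcode chain. A node flagged with
   turnoff removes itself from the list and clears its own next pointer,
   as a deactivation routine might (an assumption for illustration). */
static void perform(Node *anchor, Node *n)
{
    if (!n->turnoff) return;
    Node *prev = anchor;
    while (prev->next != n) prev = prev->next;
    prev->next = n->next;
    n->next = NULL;
}

/* Old shape: remember the successor before the opcode can change it. */
int visit_saving_next(Node *anchor)
{
    int visited = 0;
    Node *n = anchor->next;
    while (n != NULL) {
        Node *nxt = n->next;
        perform(anchor, n);
        visited++;
        n = nxt;
    }
    return visited;
}

/* New shape: n->next is read only after the opcode has run. */
int visit_reading_next_late(Node *anchor)
{
    int visited = 0;
    for (Node *n = anchor->next; n != NULL; n = n->next) {
        perform(anchor, n);
        visited++;
    }
    return visited;
}
```

With a three-node list whose middle node turns itself off, `visit_saving_next` visits all three nodes, while `visit_reading_next_late` stops after the second because the cleared pointer ends the walk early. Whether any Csound opcode actually modifies the active list this way during kperf is exactly the open question in the message.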
>
> I don't have the stomach to go through the opcodes as a whole, but I am
> going to take a look at the oscillator indexing and interpolation, and at
> the krate/arate arithmetic opcodes. Although of course these are very
> straightforward macros and should be extremely efficient, I am interested
> in trying ATLAS, which is a version of BLAS that compiles to take every
> possible advantage of the processor. These arithmetic opcodes are used all
> over the place and even a small speedup would be helpful.
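
For readers unfamiliar with these opcodes, here is a rough sketch of the kind of loop involved. It is not Csound's actual macro code: the names `mul_ak` and `mul_aa`, the use of `float` in place of Csound's sample type, and the `restrict` qualifiers are all assumptions made for illustration. Each call processes one control period of `ksmps` samples.

```c
#include <stddef.h>

/* Hypothetical a-rate * k-rate multiply: one audio vector times one
   control value, over a single control period of ksmps samples. */
void mul_ak(float *restrict out, const float *restrict a,
            float k, size_t ksmps)
{
    for (size_t n = 0; n < ksmps; n++)
        out[n] = a[n] * k;
}

/* Hypothetical a-rate * a-rate multiply: two audio vectors, elementwise. */
void mul_aa(float *restrict out, const float *restrict a,
            const float *restrict b, size_t ksmps)
{
    for (size_t n = 0; n < ksmps; n++)
        out[n] = a[n] * b[n];
}
```

Loops of this shape are the usual candidates for compiler auto-vectorization (in gcc, `-ftree-vectorize`, which `-O3` turns on), and `restrict` tells the compiler the buffers do not overlap. Note that at ksmps = 1 each loop runs a single iteration, so there is nothing to vectorize and the per-call overhead dominates.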
>
> However, even if ATLAS produces some sort of speedup, it doesn't look like
> Csound is going to get 2 x faster this way. It's already pretty well
> optimized.
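
The "not 2 x faster" estimate can be checked with Amdahl's law, which is brought in here as a standard rule of thumb and is not something the message itself invokes. If a fraction `f` of run time is made `s` times faster, the whole program speeds up by 1 / ((1 - f) + f / s). The function names below are hypothetical.

```c
/* Overall speedup when a fraction f of run time (0 <= f < 1) is made
   s times faster and the rest is unchanged (Amdahl's law). */
double overall_speedup(double f, double s)
{
    return 1.0 / ((1.0 - f) + f / s);
}

/* Upper bound on the overall speedup as s grows without limit. */
double speedup_limit(double f)
{
    return 1.0 / (1.0 - f);
}
```

Taking the 23.37% kperf self-time figure quoted above as `f`, `overall_speedup(0.2337, 2.0)` is about 1.13 and `speedup_limit(0.2337)` is about 1.30, so even eliminating kperf's own cost entirely would fall well short of doubling throughput for that run.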
>
> And that, in turn, reinforces the conclusion that parallelizing synthesis is
> the only way to get real speedups.
>
> Regards,
> Mike
>
>
>
> ----- Original Message ----- 
> From: "Michael Gogins" 
> To: "Developer discussions" 
> Sent: Wednesday, April 16, 2008 11:22 PM
> Subject: Re: [Cs-dev] Vectorization
>
>
>> I've done a bit of profiling; gprof is not completely satisfactory, since
>> I am used to sunstudio on Solaris, so far superior it makes one shake
>> one's head. On Solaris, the Sun compiler automatically builds code for
>> profiling, and samples code by line from the kernel to give complete
>> timing for each line of source code, as well as the call graphs supported
>> by gprof. This is very useful...
>>
>> At any rate, it's clear enough that almost all time in csound
>> examples/xanadu.csd is going into kperf and opcodes called by kperf, and
>> quite negligible amounts into reading orc and sco, printing messages, or
>> writing the output soundfile.
>>
>> kperf in itself takes about 23% of the time not counting its callees, and
>> in turn fairly evenly divides its callees' time between the opcodes,
>> which include not only pluck, klinseg, oscillators, and delay lines, but
>> also krate-arate, arate-krate, and arate-arate arithmetic.
>>
>> Since the kperf loop looks pretty minimal, I have no idea why its own
>> code should be eating 23% of the performance time. That is my major
>> question at this point. I may need to rewrite kperf as a set of functions
>> to obtain a more detailed breakdown of this time. Or I can develop some
>> insight in the debugger.
>>
>> On Linux, you can rebuild the kernel and use oprofile, which can do what
>> sunstudio can do, and give line by line timings inside functions.
>>
>> I will also profile Trapped in Convert since it performs many more short
>> events, in comparison with Xanadu which performs a few long guitar-like
>> chords.
>>
>> Regards,
>> Mike
>>
>> ----- Original Message ----- 
>> From: "Steven Yi" 
>> To: "Developer discussions" 
>> Sent: Wednesday, April 16, 2008 8:21 PM
>> Subject: Re: [Cs-dev] Vectorization
>>
>>
>>> Hi Victor,
>>>
>>> I haven't done it in a long time, but I found an email I posted to the
>>> dev list a year and a half ago that has some instructions:
>>>
>>> http://www.nabble.com/Call-Graph-%28Postscript-file%29-td5730328.html#a5730328
>>>
>>> I haven't run it in a long while but those commands that were run
>>> there should do the trick.
>>>
>>> steven
>>>
>>> On Wed, Apr 16, 2008 at 12:33 PM, victor wrote:
>>>> I built csound with useGprof=1, ran a simple one-oscillator test, then
>>>>  tried gprof and got an empty profile. Why is that?
>>>>
>>>>
>>>>  ----- Original Message -----
>>>>  From: "Steven Yi"
>>>>  To: "Michael Gogins"; "Developer discussions"
>>>>  Sent: Wednesday, April 16, 2008 7:42 PM
>>>>  Subject: Re: [Cs-dev] Vectorization
>>>>
>>>>
>>>>  > Hi Michael,
>>>>  >
>>>>  > I'm interested in the profiling results; what profiler are you using,
>>>>  > by the way?  I remember doing profiling a while back using debug
>>>>  > builds and gprof (the options for that are still in SConstruct), but
>>>>  > when I did profiling it was more to try to figure out why the
>>>>  > Pinkston FM model was really slow at the time, not so much for
>>>>  > general Csound performance.
>>>>  >
>>>>  > I've poked around most of Csound's engine and feel like I know it
>>>>  > pretty well, especially from when I was more actively working on the
>>>>  > new parser.  If any questions arise about Csound internals
>>>>  > and how things are allocated, I'd suggest looking at the new parser
>>>>  > (well, the compile part that translates the AST to the INSTRTXT data
>>>>  > structs that Csound uses at performance time) to see how things are
>>>>  > built, as to me it's a bit clearer than reading the old parser code.
>>>>  >
>>>>  > steven
>>>>  >
>>>>  >
>>>>  > On Wed, Apr 16, 2008 at 11:27 AM, Michael Gogins wrote:
>>>>  >> Yes, this is interesting. I have not yet tried unsafe math
>>>>  >> optimizations, but will.
>>>>  >>
>>>>  >>  I have tried inlining as much code as possible, which in practice
>>>>  >> means defining all C++ member functions in the header file. That
>>>>  >> consistently produces somewhere between 5% and 15% speedups, right
>>>>  >> there.
>>>>  >>
>>>>  >>  This is in the context of intermittently continuing development of
>>>>  >> Silence, an algorithmic composition/software synthesis library of
>>>>  >> my own design. Currently, Silence renders audio just slightly faster
>>>>  >> than Csound, but this is with hard-coded STK Rhodey C++ instruments.
>>>>  >> (But then, my comparison Csound instrument uses the STK Rhodey
>>>>  >> opcode also, so the comparison is more fair than it might seem: it
>>>>  >> mostly compares instrument allocation and event dispatching).
>>>>  >> Dynamically defined instruments will be slower in Silence, perhaps
>>>>  >> also in Csound (i.e., using more opcodes in the instr block instead
>>>>  >> of just calling one opcode that does all the work). I shall soon
>>>>  >> know more, as I have finalized my design of dynamically defined
>>>>  >> instruments and unit generators in Silence.
>>>>  >>
>>>>  >>  It is the usefulness of the profiler in getting this performance
>>>>  >> that has decided me to profile Csound. I am very curious to see how
>>>>  >> much "slack" there is, how much scope for performance improvements.
>>>>  >> Most of Csound's opcode code looks to be quite efficient, and I am
>>>>  >> certainly not going to monkey with the fundamental design of the
>>>>  >> engine, so I am most curious about the efficiency of the engine
>>>>  >> implementation, especially the kperf loop, event initializers,
>>>>  >> output drivers, and so on. I don't, actually, expect much slack but
>>>>  >> the profiler will, perhaps, point out a few areas that have not been
>>>>  >> completely thought through.
>>>>  >>
>>>>  >>  Regards,
>>>>  >>  Mike
>>>>  >>
>>>>  >>
>>>>  >>
>>>>  >>  -----Original Message-----
>>>>  >>  >From: Steven Yi 
>>>>  >>  >Sent: Apr 16, 2008 12:00 PM
>>>>  >>  >To: Developer discussions 
>>>>  >>  >Subject: [Cs-dev] Vectorization
>>>>  >>  >
>>>>  >>  >Hi All,
>>>>  >>  >
>>>>  >>  >There's an interesting thread going on on linux-audio-dev about the
>>>>  >>  >performance of gcc vectorization code:
>>>>  >>  >
>>>>  >>  >http://www.nabble.com/vectorization-td15339532.html#a16720581
>>>>  >>  >
>>>>  >>  >The thread started in February but resumed a day or two ago, with
>>>>  >>  >someone reporting better results with gcc than using assembly.
>>>>  >>  >
>>>>  >>  >steven
>>>>  >>  >
>>>>  >>  >-------------------------------------------------------------------------
>>>>  >>  >This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
>>>>  >>  >Don't miss this year's exciting event. There's still time to save $100.
>>>>  >>  >Use priority code J8TL2D2.
>>>>  >>  >http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
>>>>  >>  >_______________________________________________
>>>>  >>  >Csound-devel mailing list
>>>>  >>  >Csound-devel@lists.sourceforge.net
>>>>  >>  >https://lists.sourceforge.net/lists/listinfo/csound-devel
>>>>
>>>
>>
>>
>
>

