Thanks for your informative response. I have re-read parts of Wilson's
dissertation, and I have a few more questions and comments. Please
give them some thought.
Also, please let me know if there is any part of this work that you
would like me to take on, either in the ParCS code itself, or in
fixing bugs to free up time for yourself.
Once again, I think the dissertation is an impressive piece of work
and covers many cases that are not obvious when one first begins to
think about the problem.
Question
--------
The dissertation documents significant speedups in pieces of interest
to me (heavy, mostly parallel instruments). Are we sure that the ParCS
code reflects all the speedups discussed in the dissertation? Several
alternative designs are discussed. Which one does the ParCS code
actually implement, or does the code follow a different design
altogether?
Comment
-------
Off the top of my head, I don't see any place to prefer mutexes to
spinlocks except while building the DAG.
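For the record, by a spinlock I just mean a busy-wait on an atomic
flag. A minimal sketch, using the GCC atomic builtins purely for
illustration (not the actual ParCS code), would be:

    /* Minimal spinlock sketch using GCC atomic builtins; for
       illustration only, not the ParCS implementation. */
    typedef volatile int spin_t;

    static inline void spin_lock(spin_t *lock)
    {
        /* busy-wait until the flag is atomically changed from 0 to 1 */
        while (__sync_lock_test_and_set(lock, 1))
            ;
    }

    static inline void spin_unlock(spin_t *lock)
    {
        __sync_lock_release(lock);   /* reset the flag to 0 */
    }

A mutex, by contrast, can put the waiting thread to sleep, which only
pays off when the wait may be long, as it can be while the DAG is
being built; the critical sections around opcode buffers are only a
handful of instructions, and there spinning is much cheaper.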
Comment
-------
It is critical to find out whether the ParCS code can be made fast
enough. That requires optimizing and debugging the code. If it is
still not fast enough, we need to consider whether another design
might work, or whether this approach is simply not going to work well
enough.
Comment
-------
What is "fast enough?" In my view it is fast enough if (a) complex
pieces run significantly faster with 2 or more cores, and (b)
performance shows some kind of scaling up with increasing numbers of
cores. I don't think it's fast enough if adding cores does no good or
even slows things down again.
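To put numbers on that: taking the figures john gives below, 13.837
seconds elapsed on one core against 3.618 seconds on four cores is a
speedup of about 3.8, which would clearly qualify; 6.92 seconds on
three cores, no better than on two, is the kind of flat spot I would
count against it.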
Comment
-------
What would be an alternative design? It seems clear that more complex
designs would incur even more threading overhead, so we should think
about simpler designs. But simpler designs would have to give up some
features of ParCS.
The simplest design I can think of would start with Steven Yi's
approach. Currently it ignores global data and simply runs multiple
instances of a given template in parallel with a barrier at the end.
There is a spinlock or something to protect the out opcode buffers.
Period. To this could be added atomic operations or spinlocks
(whichever is more efficient) on all opcode buffers. I think these
would incur little overhead (but a lot of detail work to implement).
They would, however, throw out the ParCS feature of protecting the
whole operation of first writing a global variable and then reading
it back, without interference from other threads.
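To make concrete what a spinlock on an output buffer would look like,
here is a sketch of guarding the out opcode's mix into spout, using
the spin_lock sketch above; out_perf and spout_lock are names I am
making up for illustration, and double stands in for MYFLT:

    /* Hypothetical guard around the out opcode's mix into spout.
       spout, nchnls and ksmps are the usual Csound names; out_perf
       and spout_lock are made up for this sketch. */
    static spin_t spout_lock = 0;

    void out_perf(double *spout, const double *asig,
                  int ksmps, int nchnls)
    {
        int n;
        spin_lock(&spout_lock);
        for (n = 0; n < ksmps; n++)
            spout[n * nchnls] += asig[n];  /* mix this instance in */
        spin_unlock(&spout_lock);
    }

The same pattern, repeated over every shared buffer, is what I mean
by a lot of detail work but little overhead: each lock is uncontested
almost all of the time.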
In other words, the alternative would trade DAG analysis plus
protection of only the global data at DAG edges for no DAG analysis
plus protection of all global data, on the theory that instrument
order would make contention on global data rare, and that uncontested
lock operations have very low cost.
The simpler design would throw out the DAG analysis and just use
instrument number order to determine the concurrent layers. This
would throw out some of the ParCS semantic analysis that is aimed at
finding potential speedups. But I'm not sure those potential speedups
can actually be realized, because the DAG code introduces its own
threading overhead.
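To sketch what dispatch would look like without the DAG -- layers
defined purely by instrument number, with a barrier between layers --
here is some hedged pseudocode; layer_t, run_instance and
perform_kcycle are names invented for illustration:

    /* Sketch of the simpler dispatch, all names hypothetical. Each
       layer holds the active instances of one instrument number;
       instances in a layer run in parallel, and the barrier between
       layers preserves instrument-number order for global data. */
    #include <pthread.h>

    typedef struct instance_s instance_t;
    void run_instance(instance_t *ip);  /* one k-cycle of one instance */

    typedef struct {
        instance_t **instances;
        int          count;
    } layer_t;

    void perform_kcycle(layer_t *layers, int num_layers,
                        int thread_id, int num_threads,
                        pthread_barrier_t *bar)
    {
        for (int i = 0; i < num_layers; i++) {
            /* threads divide this layer's instances among themselves;
               within a layer only the buffer locks are needed */
            for (int j = thread_id; j < layers[i].count; j += num_threads)
                run_instance(layers[i].instances[j]);
            /* the next instrument number must see everything this
               layer wrote to globals before it starts */
            pthread_barrier_wait(bar);
        }
    }

The cost per k-cycle is then one barrier wait per layer plus the
uncontested buffer locks, which is easy both to reason about and to
measure.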
Question
--------
So, is the DAG analysis actually required to compute the correct
sound, or is it only aimed at finding more speedups in comparison with
just relying on instrument number?
Regards,
Mike
On Thu, Jul 15, 2010 at 5:26 AM, john ffitch wrote:
>>>>>> "Michael" == Michael Gogins writes:
>
> Michael> How big is the "waste" instrument?
>
> At a loop of 30,000 times it did not do much but at 100,000,000 it
> made a simple oscillator take a very long time.
>
> This is for a very short sound
>                Elapsed     CPU
> 1 processor     13.837   13.83
> 2 processors     6.915   13.82
> 3 processors     6.92    13.85
> 4 processors     3.618   13.85
>
> There were 4 instruments playing, two oscillators and two plucks, and
> no common variables except spout, which is protected by a spinlock
>
> Michael> The granularity is affected both by pure instruction overhead and by
> Michael> other overheads involved in synchronizing and scheduling (threading
> Michael> overhead, if you will). With pure instruction overhead I would guess
> Michael> that a few thousand machine instructions will justify a thread context
> Michael> switch. I would also guess that's a bit bigger than a small instrument
> Michael> instance, i.e., the instruments need to be "smallish or larger" not
> Michael> "small". I'm confident of this because Steven Yi's code definitely got
> Michael> significant speedups with mid size instruments.
>
> There is code to count the "cost" of an instrument and balance one
> instr or a sequence against others
>
> Michael> Threading overhead might well throw that out the window. I would think
> Michael> that the design of synchronizing opcodes that access global data would
> Michael> introduce additional threading overhead and require many more machine
> Michael> instructions to balance out. If so then, for a given layer of
> Michael> concurrency (e.g. a set of instances of the same instrument template
> Michael> without any potential data races), running some of those instances in
> Michael> series should balance it out. If there are too many potential data
> Michael> races, i.e. too much synchronization in the opcode graphs, then it may
> Michael> never really work out.
>
> That is what needs to be viewed. At present I think some of the DAG
> dispatch is over-complex and needs a close review
>
> Michael> Using spinlocks for opcode synchronization would almost certainly
> Michael> help, does ParCS do that? I don't recall...
>
> Most is spinlock and two barriers. There are a few mutex sections and
> I am not sure why -- one is for one read of a variable which is a
> classic spinlock case
>
> Michael> Does this make any sense?
>
> Sure. I need to finish my opcode cost counter to augment the parser,
> and look critically at the dag_consume and dag_update code
>
> But clearing some of the bug list also needs to be done.
>
> Michael> Regards
> Michael> Mike
>
> ==John ffitch
>
--
Michael Gogins
Irreducible Productions
http://www.michael-gogins.com
Michael dot Gogins at gmail dot com