[Csnd] Re: Using python with CSOUND for Livecoding like Supercollider

Date: 2010-07-09 19:03
From: Jeff Taylor
Subject: [Csnd] Re: Using python with CSOUND for Livecoding like Supercollider
Regarding the maximum number of threads being executed, does that mean this new version of csound just splits the instruments between the available threads (so if you have 4 available threads and 40 executing instruments then it puts 10 into each thread) or does it create 40 threads and execute them in sequence 4 at a time?

It seems like the former could be slower, since if the execution times of the instruments are very imbalanced, one of the threads could be loaded with all of the slow code and the others loaded with the fast.  The second method allows threads to be processed as soon as they can.
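To make the concern above concrete, here is a minimal sketch comparing the two strategies in the question. It is only a toy model, not a description of how Csound actually schedules voices: the function names and the per-voice costs are made up for illustration, and thread switching overhead is ignored.

```python
import heapq

def static_makespan(costs, n_threads):
    """Deal voices out in fixed contiguous groups, one group per thread."""
    size = -(-len(costs) // n_threads)  # ceiling division
    groups = [costs[i:i + size] for i in range(0, len(costs), size)]
    # The block finishes only when the most heavily loaded thread finishes.
    return max(sum(group) for group in groups)

def dynamic_makespan(costs, n_threads):
    """Each thread takes the next waiting voice as soon as it becomes free."""
    finish_times = [0.0] * n_threads  # min-heap of per-thread finish times
    heapq.heapify(finish_times)
    for cost in costs:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)

# Hypothetical costs: 10 slow voices (8 units each), then 30 fast ones (1 unit).
costs = [8.0] * 10 + [1.0] * 30
static_result = static_makespan(costs, 4)
dynamic_result = dynamic_makespan(costs, 4)
print(static_result, dynamic_result)
```

With this deliberately unbalanced input the fixed split leaves one thread holding all of the slow voices, while the shared queue spreads them out. With evenly priced voices the two approaches would come out about the same.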

--
Electronically,
Jeff Taylor


On Fri, Jul 9, 2010 at 9:45 AM, Michael Gogins <michael.gogins@gmail.com> wrote:
Yes, you need multiple cores to gain any benefit. With 1 core it costs
more than it's worth.

There would be no side effects on other processes.

The maximum number of threads that can actually run at the same time
is the number of CPUs (cores), except that with recent Intel CPUs,
there is a feature called "hyperthreading" that enables 1 core to run
2 threads "at the same time". So for example on an Intel Core i7,
which has 4 CPUs, there are actually 8 threads available to work. This
is considerably more power than personal computers have ever had
before. And now you can even get a Core i7 with 6 cores.
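As a quick way to check this on a given machine, the Python standard library can report how many logical CPUs the operating system sees. This is only an illustrative sketch; the variable name is an assumption, and the count is of logical CPUs, so a hyperthreaded 4-core processor would typically report 8.

```python
import os

# os.cpu_count() returns the number of logical CPUs, or None if it
# cannot be determined on this platform.
logical_cpus = os.cpu_count()
print(f"Logical CPUs visible to the OS: {logical_cpus}")
```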

None of these methods enable multiple threads within an instrument
block. First, it is more difficult. Second, there would not be as much
benefit because you have to be able to run several thousand machine
instructions before you can afford to suspend or resume a thread.
Instrument blocks would run from several hundred to hundreds of
thousands of machine instructions. As long as there are more voices
than cores, the voice is a very appropriate level of threading
granularity.

However, to achieve your idea about parallel chains, you could break
up the instrument into parts, each part a separate instrument block,
and use the signal flow graph opcodes to connect the parts. Then the
parts can indeed run in parallel. But this would also require the
ParCS code to guard the writes to the signal flow graph inlets.

Regards,
Mike





Date: 2010-07-09 21:31
From: Michael Gogins
Subject: [Csnd] Re: Re: Using python with CSOUND for Livecoding like Supercollider
Remember that a thread consists of a "snapshot" of the current state
of a program. This more or less consists of (a) the data in all CPU
registers including the instruction pointer and the stack pointer, (b)
the actual stack, and (c) perhaps some thread-local or cache-specific
data. The CPU registers can be saved to an area of memory, or restored
from an area of memory, with a single instruction (context switch).
Basically, each thread is allowed to run for one "quantum" of time (a
few milliseconds to a few tens of milliseconds if not interrupted)
before being suspended in favor of another thread.

At bottom, the "quanta" are what is really happening on the computer.
Each one is a time-slice of a thread. On a 1 core computer, there is 1
quantum at a time for however many threads you have. On a 4 core
computer, there are 4 quanta at a time (or 8, with hyperthreading) for
however many threads you have. Typically, a modern PC will be running
hundreds of threads. On Windows, if you have Spy++, you can see them all
by selecting Process view and expanding the entire tree. Or you can
use the Task Manager, and add the Thread Count column.

To go into even more grisly detail, the operating system has a list of
threads. Assuming there are no interrupts, the operating system
restores (or loads, if the program is starting up) the thread state
for the first thread into a core's registers and runs it for 1 quantum
of time. Then it saves the registers for that thread, and restores the
state from the next thread to the registers and runs THAT thread for 1
quantum of time. If there are N cores, the operating system can resume
and suspend N threads for N quanta at more or less the same time. When
the OS gets to the end of the list, it just goes back to the top and
starts all over again ("round robin scheduling"). This is of course
complicated by thread priorities and interrupts from device drivers
saying "run my quantum RIGHT NOW BEFORE MY DATA IS OUT OF DATE!!". Not
to mention programs starting and stopping.
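The round-robin loop described above can be sketched as a small simulation. This is a simplified model under stated assumptions: priorities and interrupts are ignored, every core switches at the same moment, and the function name, thread names and run times are hypothetical.

```python
from collections import deque

def round_robin(work, quantum, n_cores=1):
    """Simulate round-robin time slicing.

    Returns a list of rounds; each round lists the (thread, slice_length)
    pairs that run side by side on the available cores.
    """
    ready = deque(work.items())  # the scheduler's list of runnable threads
    timeline = []
    while ready:
        running = [ready.popleft() for _ in range(min(n_cores, len(ready)))]
        timeline.append([(name, min(quantum, left)) for name, left in running])
        for name, left in running:
            if left > quantum:
                # Not finished: state is saved and the thread rejoins the
                # end of the list to wait for its next quantum.
                ready.append((name, left - quantum))
    return timeline

# Hypothetical workload: remaining run time per thread, in milliseconds.
work = {"A": 25, "B": 10, "C": 15}
timeline = round_robin(work, quantum=10, n_cores=1)
for step, slices in enumerate(timeline):
    print(step, slices)
```

Calling round_robin(work, quantum=10, n_cores=2) instead runs two slices per round, which is the "N quanta at more or less the same time" case.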

In other words, there are actually several levels of concurrency. So
to answer your question: Csound divides up your 40 voices among its 4
threads, and then the operating system slices up the 4 Csound threads
and runs them on its 4 (or 8) cores along with slices of all the other
threads on the machine.
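The two levels can be illustrated with a fixed pool of worker threads from the Python standard library. This is only a structural sketch, not Csound's implementation: render_voice is a hypothetical stand-in for computing one voice, and the pool hands out work from a shared queue rather than in fixed groups. Note also that CPython's global interpreter lock keeps pure-Python code from running in parallel, so the example shows the shape of the arrangement rather than a speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

def render_voice(voice_id):
    """Hypothetical stand-in for one voice; reports which thread ran it."""
    return voice_id, threading.current_thread().name

# Level 1: the application hands 40 voices to a fixed pool of 4 threads.
# Level 2: the OS time-slices those 4 threads across the available cores,
# alongside every other thread on the machine.
with ThreadPoolExecutor(max_workers=4, thread_name_prefix="worker") as pool:
    results = list(pool.map(render_voice, range(40)))

threads_used = {name for _, name in results}
print(len(results), "voices ran on", len(threads_used), "thread(s)")
```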

Hope this helps,
Mike



-- 
Michael Gogins
Irreducible Productions
http://www.michael-gogins.com
Michael dot Gogins at gmail dot com


Send bug reports to the Sourceforge bug tracker
            https://sourceforge.net/tracker/?group_id=81968&atid=564599
Discussions of bugs and features can be posted here
To unsubscribe, send email sympa@lists.bath.ac.uk with body "unsubscribe csound"


Date: 2010-07-11 13:05
From: Jeff Taylor
Subject: [Csnd] Re: Re: Re: Using python with CSOUND for Livecoding like Supercollider
Yes, it does help.  Thank you for the detailed description.

I have one more question, though.  The only way I can see this working while still preserving the instrument execution order is that csound threads all of the voices for instrument one, waits for them to finish, then threads all of the voices for instrument two, waits for them to finish, etc.  Which would mean that if you had a piece composed using many instruments with only one execution for each at a single time then the multi-threading wouldn't give any benefit.  Do I understand this correctly?

--
Electronically,
Jeff Taylor






Date: 2010-07-11 13:40
From: jpff@cs.bath.ac.uk
Subject: [Csnd] Re: Re: Re: Re: Using python with CSOUND for Livecoding like Supercollider
The instances of the instruments are partially ordered by the semantics of
instrument order and of data flowing between instruments.  That means that
at every k-cycle we can allocate instruments to threads while there are no
precursors.  Yes, if you deliberately serialise your instruments by passing
information it will gain nothing, but if instr 2 has no dependency on
instr 1 they can run in parallel, despite the instrument-order semantics.
Actually, if instrument 90 is independent of instruments 1-89 it can run
first if necessary.
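The idea of running whatever has no unfinished precursors can be sketched with Python's standard graphlib module (Python 3.9 or later). This is an illustration of scheduling over a partial order in general, not the actual ParCS algorithm, and the dependency table below is invented for the example.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependencies for one k-cycle: each instrument maps to the
# instruments whose output it reads. Instruments 1, 2 and 90 read from
# nobody; instrument 3 reads from both 1 and 2.
depends_on = {1: set(), 2: set(), 3: {1, 2}, 90: set()}

sorter = TopologicalSorter(depends_on)
sorter.prepare()

waves = []
while sorter.is_active():
    # Everything returned here has no unfinished precursors, so the whole
    # batch could be handed to separate threads at once.
    ready = sorted(sorter.get_ready())
    waves.append(ready)
    sorter.done(*ready)  # mark the batch finished so dependants unlock

print(waves)
```

Here instrument 90 lands in the first batch alongside 1 and 2 despite its number, and only instrument 3 has to wait.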

See the paper in Linux Audio in 2009 for more detail, or the poster in
ICMC Montreal 2009

==John ff


