[Csnd] Re: Using python with CSOUND for Livecoding like Supercollider

Date	2010-07-09 19:03
From	Jeff Taylor
Subject	[Csnd] Re: Using python with CSOUND for Livecoding like Supercollider
	Regarding the maximum number of threads being executed, does that mean this new version of csound just splits the instruments between the available threads (so if you have 4 available threads and 40 executing instruments then it puts 10 into each thread), or does it create 40 threads and execute them in sequence 4 at a time?

It seems like the former could be slower, since if the execution times of the instruments are very imbalanced, one of the threads could be loaded with all of the slow code and the others loaded with the fast. The second method allows threads to be processed as soon as they can.

--
Electronically,
Jeff Taylor

On Fri, Jul 9, 2010 at 9:45 AM, Michael Gogins <michael.gogins@gmail.com> wrote:
> Yes, you need multiple cores to gain any benefit. With 1 core it costs more than it's worth.
>
> There would be no side effects on other processes.
>
> The maximum number of threads that can actually run at the same time is the number of CPUs (cores), except that with recent Intel CPUs, there is a feature called "hyperthreading" that enables 1 core to run 2 threads "at the same time". So for example on an Intel Core i7, which has 4 CPUs, there are actually 8 threads available to work. This is considerably more power than personal computers have ever had before. And now you can even get a Core i7 with 6 cores.
>
> None of these methods enable multiple threads within an instrument block. First, it is more difficult. Second, there would not be as much benefit because you have to be able to run several thousand machine instructions before you can afford to suspend or resume a thread. Instrument blocks would run from several hundred to hundreds of thousands of machine instructions. As long as there are more voices than cores, the voice is a very appropriate level of threading granularity.
>
> However, to achieve your idea about parallel chains, you could break up the instrument into parts, each part a separate instrument block, and use the signal flow graph opcodes to connect the parts. Then the parts can indeed run in parallel. But this would also require the ParCS code to guard the writes to the signal flow graph inlets.
>
> Regards,
> Mike
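
The two strategies in Jeff's question can be compared with a small sketch. This is only a toy model, not a description of how Csound or ParCS schedules anything: the function names `static_split` and `shared_queue` and the per-instrument costs are invented for illustration, and thread start-up and switching overhead is ignored.

```python
import heapq

def static_split(costs, n_threads):
    """Hand each thread one contiguous, equal-sized chunk of instruments up front.
    Returns the finish time of the slowest thread."""
    chunk = -(-len(costs) // n_threads)  # ceiling division
    return max(sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk))

def shared_queue(costs, n_threads):
    """Each worker takes the next instrument as soon as it becomes free.
    Returns the time at which the last worker finishes."""
    free_at = [0] * n_threads  # min-heap of the times at which each worker is idle
    heapq.heapify(free_at)
    for cost in costs:
        start = heapq.heappop(free_at)          # earliest available worker
        heapq.heappush(free_at, start + cost)   # it is busy until start + cost
    return max(free_at)

# Hypothetical workload: 40 instruments, the first 10 slow (cost 10), the rest fast (cost 1).
costs = [10] * 10 + [1] * 30
print(static_split(costs, 4))  # 100: one thread receives all ten slow instruments
print(shared_queue(costs, 4))  # 33: close to the ideal 130 / 4 = 32.5
```

The imbalance only hurts the static split when the slow instruments happen to land in the same chunk; with costs spread evenly the two approaches finish at about the same time.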

Date	2010-07-09 21:31
From	Michael Gogins
Subject	[Csnd] Re: Re: Using python with CSOUND for Livecoding like Supercollider
	Remember that a thread consists of a "snapshot" of the current state of a program. This more or less consists of (a) the data in all CPU registers including the instruction pointer and the stack pointer, (b) the actual stack, and (c) perhaps some thread-local or cache-specific data. The CPU registers can be saved to an area of memory, or restored from an area of memory, with a single instruction (context switch).

Basically, each thread is allowed to run for one "quantum" of time (a few milliseconds to a few tens of milliseconds if not interrupted) before being suspended in favor of another thread.

At bottom, the "quanta" are what is really happening on the computer. Each one is a time-slice of a thread. On a 1 core computer, there is 1 quantum at a time for however many threads you have. On a 4 core computer, there are 4 quanta at a time (or 8, with hyperthreading) for however many threads you have.

Typically, a modern PC will be running hundreds of threads. On Windows, if you have Spy++, you can see them all by selecting Process view and expanding the entire tree. Or you can use the Task Manager, and add the Thread Count column.

To go into even more grisly detail, the operating system has a list of threads. Assuming there are no interrupts, the operating system restores (or loads, if the program is starting up) the thread state for the first thread into a core's registers and runs it for 1 quantum of time. Then it saves the registers for that thread, and restores the state from the next thread to the registers and runs THAT thread for 1 quantum of time. If there are N cores, the operating system can resume and suspend N threads for N quanta at more or less the same time. When the OS gets to the end of the list, it just goes back to the top and starts all over again ("round robin scheduling").

This is of course complicated by thread priorities and interrupts from device drivers saying "run my quantum RIGHT NOW BEFORE MY DATA IS OUT OF DATE!!".
Not to mention programs starting and stopping.

In other words, there are actually several levels of concurrency. So to answer your question: Csound divides up your 40 voices among its 4 threads, and then the operating system slices up the 4 Csound threads and runs them on its 4 (or 8) cores along with slices of all the other threads on the machine.

Hope this helps,
Mike

--
Michael Gogins
Irreducible Productions
http://www.michael-gogins.com
Michael dot Gogins at gmail dot com

Send bugs reports to the Sourceforge bug tracker https://sourceforge.net/tracker/?group_id=81968&atid=564599
Discussions of bugs and features can be posted here
To unsubscribe, send email sympa@lists.bath.ac.uk with body "unsubscribe csound"
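
The round-robin scheme Mike describes can be sketched in a few lines. This is a deliberately simplified model under stated assumptions: every core switches at the same instant, and there are no priorities, interrupts, or threads starting and stopping. The function name `round_robin` and the thread names and run times are made up for the example.

```python
from collections import deque

def round_robin(work, n_cores, quantum):
    """Simulate round-robin scheduling.

    work maps a thread name to the run time it still needs.
    Returns one list per time slice, naming the threads that ran concurrently.
    """
    ready = deque(work.items())  # the operating system's list of threads
    timeline = []
    while ready:
        # Resume up to n_cores threads, one per core, for one quantum each.
        running = [ready.popleft() for _ in range(min(n_cores, len(ready)))]
        timeline.append([name for name, _ in running])
        for name, remaining in running:
            if remaining > quantum:
                # Not finished: suspend it and put it at the back of the list.
                ready.append((name, remaining - quantum))
    return timeline

threads = {"A": 30, "B": 10, "C": 20}  # hypothetical run times in ms
print(round_robin(threads, n_cores=2, quantum=10))
# [['A', 'B'], ['C', 'A'], ['C', 'A']]
print(len(round_robin(threads, n_cores=1, quantum=10)))
# 6 slices on a single core, against 3 on two cores
```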

Date	2010-07-11 13:05
From	Jeff Taylor
Subject	[Csnd] Re: Re: Re: Using python with CSOUND for Livecoding like Supercollider
	Yes, it does help. Thank you for the detailed description.

I have one more question, though. The only way I can see this working while still preserving the instrument execution order is that csound threads all of the voices for instrument one, waits for them to finish, then threads all of the voices for instrument two, waits for them to finish, etc. Which would mean that if you had a piece composed using many instruments with only one execution for each at a single time then the multi-threading wouldn't give any benefit. Do I understand this correctly?

--
Electronically,
Jeff Taylor

Date	2010-07-11 13:40
From	jpff@cs.bath.ac.uk
Subject	[Csnd] Re: Re: Re: Re: Using python with CSOUND for Livecoding like Supercollider
	> Yes, it does help. Thank you for the detailed description.
>
> I have one more question, though. The only way I can see this working while
> still preserving the instrument execution order is that csound threads all
> of the voices for instrument one, waits for them to finish, then threads all
> of the voices for instrument two, waits for them to finish, etc. Which
> would mean that if you had a piece composed using many instruments with only
> one execution for each at a single time then the multi-threading wouldn't
> give any benefit. Do I understand this correctly?
>
> --
> Electronically,
> Jeff Taylor

The instances of the instruments are partially ordered by the semantics of instrument order and of data flowing between instruments. That means that at every k-cycle we can allocate instruments to threads while there are no precursors.

Yes, if you deliberately serialise your instruments by passing information it will gain nothing, but if instr 2 has no dependency on instr 1 they can run in parallel, despite the instrument-order semantics. Actually, if instrument 90 is independent of instruments 1-89 it can run first if necessary.

See the paper in Linux Audio in 2009 for more detail, or the poster at ICMC Montreal 2009.

==John ff
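
One way to picture the partial order John describes is to group instruments into stages, where everything in a stage has no unfinished precursors. The sketch below is an assumption-laden illustration only: the thread does not say how ParCS actually derives or represents dependencies, so the `deps` table is written by hand and `parallel_stages` is a hypothetical helper.

```python
def parallel_stages(deps):
    """Group instruments into stages that could run in parallel within one k-cycle.

    deps maps an instrument number to the set of instruments it must wait for.
    Every instrument named as a precursor must also appear as a key.
    """
    remaining = {instr: set(pre) for instr, pre in deps.items()}
    stages = []
    while remaining:
        # Instruments with no unfinished precursors are ready to be allocated.
        ready = sorted(i for i, pre in remaining.items() if not pre)
        if not ready:
            raise ValueError("unsatisfiable or cyclic dependency between instruments")
        stages.append(ready)
        for i in ready:
            del remaining[i]
        for pre in remaining.values():
            pre.difference_update(ready)  # those precursors have now finished
    return stages

# Hypothetical orchestra: instr 3 reads data written by instr 1; the others are independent.
print(parallel_stages({1: set(), 2: set(), 3: {1}, 90: set()}))
# [[1, 2, 90], [3]]

# A deliberately serialised chain gains nothing: one instrument per stage.
print(parallel_stages({1: set(), 2: {1}, 3: {2}}))
# [[1], [2], [3]]
```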