[Cs-dev] ParCS segfaults on Windows
Date | 2010-08-21 11:49 |
From | Michael Gogins |
Subject | [Cs-dev] ParCS segfaults on Windows |
Windows 7, MinGW gcc 4.5, fresh build from updated ParCS branch last night:

Csound version 5.12 (double samples) Aug 20 2010
displays suppressed
0dBFS level = 32768.0
orch now loaded
audio buffered in 128 sample-frame blocks
writing 512-byte blks of shorts to Xanadu.wav (WAV)
SECTION 1:
ftable 1:
ftable 2:
ftable 3:
new alloc for instr 1:
new alloc for instr 3:
new alloc for instr 3:
new alloc for instr 3:
new alloc for instr 3:
new alloc for instr 3:
new alloc for instr 3:

Program received signal SIGSEGV, Segmentation fault.
0x624830cc in pthread_mutex_unlock () from C:\Windows\SysWOW64\pthreadGC2.dll
(gdb) bt
#0  0x624830cc in pthread_mutex_unlock () from C:\Windows\SysWOW64\pthreadGC2.dll
#1  0x6f250e26 in csp_dag_consume (csound=0x5e0048, dag=0x6c6f50, node=0x28fe04, update_hdl=0x28fe00) at Engine\cs_par_dispatch.c:1487
#2  0x6f2515b9 in csp_dag_calculate_max_roots (csound=0x5e0048, dag=0x28fe6c, chain=0x69e348) at Engine\cs_par_dispatch.c:1374
#3  csp_dag_build (csound=0x5e0048, dag=0x28fe6c, chain=0x69e348) at Engine\cs_par_dispatch.c:1397
#4  0x6f252915 in csp_dag_cache_entry_alloc (csound=0x5e0048, dag=0x28febc, chain=0x69e348) at Engine\cs_par_dispatch.c:2413
#5  csp_dag_cache_fetch (csound=0x5e0048, dag=0x28febc, chain=0x69e348) at Engine\cs_par_dispatch.c:2561
#6  0x6f23a42a in kperf (csound=0x5e0048) at Top\csound.c:1402
#7  csoundPerform (csound=0x5e0048) at Top\csound.c:1555
#8  0x00401628 in main (argc=3, argv=0x741d40) at frontends\csound\csound_main.c:136
(gdb)

--
Michael Gogins
Irreducible Productions
http://www.michael-gogins.com
Michael dot Gogins at gmail dot com

------------------------------------------------------------------------------ This SF.net email is sponsored by Make an app they can't live without Enter the BlackBerry Developer Challenge http://p.sf.net/sfu/RIM-dev2dev _______________________________________________ Csound-devel mailing list Csound-devel@lists.sourceforge.net |
Date | 2010-08-23 09:56 |
From | john ffitch |
Subject | Re: [Cs-dev] ParCS segfaults on Windows |
I see you were running xanadu. How many processors/threads? I have not got a segfault at all recently. ==John ffitch |
Date | 2010-08-23 12:08 |
From | Michael Gogins |
Subject | Re: [Cs-dev] ParCS segfaults on Windows |
Attachments | None None |
Faults only on Trapped. All recent tests: 4 threads on 4 cores. MKG from cell phone On Aug 23, 2010 4:57 AM, "john ffitch" <jpff@codemist.co.uk> wrote: |
Date | 2010-08-23 14:04 |
From | Michael Gogins |
Subject | Re: [Cs-dev] ParCS segfaults on Windows |
I should add, my original segfault email was before changes you made that fixed the problem for xanadu.csd (and also apparently did other good stuff). But it remains for Trapped. I will investigate. Regards, Mike On Mon, Aug 23, 2010 at 7:08 AM, Michael Gogins |
Date | 2010-08-23 14:58 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] ParCS segfaults on Windows |
I can confirm xanadu runs here with ParCS on OSX too. With ksmps=100 and -j 2 on a dual core machine, I have it running in 1.738 secs, versus 2.210 secs 'ordinary' (HEAD branch) csound. It is definitely faster. Victor |
Date | 2010-08-23 16:06 |
From | jpff@cs.bath.ac.uk |
Subject | Re: [Cs-dev] ParCS segfaults on Windows |
> I can confirm xanadu runs here with ParCS on OSX too. With ksmps=100
> and -j 2 on a dual core machine, I have
> it running in 1.738 secs, versus 2.210 secs 'ordinary' (HEAD branch)
> csound.

Nice to hear. I am seeing significant slowdown as I add threads. Trapped runs, but slows down almost linearly. The only speed-ups I get are with synthetic examples. Getting increasingly confused and despondent.
==John ff |
Date | 2010-08-23 16:47 |
From | jpff@cs.bath.ac.uk |
Subject | Re: [Cs-dev] ParCS segfaults on Windows |
>> I can confirm xanadu runs here with ParCS on OSX too. With ksmps=100
>> and -j 2 on a dual core machine, I have
>> it running in 1.738 secs, versus 2.210 secs 'ordinary' (HEAD branch)
>> csound.

I have constructed the matrix of threads x ksmps -> time for Xanadu and Trapped on a 4-core Linux system, and I also give the relative times. It is clear that small ksmps values do not give the threads enough work to win. Maybe we need to activate the weights and multi-instr per thread, which are in the code but have no data yet. I have costed a number of opcodes as init + (A x krate) + (B x srate); this needs automation (valgrind, plus hand reading of the output, plus a small C program). Still not happy with these numbers.

XANADU
Time
Threads\ksmps      1     10    100    300    900
1               29.0   19.9   17.4   17.4   17.3
2               38.3   20.1   16.9   16.8   16.1
3               41.7   20.6   16.6   15.5   15.3
4               43.5   21.1   16.2   16.4   14.4
Relative
1               1.00   1.00   1.00   1.00   1.00
2               1.32   1.01   0.97   0.97   0.93
3               1.44   1.04   0.95   0.89   0.88
4               1.50   1.06   0.93   0.94   0.83

TRAPPED
Time
Threads\ksmps      1     10    100    300    900
1               29.0   3.84   2.04   1.91   1.85
2                  -   8.48   2.30   1.65   1.46
3                  -  10.11   2.30   1.65   1.40
4                  -   12.5   2.48   1.76   1.40
Relative
1                  -   1.00   1.00   1.00   1.00
2                  -   2.21   1.13   0.86   0.79
3                  -   2.63   1.13   0.86   0.76
4                  -   3.26   1.22   0.92   0.76
|
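The opcode cost model mentioned above, init + (A x krate) + (B x srate), could be encoded roughly as follows. This is a minimal sketch under stated assumptions, not the ParCS code: the names `OpcodeCost`, `opcode_cost` and `instr_weight` are hypothetical, krate and srate are simply passed in as numbers, and an instrument's weight is assumed to be the plain sum of its opcodes' costs. The coefficients themselves would have to come from profiling (e.g. valgrind output), as described in the message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost record for one opcode, following the form
   cost = init + (A x krate) + (B x srate). */
typedef struct {
    double init;  /* one-off init-time cost */
    double a;     /* coefficient A, multiplied by krate */
    double b;     /* coefficient B, multiplied by srate */
} OpcodeCost;

/* Estimated cost of a single opcode. */
double opcode_cost(const OpcodeCost *c, double krate, double srate)
{
    return c->init + c->a * krate + c->b * srate;
}

/* Estimated weight of an instrument: the sum of its opcodes' costs
   (an assumption of this sketch, not taken from ParCS). */
double instr_weight(const OpcodeCost *ops, size_t n, double krate, double srate)
{
    double total = 0.0;
    assert(ops != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += opcode_cost(&ops[i], krate, srate);
    return total;
}
```

Weights of this kind are what a scheduler would need in order to decide when several cheap instrument instances should share one thread rather than each being dispatched separately; how ParCS itself uses its weights is not shown here.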
Date | 2010-08-23 16:57 |
From | Michael Gogins |
Subject | Re: [Cs-dev] ParCS segfaults on Windows |
Attachments | None None |
I'm getting good speedups, x1.25 to x1.5 per core. Get rid of the crash, add all required locks, and we're done. MKG from cell phone On Aug 23, 2010 11:07 AM, <jpff@cs.bath.ac.uk> wrote: |
Date | 2010-08-23 17:17 |
From | Michael Gogins |
Subject | Re: [Cs-dev] ParCS segfaults on Windows |
Also, "Trapped in Convert" is obviously not as susceptible to speedups as Xanadu, which has multiple instances of the same instrument playing footballs. I repeat, with the latest ParCS code I am quite encouraged, not discouraged at all. We have musically useful speedups with some actual music, not test orchestras. If we can get rid of the crashes, add required locks, and put in the cleanup code so Csound can run again without exiting, this should be merged into the main branch. ...that is, if the parser can also be merged. How close is the parser to being the default? What remains to be done there? How many cores do you have, and how many threads are you testing? Where does it start to slow down? Regards, Mike On Mon, Aug 23, 2010 at 11:57 AM, Michael Gogins |
Date | 2010-08-23 17:20 |
From | jpff@cs.bath.ac.uk |
Subject | Re: [Cs-dev] ParCS segfaults on Windows |
Larger matrix

XANADU
Time
Threads\ksmps      1     10    100    300    900   2100   4900
1               29.0   19.9   17.4   17.4   17.3   16.6   17.2
2               38.3   20.1   16.9   16.8   16.1   15.6   15.1
3               41.7   20.6   16.6   15.5   15.3   14.7   14.6
4               43.5   21.1   16.2   16.4   14.4   13.8   13.9
Relative
1               1.00   1.00   1.00   1.00   1.00   1.00   1.00
2               1.32   1.01   0.97   0.97   0.93   0.94   0.88
3               1.44   1.04   0.95   0.89   0.88   0.89   0.85
4               1.50   1.06   0.93   0.94   0.83   0.83   0.81

TRAPPED
Time
Threads\ksmps      1     10    100    300    900   2100   4900
1               22.0   3.84   2.04   1.91   1.85   1.84   1.75
2               66.3   8.48   2.30   1.65   1.46   1.35   1.50
3               83.4  10.11   2.30   1.65   1.40   1.34   1.51
4              106.1   12.5   2.48   1.76   1.40   1.35   1.51
Relative
1               1.00   1.00   1.00   1.00   1.00   1.00   1.00
2               3.01   2.21   1.13   0.86   0.79   0.73   0.86
3               3.79   2.63   1.13   0.86   0.76   0.73   0.86
4               4.82   3.26   1.22   0.92   0.76   0.73   0.86
|
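The "Relative" rows are each thread count's time divided by the single-thread time in the same ksmps column, so values below 1.00 mean the threaded run was faster. A small C sketch of that calculation, using the XANADU times from the larger matrix (the array and function names are illustrative only):

```c
#include <assert.h>
#include <stdio.h>

#define N_THREADS 4   /* rows: 1..4 threads */
#define N_KSMPS   7   /* columns: ksmps = 1, 10, 100, 300, 900, 2100, 4900 */

/* XANADU times copied from the table (rows = threads, columns = ksmps). */
double xanadu[N_THREADS][N_KSMPS] = {
    {29.0, 19.9, 17.4, 17.4, 17.3, 16.6, 17.2},
    {38.3, 20.1, 16.9, 16.8, 16.1, 15.6, 15.1},
    {41.7, 20.6, 16.6, 15.5, 15.3, 14.7, 14.6},
    {43.5, 21.1, 16.2, 16.4, 14.4, 13.8, 13.9},
};

/* rel[n][k] = t[n][k] / t[0][k]: time with n+1 threads relative to 1 thread. */
void relative_times(double t[N_THREADS][N_KSMPS], double rel[N_THREADS][N_KSMPS])
{
    for (int k = 0; k < N_KSMPS; k++) {
        assert(t[0][k] > 0.0);  /* the single-thread baseline must be positive */
        for (int n = 0; n < N_THREADS; n++)
            rel[n][k] = t[n][k] / t[0][k];
    }
}

/* Print the matrix with two decimals, one row per thread count. */
void print_relative(double rel[N_THREADS][N_KSMPS])
{
    for (int n = 0; n < N_THREADS; n++) {
        printf("%d", n + 1);
        for (int k = 0; k < N_KSMPS; k++)
            printf("  %5.2f", rel[n][k]);
        printf("\n");
    }
}
```

For example, the 4-thread, ksmps=900 entry is 14.4 / 17.3, about 0.83; the corresponding speedup is the reciprocal, about 1.20.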
Date | 2010-08-23 17:44 |
From | Michael Gogins |
Subject | Re: [Cs-dev] ParCS segfaults on Windows |
I've been testing on Windows 7, gcc 4.5, on an ASUS N61 (I think; it's an ASUS anyway, new last month) with an Intel Core i7. You? I also have Ubuntu on my new machine, but have been procrastinating about getting the Linux build of Csound working there. Regards, Mike On Mon, Aug 23, 2010 at 12:20 PM, |
Date | 2010-08-23 17:46 |
From | Andres Cabrera |
Subject | Re: [Cs-dev] ParCS segfaults on Windows |
The build on Ubuntu is really easy if you do:

sudo apt-get build-dep csound

This will get all the dependencies, and csound will build with scons without any problem. Cheers, Andrés On Mon, Aug 23, 2010 at 5:44 PM, Michael Gogins |
Date | 2010-08-23 17:58 |
From | Michael Gogins |
Subject | Re: [Cs-dev] ParCS segfaults on Windows |
Oh boy! Thank you! Regards, Mike On Mon, Aug 23, 2010 at 12:46 PM, Andres Cabrera |
Date | 2010-08-23 18:35 |
From | jpff@cs.bath.ac.uk |
Subject | [Cs-dev] ParCS performance |
> I've been testing on Windows 7, gcc 4.5, ASUS N61 (I think, ASUS
> anyway new last month) with Intel Core i7.

/proc/cpuinfo says
4 CPU (dual thread) x Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz
gcc version 4.5.0
8 GB main memory
OpenSuse 11.3

I am using csoundSpinlock/UnLock rather than mutex at present. I changed to mutex for valgrind testing, but spinlocks are clearly the correct technology. I have made a few tweaks since, but with only a small gain. I am very confused about the code now in H/cs_par_base.h.
==John ff |
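A spinlock busy-waits on an atomic flag instead of asking the OS to put the waiting thread to sleep, which is cheap when critical sections are very short and each thread has its own core. The build flags quoted later in this thread define HAVE_SYNC_LOCK_TEST_AND_SET, which suggests GCC's `__sync` builtins are available; the following is a minimal sketch of a test-and-set spinlock built on them. It is not Csound's own spinlock implementation, and the names `spin_t`, `spin_lock`, `spin_unlock` and `run_workers` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical spinlock type: 0 = free, 1 = held. */
typedef volatile int spin_t;

/* Acquire: atomically store 1 and get the old value. If it was already 1,
   another thread holds the lock, so spin on plain reads until it looks free,
   then retry the exchange. __sync_lock_test_and_set is an acquire barrier. */
static void spin_lock(spin_t *lock)
{
    while (__sync_lock_test_and_set(lock, 1))
        while (*lock)
            ;
}

/* Release: __sync_lock_release stores 0 with release semantics. */
static void spin_unlock(spin_t *lock)
{
    __sync_lock_release(lock);
}

#define N_WORKERS 4
#define N_ITERS   10000

static spin_t counter_lock = 0;
static long counter = 0;   /* shared data protected by counter_lock */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < N_ITERS; i++) {
        spin_lock(&counter_lock);
        counter++;             /* keep the critical section as short as possible */
        spin_unlock(&counter_lock);
    }
    return NULL;
}

/* Start N_WORKERS threads that all increment the shared counter and return
   the final count (N_WORKERS * N_ITERS if the lock provides mutual exclusion). */
long run_workers(void)
{
    pthread_t th[N_WORKERS];
    counter = 0;
    for (int i = 0; i < N_WORKERS; i++) {
        int rc = pthread_create(&th[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < N_WORKERS; i++)
        pthread_join(th[i], NULL);
    return counter;
}
```

One design caveat: with more runnable threads than cores, a thread that spins while the lock holder is descheduled just burns its time slice, which is one situation where a blocking mutex can come out ahead.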
Date | 2010-08-23 19:26 |
From | jpff@cs.bath.ac.uk |
Subject | Re: [Cs-dev] ParCS performance |
> 4 CPU (dual thread) x Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz

and I have also been using an AMD Phenom II X4 on my home machine, but not recently. |
Date | 2010-08-24 02:19 |
From | Michael Gogins |
Subject | Re: [Cs-dev] ParCS performance |
Thanks to Andres Cabrera's excellent advice, I was quickly able to build ParCS on the Linux boot of my new toy:

model name : Intel(R) Core(TM) i7 CPU Q 720 @ 1.60GHz
Linux quattro 2.6.32-24-generic #38-Ubuntu SMP Mon Jul 5 09:22:14 UTC 2010 i686 GNU/Linux

mkg@quattro:~/csound/csound5$ gcc -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.4.3-4ubuntu5' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 --program-suffix=-4.4 --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-plugin --enable-objc-gc --enable-targets=all --disable-werror --with-arch-32=i486 --with-tune=generic --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)

The following shows (a) that I am getting about the same performance from Windows and Ubuntu Linux (see earlier figures), and (b):

mkg@quattro:~/csound/csound5$ ./csound -RWdf -r96000 -k960 examples/xanadu.csd
Elapsed time at end of performance: real: 32.705s, CPU: 19.110s
mkg@quattro:~/csound/csound5$ ./csound -RWdf -r96000 -k960 -j4 examples/xanadu.csd
Elapsed time at end of performance: real: 17.023s, CPU: 31.760s

That is a speedup of 1.92, somewhat more of a speedup at this ksmps (96000 / 960 = 100) than I got with Windows. I still get a segfault with trapped.csd. I will investigate this. I have no explanation for jpff's results. I append a compiler command so the options can be seen.

scons buildNewParser=1 useDouble=1 buildCsoundAC=1 buildInterfaces=1

gcc -o Engine/cs_par_base.o -c -Wno-format -DGNU_GETTEXT -g -fomit-frame-pointer -freorder-blocks -DLINUX -DPIPES -fPIC -fPIC -DHAVE_LIBSNDFILE=1016 -DHAVE_FLTK -DBETA -DUSE_DOUBLE -DHAVE_SOCKETS -DHAVE_PTHREAD_BARRIER_INIT -DHAVE_SYNC_LOCK_TEST_AND_SET -DHAVE_FCNTL_H -DHAVE_UNISTD_H -DHAVE_STDINT_H -DHAVE_SYS_TIME_H -DHAVE_SYS_TYPES_H -DHAVE_TERMIOS_H -DHAVE_VALUES_H -DHAVE_SOCKETS -DHAVE_DIRENT_H -DENABLE_NEW_PARSER -D__BUILDING_LIBCSOUND -I. -IH -I/usr/include/fltk-1.1 -I/usr/local/include -I/usr/include -I/usr/include -I/usr/X11R6/include Engine/cs_par_base.c

gcc -o Engine/cs_par_base.os -c -Wno-format -DGNU_GETTEXT -g -fomit-frame-pointer -freorder-blocks -DLINUX -DPIPES -fno-strict-aliasing -fno-strict-aliasing -fPIC -DHAVE_LIBSNDFILE=1016 -DHAVE_FLTK -DBETA -DUSE_DOUBLE -DHAVE_SOCKETS -DHAVE_PTHREAD_BARRIER_INIT -DHAVE_SYNC_LOCK_TEST_AND_SET -DHAVE_FCNTL_H -DHAVE_UNISTD_H -DHAVE_STDINT_H -DHAVE_SYS_TIME_H -DHAVE_SYS_TYPES_H -DHAVE_TERMIOS_H -DHAVE_VALUES_H -DHAVE_SOCKETS -DHAVE_DIRENT_H -DENABLE_NEW_PARSER -D__BUILDING_LIBCSOUND -I. -IH -I/usr/include/fltk-1.1 -I/usr/local/include -I/usr/include -I/usr/include -I/usr/X11R6/include -Iinterfaces Engine/cs_par_base.c

I am going to optimize for Core2 and see if I can make it faster. Regards, Mike On Mon, Aug 23, 2010 at 2:26 PM,
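The 1.92 figure is the ratio of the two wall-clock ("real") times, and dividing it by the thread count gives the parallel efficiency. A minimal sketch with illustrative function names: 32.705 / 17.023 is about 1.92 with -j4, which corresponds to an efficiency of about 0.48 per thread.

```c
#include <assert.h>

/* Speedup S = T_1 / T_n, using wall-clock ("real") times, not CPU times. */
double speedup(double t_serial, double t_parallel)
{
    assert(t_serial > 0.0 && t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Parallel efficiency E = S / n: 1.0 would mean perfect scaling over n threads. */
double efficiency(double t_serial, double t_parallel, int threads)
{
    assert(threads > 0);
    return speedup(t_serial, t_parallel) / threads;
}
```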
Date | 2010-08-24 02:33 |
From | Michael Gogins |
Subject | Re: [Cs-dev] ParCS performance |
> mkg@quattro:~/csound/csound5$ ./csound -RWdf -r96000 -k960 -j4 > examples/xanadu.csd > Elapsed time at end of performance: real: 17.023s, CPU: 31.760s With gcc4opt=core2: Elapsed time at end of performance: real: 9.090s, CPU: 14.980s On Mon, Aug 23, 2010 at 9:19 PM, Michael Gogins |
Date | 2010-08-24 02:56 |
From | Felipe Sateler |
Subject | Re: [Cs-dev] ParCS performance |
On 23/08/10 21:19, Michael Gogins wrote:
> mkg@quattro:~/csound/csound5$ ./csound -RWdf -r96000 -k960 -j4
> examples/xanadu.csd
> Elapsed time at end of performance: real: 17.023s, CPU: 31.760s

This doesn't make sense. How is it possible to have more CPU time than real time?
--
Saludos,
Felipe Sateler |
Date | 2010-08-24 04:13 |
From | Michael Gogins |
Subject | Re: [Cs-dev] ParCS performance |
By running more than one CPU during the same period of real time, of course. That's the whole point of this exercise! Real time is the wall clock time, the time it takes to get something done. This is what we care about. CPU time is the time spent on one or more CPUs. By running many CPUs at the same time, we can spend a lot of CPU time during the same interval of real time. We spend CPU time to buy real time. Regards, Mike On Mon, Aug 23, 2010 at 9:56 PM, Felipe Sateler |
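The real-time versus CPU-time distinction can be seen directly on a POSIX system. The sketch below runs several busy threads and reads both a wall clock (`CLOCK_MONOTONIC`) and the process CPU-time clock (`CLOCK_PROCESS_CPUTIME_ID`), which on Linux accumulates CPU time across all threads of the process. With enough idle cores, the CPU figure should come out at roughly the thread count times the wall figure; on a single core it will not. The names `measure`, `busy_work` and `read_clock` are illustrative, and this is not taken from Csound's own timing code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

#define MAX_THREADS 16

/* Read a POSIX clock and return seconds as a double. */
static double read_clock(clockid_t clk)
{
    struct timespec ts;
    clock_gettime(clk, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Pure CPU work; the volatile accumulator keeps the loop from being optimized away. */
static void *busy_work(void *arg)
{
    long n = *(const long *)arg;
    volatile double x = 0.0;
    for (long i = 0; i < n; i++)
        x += 1e-9;
    return NULL;
}

/* Run `threads` workers of `iters` iterations each and report wall-clock
   ("real") seconds and process CPU seconds summed over all threads. */
void measure(int threads, long iters, double *real_s, double *cpu_s)
{
    pthread_t th[MAX_THREADS];
    assert(threads > 0 && threads <= MAX_THREADS);

    double wall0 = read_clock(CLOCK_MONOTONIC);
    double cpu0 = read_clock(CLOCK_PROCESS_CPUTIME_ID);

    for (int i = 0; i < threads; i++) {
        int rc = pthread_create(&th[i], NULL, busy_work, &iters);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < threads; i++)
        pthread_join(th[i], NULL);  /* iters stays valid: all threads joined here */

    *real_s = read_clock(CLOCK_MONOTONIC) - wall0;
    *cpu_s = read_clock(CLOCK_PROCESS_CPUTIME_ID) - cpu0;
}
```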
Date | 2010-08-24 05:02 |
From | Felipe Sateler |
Subject | Re: [Cs-dev] ParCS performance |
Oh OK. I guess I thought CPU time was all the time spent on any CPU... which would add all the times spent on all processing units. |
Date | 2010-08-24 09:52 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] ParCS performance |
Impressive! Unfortunately I have no such optimisation option on OSX. Victor On 24 Aug 2010, at 02:33, Michael Gogins wrote: >> mkg@quattro:~/csound/csound5$ ./csound -RWdf -r96000 -k960 -j4 >> examples/xanadu.csd >> Elapsed time at end of performance: real: 17.023s, CPU: 31.760s > > With gcc4opt=core2: > > Elapsed time at end of performance: real: 9.090s, CPU: 14.980s > > On Mon, Aug 23, 2010 at 9:19 PM, Michael Gogins > |
Date | 2010-08-24 11:33 |
From | Michael Gogins |
Subject | Re: [Cs-dev] ParCS performance |
Attachments | None None |
You are correct. In 1 second of real time, one core uses about one second of cpu time, two cpus use about 2 seconds, 3 cpus about 3 seconds of cpu time, etc. That's assuming the code is properly multithreaded. We are trying to spend as much cpu time as we can _in parallel_ to buy savings in real time. Of course we are trying to cut cpu time as well _in total_ by optimizing the code. MKG from cell phone On Aug 24, 2010 12:03 AM, "Felipe Sateler" <fsateler@gmail.com> wrote: |
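Put the other way round, dividing total CPU time by real time gives the average number of cores kept busy during a run. A minimal sketch (the function name is illustrative) applied to the Linux figures earlier in the thread: the -j4 run (CPU 31.760 s, real 17.023 s) averages about 1.87 busy cores, the -j4 run built with gcc4opt=core2 (14.980 s, 9.090 s) about 1.65, and the single-threaded run (19.110 s, 32.705 s) about 0.58.

```c
#include <assert.h>

/* Average number of cores busy over a run: CPU seconds / wall-clock seconds.
   About 1.0 for one fully busy thread, about N for N fully busy threads. */
double avg_busy_cores(double cpu_s, double real_s)
{
    assert(real_s > 0.0);
    return cpu_s / real_s;
}
```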