Re: [Cs-dev] Performance Issues with Csound6
Date | 2013-08-08 12:30 |
From | john ffitch |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
> I suspect that the --sample-accurate code could be a drag in ksmps=1
> There are ways to modify the way it is done that might change performance

It looks like at ksmps=1 the extra cost is in the range 5-7%, or zero. A-rate assignment was much worse, but I adjusted that opcode so it is bearable. I will continue to experiment as I have time.

==John ffitch

_______________________________________________
Csound-devel mailing list
Csound-devel@lists.sourceforge.net |
Date | 2013-08-08 12:47 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
In the tied note examples, I've also fixed ithen so that it uses only i-time code (which was also not right in cs 5), so the differences for that example with csound 5 are smaller now.

Comparison between csound64 (csound5) and csound:

$ csound64 tied.csd -dm128 -+skip_seconds=36
Elapsed time at end of performance: real: 6.989s, CPU: 6.306s

$ ./csound tied.csd -dm128 -+skip_seconds=36
Elapsed time at end of performance: real: 7.522s, CPU: 7.162s

Csound 6 is about 1.08 times slower in real time (with ksmps=1). Csound 6 is now faster with ksmps > 1.

Victor

Dr Victor Lazzarini
Senior Lecturer
Dept. of Music
NUI Maynooth Ireland
tel.: +353 1 708 3545
Victor dot Lazzarini AT nuim dot ie |
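As a cross-check on the timings quoted above, here is a minimal sketch in C of how the slowdown factor can be derived. The helper names `slowdown` and `report_tied_example` are made up for illustration; the numbers are the real and CPU times from the message.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: how many times slower the candidate run is
   than the baseline run. Values above 1.0 mean "slower". */
static double slowdown(double baseline_s, double candidate_s)
{
    assert(baseline_s > 0.0);
    return candidate_s / baseline_s;
}

/* Print both ratios for the tied.csd runs quoted in the message. */
static void report_tied_example(void)
{
    printf("real: %.3f\n", slowdown(6.989, 7.522));
    printf("CPU:  %.3f\n", slowdown(6.306, 7.162));
}
```

The real-time ratio comes out at roughly 1.08, which matches the figure in the message; the CPU-time ratio is somewhat higher, at roughly 1.14, so it is worth stating which of the two a comparison is based on.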
Date | 2013-08-08 13:18 |
From | Steven Yi |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Hi Victor,

I wonder if this is masking a bigger problem? Is the 1.08 times slower a result of using the same ithen change in both cs5 and cs6?

I'm beginning to wonder if UDOs are the cause of the big slowdown. I'm in the middle of a big programming problem with blue and cs5/cs6 API support, but I'll look at the UDO running code after that.

steven |
Date | 2013-08-08 13:20 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
That could easily be tested by changing ithen to then, which is equivalent in cs 5 and cs 6. The ithen bug needed to be fixed anyway.

Victor |
Date | 2013-08-08 13:25 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
The result with then is now:

$ ./csound tied.csd -dm128 -+skip_seconds=36
Elapsed time at end of performance: real: 8.319s, CPU: 7.413s

$ csound64 tied.csd -dm128 -+skip_seconds=36
Elapsed time at end of performance: real: 7.169s, CPU: 6.304s

1.16 times slower.

Victor |
Date | 2013-08-08 13:35 |
From | jpff@cs.bath.ac.uk |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Another data item: UDO is slower due to local ksmps; in many cases it is the only difference (e.g. kgoto).

==John |
Date | 2013-08-08 13:40 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
I took out the UDO call in the tied notes example and it made little difference to performance. I think the slowdown is elsewhere.

Victor |
Date | 2013-08-08 14:13 |
From | jpff@cs.bath.ac.uk |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
The macro CS_PDS is more expensive in cs6; that explains kngoto going from 378,575,856 to 408,782,616, and kgoto is similar.

Still eating into performance --

btw some functions are absolutely faster, like sensevents: 109,703,302 -> 102,048,322 |
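To put the before/after counts quoted above on a common scale, a small sketch in C follows. The helper name `percent_change` is hypothetical; only the four numbers come from the message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: relative change from `before` to `after`,
   in percent. Positive means the count went up. */
static double percent_change(uint64_t before, uint64_t after)
{
    assert(before != 0);
    return ((double)after - (double)before) * 100.0 / (double)before;
}
```

Called as `percent_change(378575856ULL, 408782616ULL)` this gives about +8% for kngoto, and `percent_change(109703302ULL, 102048322ULL)` gives about -7% for sensevents, so the two effects are of similar size but opposite sign.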
Date | 2013-08-08 15:24 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
yes, that would make sense. |
Date | 2013-08-09 09:37 |
From | Steven Yi |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Just an update, I've got Csound6 running via API now with Blue. I tried a few Blue examples and got bad audio dropouts and crackling that don't happen with the CS5 API. The examples do tend to have similar instruments to what I had used before, so they may be exhibiting similar performance due to that. I've got a little more to test and do with Blue today, but will be heading back to looking at Csound performance this afternoon. |
Date | 2013-08-09 12:22 |
From | Michael Gogins |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Is there possibly a relationship between the Csound 6 performance problems you are looking at, and the CsoundQt realtime performance problems I am experiencing?

To recap, in my Windows 8 environment, ordinary Csound 6 rendering to soundfile is definitely significantly faster than Csound 5, and I can prove it. When I run CsoundQt in real time, I experience very poor performance, with many dropouts. The real-time examples that I have for the command line also work just fine for Csound 6.

Unfortunately, I don't have an example of compute-intensive real-time performance that is designed both for command-line use and for CsoundQt. I suppose I can try to come up with one, so I can detect whether the problem is Csound itself or CsoundQt. That shouldn't take long, or do you have such an example, e.g. with FLTK controls, that is compute intensive?

Even more unfortunately, CsoundQt has another problem, not running audio a second time, and I don't have the time to investigate this myself. I'm hoping someone else can pick this up.

Regards,
Mike

===========================
Michael Gogins
Irreducible Productions
http://michaelgogins.tumblr.com
Michael dot Gogins at gmail dot com |
Date | 2013-08-09 12:28 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
I don't think your problems with CsoundQt are specifically related to the issues discussed here. In OSX 10.6, CsoundQt is performing OK.

Victor |
Date | 2013-08-09 12:39 |
From | jpff |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
I do have instruction counts for the original example. We have reduced 23b to 20b, as against 17b in cs5. Given the additional features I do not think this is too bad.

==John |
Date | 2013-08-09 14:28 |
From | Steven Yi |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Attachments | clojure-piano-phase.csd |
I've just tried running another example. Running to disk, with Csound6 I get:

real 2m37.509s
user 2m35.736s
sys 0m0.452s

with Csound5 I get:

real 1m7.775s
user 1m4.561s
sys 0m0.392s

This is with ksmps=1, but still, csound6 takes more than double the time for this file. I ran using:

time csound clojure-piano-phase.csd -o t.wav
time csound64 clojure-piano-phase.csd -o t.wav

I'm going to run this in Xcode's Instruments now. I'm also going to recheck if there's a missing optimization flag compared to cs5. |
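The "more than double" claim can be checked directly from the `time` fields quoted above. The following is a minimal sketch in C, assuming the fields always have the `<minutes>m<seconds>s` shape shown in the message; the helper names `time_field_seconds` and `real_time_ratio` are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: convert a "2m37.509s"-style field to seconds.
   Returns a negative value if the text does not match that shape. */
static double time_field_seconds(const char *text)
{
    int minutes = 0;
    double seconds = 0.0;
    if (sscanf(text, "%dm%lfs", &minutes, &seconds) != 2)
        return -1.0;
    return minutes * 60.0 + seconds;
}

/* Ratio of candidate to baseline; above 1.0 means the candidate is slower. */
static double real_time_ratio(const char *baseline, const char *candidate)
{
    double b = time_field_seconds(baseline);
    double c = time_field_seconds(candidate);
    assert(b > 0.0 && c > 0.0);
    return c / b;
}
```

With the two `real` fields from the message, `real_time_ratio("1m7.775s", "2m37.509s")` comes out at roughly 2.3, which agrees with the description of Csound 6 taking more than twice as long on this file.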
Date | 2013-08-09 15:03 |
From | Steven Yi |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Just to note, Instruments shows that useropcd2 is much larger in instructions than in CS5. It looks like it's due to CS_PDS as John mentioned it being more expensive. I'm trying out some things with using a local OPDS assigned from CS_PDS, but the difference in instructions is below. CS6: +0x00 pushq %rbp +0x01 movq %rsp, %rbp +0x04 pushq %r15 +0x06 pushq %r14 +0x08 pushq %r12 +0x0a pushq %rbx +0x0b movq %rsi, %r12 +0x0e movq %rdi, %r14 +0x11 movq 40(%r12), %rcx +0x16 movq 48(%r12), %rax +0x1b movq 192(%rcx), %r15 +0x22 movq 12224(%r14), %rcx +0x29 movq %rcx, 280(%rax) +0x30 movq 48(%r12), %rax +0x35 movq 12232(%r14), %rcx +0x3c movq %rcx, 288(%rax) +0x43 movq 40(%r12), %rax +0x48 movq 48(%r12), %rcx +0x4d movq 8(%rcx), %rcx +0x51 movq %rcx, 192(%rax) +0x58 testq %rcx, %rcx +0x5b je useropcd2+0x576 +0x61 movq 48(%r12), %rax +0x66 movq 56(%r12), %rcx +0x6b movb 116(%rcx), %cl +0x6e movb %cl, 116(%rax) +0x71 movq 40(%r12), %rcx +0x76 movq 64(%r12), %rbx +0x7b movq 24(%rbx), %rax +0x7f addq $24, %rbx +0x83 cmpl $1, 156(%rcx) +0x8a je useropcd2+0x114 +0x90 testq %rax, %rax +0x93 je useropcd2+0xd1 +0x95 movl 2680(%r14), %ecx +0x9c nopl (%rax) +0xa0 movq 8(%rbx), %rdx +0xa4 movl %ecx, %esi +0xa6 nopw %cs:(%rax,%rax) +0xb0 movsd (%rax), %xmm0 +0xb4 movsd %xmm0, (%rdx) +0xb8 addq $8, %rdx +0xbc addq $8, %rax +0xc0 decl %esi +0xc2 jne useropcd2+0xb0 +0xc4 movq 16(%rbx), %rax +0xc8 addq $16, %rbx +0xcc testq %rax, %rax +0xcf jne useropcd2+0xa0 +0xd1 movq 8(%rbx), %rax +0xd5 testq %rax, %rax +0xd8 je useropcd2+0x151 +0xda addq $24, %rbx +0xde nop +0xe0 movq -8(%rbx), %rcx +0xe4 movsd (%rax), %xmm0 +0xe8 movsd %xmm0, (%rcx) +0xec movq (%rbx), %rax +0xef addq $16, %rbx +0xf3 testq %rax, %rax +0xf6 jne useropcd2+0xe0 +0xf8 addq $-16, %rbx +0xfc jmp useropcd2+0x155 +0xfe nop +0x100 movq 8(%rbx), %rcx +0x104 movsd (%rax), %xmm0 +0x108 movsd %xmm0, (%rcx) +0x10c movq 16(%rbx), %rax +0x110 addq $16, %rbx +0x114 testq %rax, %rax +0x117 jne useropcd2+0x100 +0x119 movq 8(%rbx), %rax 
+0x11d testq %rax, %rax +0x120 je useropcd2+0x302 +0x126 addq $24, %rbx +0x12a nopw (%rax,%rax) +0x130 movq -8(%rbx), %rcx +0x134 movsd (%rax), %xmm0 +0x138 movsd %xmm0, (%rcx) +0x13c movq (%rbx), %rax +0x13f addq $16, %rbx +0x143 testq %rax, %rax +0x146 jne useropcd2+0x130 +0x148 addq $-16, %rbx +0x14c jmpq useropcd2+0x306 +0x151 addq $8, %rbx +0x155 movq 8(%rbx), %rax +0x159 testq %rax, %rax +0x15c je useropcd2+0x1c4 +0x15e addq $24, %rbx +0x162 nopw %cs:(%rax,%rax) +0x170 movq -8(%rbx), %rcx +0x174 movq 24(%rax), %r8 +0x178 movq 32(%rax), %r9 +0x17c movq 40(%rax), %rdi +0x180 movq 48(%rax), %rdx +0x184 movq 56(%rax), %rsi +0x188 movq %rsi, 56(%rcx) +0x18c movq %rdx, 48(%rcx) +0x190 movq %rdi, 40(%rcx) +0x194 movq %r9, 32(%rcx) +0x198 movq %r8, 24(%rcx) +0x19c movq 16(%rax), %rdx +0x1a0 movq %rdx, 16(%rcx) +0x1a4 movq (%rax), %rdx +0x1a7 movq 8(%rax), %rax +0x1ab movq %rax, 8(%rcx) +0x1af movq %rdx, (%rcx) +0x1b2 movq (%rbx), %rax +0x1b5 addq $16, %rbx +0x1b9 testq %rax, %rax +0x1bc jne useropcd2+0x170 +0x1be addq $-16, %rbx +0x1c2 jmp useropcd2+0x1c8 +0x1c4 addq $8, %rbx +0x1c8 movq 8(%rbx), %rax +0x1cc testq %rax, %rax +0x1cf je useropcd2+0x21c +0x1d1 addq $24, %rbx +0x1d5 nopw %cs:(%rax,%rax) +0x1e0 movq -8(%rbx), %rcx +0x1e4 movq 24(%rax), %rdx +0x1e8 movq 32(%rax), %rsi +0x1ec movq %rsi, 32(%rcx) +0x1f0 movq %rdx, 24(%rcx) +0x1f4 movq 16(%rax), %rdx +0x1f8 movq %rdx, 16(%rcx) +0x1fc movq (%rax), %rdx +0x1ff movq 8(%rax), %rax +0x203 movq %rax, 8(%rcx) +0x207 movq %rdx, (%rcx) +0x20a movq (%rbx), %rax +0x20d addq $16, %rbx +0x211 testq %rax, %rax +0x214 jne useropcd2+0x1e0 +0x216 addq $-16, %rbx +0x21a jmp useropcd2+0x220 +0x21c addq $8, %rbx +0x220 movq 40(%r12), %rax +0x225 movq 192(%rax), %rax +0x22c movq 40(%rax), %rax +0x230 movq $0, 192(%rax) +0x23b nopl (%rax,%rax) +0x240 movq 40(%r12), %rax +0x245 movq 192(%rax), %rsi +0x24c movq %r14, %rdi +0x24f callq *24(%rsi) +0x252 movq 40(%r12), %rax +0x257 movq 192(%rax), %rcx +0x25e movq 40(%rcx), %rdx +0x262 
movq 192(%rdx), %rdx +0x269 testq %rdx, %rdx +0x26c je useropcd2+0x29c +0x26e movq %rdx, 192(%rax) +0x275 movq 40(%r12), %rax +0x27a movq 192(%rax), %rax +0x281 movq 40(%rax), %rax +0x285 movq $0, 192(%rax) +0x290 movq 40(%r12), %rax +0x295 movq 192(%rax), %rcx +0x29c movq 8(%rcx), %rcx +0x2a0 movq %rcx, 192(%rax) +0x2a7 testq %rcx, %rcx +0x2aa jne useropcd2+0x240 +0x2ac movq 8(%rbx), %rax +0x2b0 testq %rax, %rax +0x2b3 je useropcd2+0x2f9 +0x2b5 movl 2680(%r14), %ecx +0x2bc nopl (%rax) +0x2c0 movq %rbx, %rdx +0x2c3 movq 16(%rdx), %rsi +0x2c7 leaq 16(%rdx), %rbx +0x2cb movl %ecx, %edi +0x2cd nopl (%rax) +0x2d0 movsd (%rax), %xmm0 +0x2d4 movsd %xmm0, (%rsi) +0x2d8 addq $8, %rsi +0x2dc addq $8, %rax +0x2e0 decl %edi +0x2e2 jne useropcd2+0x2d0 +0x2e4 movq 24(%rdx), %rax +0x2e8 testq %rax, %rax +0x2eb jne useropcd2+0x2c0 +0x2ed addq $24, %rdx +0x2f1 movq %rdx, %rbx +0x2f4 jmpq useropcd2+0x492 +0x2f9 addq $8, %rbx +0x2fd jmpq useropcd2+0x492 +0x302 addq $8, %rbx +0x306 movq 8(%rbx), %rax +0x30a testq %rax, %rax +0x30d je useropcd2+0x374 +0x30f addq $24, %rbx +0x313 nopw %cs:(%rax,%rax) +0x320 movq -8(%rbx), %rcx +0x324 movq 24(%rax), %r8 +0x328 movq 32(%rax), %r9 +0x32c movq 40(%rax), %rdi +0x330 movq 48(%rax), %rdx +0x334 movq 56(%rax), %rsi +0x338 movq %rsi, 56(%rcx) +0x33c movq %rdx, 48(%rcx) +0x340 movq %rdi, 40(%rcx) +0x344 movq %r9, 32(%rcx) +0x348 movq %r8, 24(%rcx) +0x34c movq 16(%rax), %rdx +0x350 movq %rdx, 16(%rcx) +0x354 movq (%rax), %rdx +0x357 movq 8(%rax), %rax +0x35b movq %rax, 8(%rcx) +0x35f movq %rdx, (%rcx) +0x362 movq (%rbx), %rax +0x365 addq $16, %rbx +0x369 testq %rax, %rax +0x36c jne useropcd2+0x320 +0x36e addq $-16, %rbx +0x372 jmp useropcd2+0x378 +0x374 addq $8, %rbx +0x378 movq 8(%rbx), %rax +0x37c testq %rax, %rax +0x37f je useropcd2+0x3cc +0x381 addq $24, %rbx +0x385 nopw %cs:(%rax,%rax) +0x390 movq -8(%rbx), %rcx +0x394 movq 24(%rax), %rdx +0x398 movq 32(%rax), %rsi +0x39c movq %rsi, 32(%rcx) +0x3a0 movq %rdx, 24(%rcx) +0x3a4 movq 16(%rax), 
%rdx +0x3a8 movq %rdx, 16(%rcx) +0x3ac movq (%rax), %rdx +0x3af movq 8(%rax), %rax +0x3b3 movq %rax, 8(%rcx) +0x3b7 movq %rdx, (%rcx) +0x3ba movq (%rbx), %rax +0x3bd addq $16, %rbx +0x3c1 testq %rax, %rax +0x3c4 jne useropcd2+0x390 +0x3c6 addq $-16, %rbx +0x3ca jmp useropcd2+0x3d0 +0x3cc addq $8, %rbx +0x3d0 movq 40(%r12), %rax +0x3d5 movq 192(%rax), %rax +0x3dc movq 40(%rax), %rax +0x3e0 movq $0, 192(%rax) +0x3eb nopl (%rax,%rax) +0x3f0 movq 40(%r12), %rax +0x3f5 movq 192(%rax), %rsi +0x3fc movq %r14, %rdi +0x3ff callq *24(%rsi) +0x402 movq 40(%r12), %rax +0x407 movq 192(%rax), %rcx +0x40e movq 40(%rcx), %rdx +0x412 movq 192(%rdx), %rdx +0x419 testq %rdx, %rdx +0x41c je useropcd2+0x44c +0x41e movq %rdx, 192(%rax) +0x425 movq 40(%r12), %rax +0x42a movq 192(%rax), %rax +0x431 movq 40(%rax), %rax +0x435 movq $0, 192(%rax) +0x440 movq 40(%r12), %rax +0x445 movq 192(%rax), %rcx +0x44c movq 8(%rcx), %rcx +0x450 movq %rcx, 192(%rax) +0x457 testq %rcx, %rcx +0x45a jne useropcd2+0x3f0 +0x45c movq 8(%rbx), %rax +0x460 testq %rax, %rax +0x463 je useropcd2+0x48e +0x465 addq $24, %rbx +0x469 nopl (%rax) +0x470 movq -8(%rbx), %rcx +0x474 movsd (%rax), %xmm0 +0x478 movsd %xmm0, (%rcx) +0x47c movq (%rbx), %rax +0x47f addq $16, %rbx +0x483 testq %rax, %rax +0x486 jne useropcd2+0x470 +0x488 addq $-16, %rbx +0x48c jmp useropcd2+0x492 +0x48e addq $8, %rbx +0x492 movq 8(%rbx), %rax +0x496 testq %rax, %rax +0x499 je useropcd2+0x4be +0x49b addq $24, %rbx +0x49f nop +0x4a0 movq -8(%rbx), %rcx +0x4a4 movsd (%rax), %xmm0 +0x4a8 movsd %xmm0, (%rcx) +0x4ac movq (%rbx), %rax +0x4af addq $16, %rbx +0x4b3 testq %rax, %rax +0x4b6 jne useropcd2+0x4a0 +0x4b8 addq $-16, %rbx +0x4bc jmp useropcd2+0x4c2 +0x4be addq $8, %rbx +0x4c2 movq 8(%rbx), %rax +0x4c6 testq %rax, %rax +0x4c9 je useropcd2+0x524 +0x4cb addq $24, %rbx +0x4cf nop +0x4d0 movq -8(%rbx), %rcx +0x4d4 movq 24(%rax), %r8 +0x4d8 movq 32(%rax), %r9 +0x4dc movq 40(%rax), %rdi +0x4e0 movq 48(%rax), %rdx +0x4e4 movq 56(%rax), %rsi +0x4e8 movq 
%rsi, 56(%rcx) +0x4ec movq %rdx, 48(%rcx) +0x4f0 movq %rdi, 40(%rcx) +0x4f4 movq %r9, 32(%rcx) +0x4f8 movq %r8, 24(%rcx) +0x4fc movq 16(%rax), %rdx +0x500 movq %rdx, 16(%rcx) +0x504 movq (%rax), %rdx +0x507 movq 8(%rax), %rax +0x50b movq %rax, 8(%rcx) +0x50f movq %rdx, (%rcx) +0x512 movq (%rbx), %rax +0x515 addq $16, %rbx +0x519 testq %rax, %rax +0x51c jne useropcd2+0x4d0 +0x51e addq $-16, %rbx +0x522 jmp useropcd2+0x528 +0x524 addq $8, %rbx +0x528 movq 8(%rbx), %rax +0x52c testq %rax, %rax +0x52f je useropcd2+0x576 +0x531 addq $24, %rbx +0x535 nopw %cs:(%rax,%rax) +0x540 movq -8(%rbx), %rcx +0x544 movq 24(%rax), %rdx +0x548 movq 32(%rax), %rsi +0x54c movq %rsi, 32(%rcx) +0x550 movq %rdx, 24(%rcx) +0x554 movq 16(%rax), %rdx +0x558 movq %rdx, 16(%rcx) +0x55c movq (%rax), %rdx +0x55f movq 8(%rax), %rax +0x563 movq %rax, 8(%rcx) +0x567 movq %rdx, (%rcx) +0x56a movq (%rbx), %rax +0x56d addq $16, %rbx +0x571 testq %rax, %rax +0x574 jne useropcd2+0x540 +0x576 movq 40(%r12), %rax +0x57b movq %r15, 192(%rax) +0x582 cmpq $0, 48(%r12) +0x588 jne useropcd2+0x5ce +0x58a movq 40(%r12), %rax +0x58f movq 192(%rax), %rcx +0x596 movq 8(%rcx), %rcx +0x59a testq %rcx, %rcx +0x59d je useropcd2+0x5ce +0x59f addq $192, %rax +0x5a5 nopw %cs:(%rax,%rax) +0x5b0 movq %rcx, (%rax) +0x5b3 movq 40(%r12), %rax +0x5b8 movq 192(%rax), %rcx +0x5bf addq $192, %rax +0x5c5 movq 8(%rcx), %rcx +0x5c9 testq %rcx, %rcx +0x5cc jne useropcd2+0x5b0 +0x5ce xorl %eax, %eax +0x5d0 popq %rbx +0x5d1 popq %r12 +0x5d3 popq %r14 +0x5d5 popq %r15 +0x5d7 popq %rbp +0x5d8 ret +0x5d9 nopl (%rax) CS5: +0x00 pushq %rbp +0x01 movq %rsp, %rbp +0x04 pushq %r15 +0x06 pushq %r14 +0x08 pushq %r12 +0x0a pushq %rbx +0x0b movq %rsi, %r14 +0x0e movq %rdi, %r12 +0x11 movq 2576(%r12), %r15 +0x19 movq 48(%r14), %rax +0x1d movq 8(%rax), %rax +0x21 movq %rax, 2576(%r12) +0x29 testq %rax, %rax +0x2c je useropcd2+0x42e +0x32 movq 48(%r14), %rax +0x36 movq 56(%r14), %rcx +0x3a movb 102(%rcx), %cl +0x3d movb %cl, 102(%rax) +0x40 movq 
64(%r14), %rbx +0x44 movq 24(%rbx), %rax +0x48 addq $24, %rbx +0x4c movl 2584(%r12), %ecx +0x54 cmpl $1, %ecx +0x57 je useropcd2+0x74 +0x59 jmp useropcd2+0xb8 +0x5b nopl (%rax,%rax) +0x60 movq 8(%rbx), %rcx +0x64 movsd (%rax), %xmm0 +0x68 movsd %xmm0, (%rcx) +0x6c movq 16(%rbx), %rax +0x70 addq $16, %rbx +0x74 testq %rax, %rax +0x77 jne useropcd2+0x60 +0x79 movq 8(%rbx), %rax +0x7d testq %rax, %rax +0x80 je useropcd2+0x11e +0x86 addq $24, %rbx +0x8a nopw (%rax,%rax) +0x90 movq -8(%rbx), %rcx +0x94 movsd (%rax), %xmm0 +0x98 movsd %xmm0, (%rcx) +0x9c movq (%rbx), %rax +0x9f addq $16, %rbx +0xa3 testq %rax, %rax +0xa6 jne useropcd2+0x90 +0xa8 addq $-16, %rbx +0xac jmp useropcd2+0x122 +0xae nop +0xb0 movq 16(%rbx), %rax +0xb4 addq $16, %rbx +0xb8 testq %rax, %rax +0xbb je useropcd2+0xe6 +0xbd movq 8(%rbx), %rdx +0xc1 movl %ecx, %esi +0xc3 nopw %cs:(%rax,%rax) +0xd0 movsd (%rax), %xmm0 +0xd4 movsd %xmm0, (%rdx) +0xd8 addq $8, %rdx +0xdc addq $8, %rax +0xe0 decl %esi +0xe2 jne useropcd2+0xd0 +0xe4 jmp useropcd2+0xb0 +0xe6 movq 8(%rbx), %rax +0xea testq %rax, %rax +0xed je useropcd2+0x194 +0xf3 addq $24, %rbx +0xf7 nopw (%rax,%rax) +0x100 movq -8(%rbx), %rcx +0x104 movsd (%rax), %xmm0 +0x108 movsd %xmm0, (%rcx) +0x10c movq (%rbx), %rax +0x10f addq $16, %rbx +0x113 testq %rax, %rax +0x116 jne useropcd2+0x100 +0x118 addq $-16, %rbx +0x11c jmp useropcd2+0x198 +0x11e addq $8, %rbx +0x122 movq 8(%rbx), %rax +0x126 testq %rax, %rax +0x129 je useropcd2+0x204 +0x12f addq $24, %rbx +0x133 nopw %cs:(%rax,%rax) +0x140 movq -8(%rbx), %rcx +0x144 movq 24(%rax), %r8 +0x148 movq 32(%rax), %r9 +0x14c movq 40(%rax), %rdi +0x150 movq 48(%rax), %rdx +0x154 movq 56(%rax), %rsi +0x158 movq %rsi, 56(%rcx) +0x15c movq %rdx, 48(%rcx) +0x160 movq %rdi, 40(%rcx) +0x164 movq %r9, 32(%rcx) +0x168 movq %r8, 24(%rcx) +0x16c movq 16(%rax), %rdx +0x170 movq %rdx, 16(%rcx) +0x174 movq (%rax), %rdx +0x177 movq 8(%rax), %rax +0x17b movq %rax, 8(%rcx) +0x17f movq %rdx, (%rcx) +0x182 movq (%rbx), %rax +0x185 
addq $16, %rbx +0x189 testq %rax, %rax +0x18c jne useropcd2+0x140 +0x18e addq $-16, %rbx +0x192 jmp useropcd2+0x208 +0x194 addq $8, %rbx +0x198 movq 8(%rbx), %rax +0x19c testq %rax, %rax +0x19f je useropcd2+0x244 +0x1a5 addq $24, %rbx +0x1a9 nopl (%rax) +0x1b0 movq -8(%rbx), %rcx +0x1b4 movq 24(%rax), %r8 +0x1b8 movq 32(%rax), %r9 +0x1bc movq 40(%rax), %rdi +0x1c0 movq 48(%rax), %rdx +0x1c4 movq 56(%rax), %rsi +0x1c8 movq %rsi, 56(%rcx) +0x1cc movq %rdx, 48(%rcx) +0x1d0 movq %rdi, 40(%rcx) +0x1d4 movq %r9, 32(%rcx) +0x1d8 movq %r8, 24(%rcx) +0x1dc movq 16(%rax), %rdx +0x1e0 movq %rdx, 16(%rcx) +0x1e4 movq (%rax), %rdx +0x1e7 movq 8(%rax), %rax +0x1eb movq %rax, 8(%rcx) +0x1ef movq %rdx, (%rcx) +0x1f2 movq (%rbx), %rax +0x1f5 addq $16, %rbx +0x1f9 testq %rax, %rax +0x1fc jne useropcd2+0x1b0 +0x1fe addq $-16, %rbx +0x202 jmp useropcd2+0x248 +0x204 addq $8, %rbx +0x208 movq 8(%rbx), %rax +0x20c testq %rax, %rax +0x20f je useropcd2+0x284 +0x211 addq $24, %rbx +0x215 nopw %cs:(%rax,%rax) +0x220 movq -8(%rbx), %rcx +0x224 movq (%rax), %rdx +0x227 movq 8(%rax), %rax +0x22b movq %rax, 8(%rcx) +0x22f movq %rdx, (%rcx) +0x232 movq (%rbx), %rax +0x235 addq $16, %rbx +0x239 testq %rax, %rax +0x23c jne useropcd2+0x220 +0x23e addq $-16, %rbx +0x242 jmp useropcd2+0x288 +0x244 addq $8, %rbx +0x248 movq 8(%rbx), %rax +0x24c testq %rax, %rax +0x24f je useropcd2+0x2e1 +0x255 addq $24, %rbx +0x259 nopl (%rax) +0x260 movq -8(%rbx), %rcx +0x264 movq (%rax), %rdx +0x267 movq 8(%rax), %rax +0x26b movq %rax, 8(%rcx) +0x26f movq %rdx, (%rcx) +0x272 movq (%rbx), %rax +0x275 addq $16, %rbx +0x279 testq %rax, %rax +0x27c jne useropcd2+0x260 +0x27e addq $-16, %rbx +0x282 jmp useropcd2+0x2e5 +0x284 addq $8, %rbx +0x288 movq 2576(%r12), %rsi +0x290 movq %r12, %rdi +0x293 callq *24(%rsi) +0x296 movq 2576(%r12), %rax +0x29e movq 8(%rax), %rsi +0x2a2 movq %rsi, 2576(%r12) +0x2aa testq %rsi, %rsi +0x2ad jne useropcd2+0x290 +0x2af movq 8(%rbx), %rax +0x2b3 testq %rax, %rax +0x2b6 je useropcd2+0x356 
+0x2bc addq $24, %rbx +0x2c0 movq -8(%rbx), %rcx +0x2c4 movsd (%rax), %xmm0 +0x2c8 movsd %xmm0, (%rcx) +0x2cc movq (%rbx), %rax +0x2cf addq $16, %rbx +0x2d3 testq %rax, %rax +0x2d6 jne useropcd2+0x2c0 +0x2d8 addq $-16, %rbx +0x2dc jmpq useropcd2+0x360 +0x2e1 addq $8, %rbx +0x2e5 movq 2576(%r12), %rsi +0x2ed nopl (%rax) +0x2f0 movq %r12, %rdi +0x2f3 callq *24(%rsi) +0x2f6 movq 2576(%r12), %rax +0x2fe movq 8(%rax), %rsi +0x302 movq %rsi, 2576(%r12) +0x30a testq %rsi, %rsi +0x30d jne useropcd2+0x2f0 +0x30f movq 8(%rbx), %rax +0x313 testq %rax, %rax +0x316 je useropcd2+0x35c +0x318 movl 2584(%r12), %ecx +0x320 movq %rbx, %rdx +0x323 movq 16(%rdx), %rsi +0x327 leaq 16(%rdx), %rbx +0x32b movl %ecx, %edi +0x32d nopl (%rax) +0x330 movsd (%rax), %xmm0 +0x334 movsd %xmm0, (%rsi) +0x338 addq $8, %rsi +0x33c addq $8, %rax +0x340 decl %edi +0x342 jne useropcd2+0x330 +0x344 movq 24(%rdx), %rax +0x348 testq %rax, %rax +0x34b jne useropcd2+0x320 +0x34d addq $24, %rdx +0x351 movq %rdx, %rbx +0x354 jmp useropcd2+0x360 +0x356 addq $8, %rbx +0x35a jmp useropcd2+0x360 +0x35c addq $8, %rbx +0x360 movq 8(%rbx), %rax +0x364 testq %rax, %rax +0x367 je useropcd2+0x38e +0x369 addq $24, %rbx +0x36d nopl (%rax) +0x370 movq -8(%rbx), %rcx +0x374 movsd (%rax), %xmm0 +0x378 movsd %xmm0, (%rcx) +0x37c movq (%rbx), %rax +0x37f addq $16, %rbx +0x383 testq %rax, %rax +0x386 jne useropcd2+0x370 +0x388 addq $-16, %rbx +0x38c jmp useropcd2+0x392 +0x38e addq $8, %rbx +0x392 movq 8(%rbx), %rax +0x396 testq %rax, %rax +0x399 je useropcd2+0x3f4 +0x39b addq $24, %rbx +0x39f nop +0x3a0 movq -8(%rbx), %rcx +0x3a4 movq 24(%rax), %r8 +0x3a8 movq 32(%rax), %r9 +0x3ac movq 40(%rax), %rdi +0x3b0 movq 48(%rax), %rdx +0x3b4 movq 56(%rax), %rsi +0x3b8 movq %rsi, 56(%rcx) +0x3bc movq %rdx, 48(%rcx) +0x3c0 movq %rdi, 40(%rcx) +0x3c4 movq %r9, 32(%rcx) +0x3c8 movq %r8, 24(%rcx) +0x3cc movq 16(%rax), %rdx +0x3d0 movq %rdx, 16(%rcx) +0x3d4 movq (%rax), %rdx +0x3d7 movq 8(%rax), %rax +0x3db movq %rax, 8(%rcx) +0x3df movq 
%rdx, (%rcx) +0x3e2 movq (%rbx), %rax +0x3e5 addq $16, %rbx +0x3e9 testq %rax, %rax +0x3ec jne useropcd2+0x3a0 +0x3ee addq $-16, %rbx +0x3f2 jmp useropcd2+0x3f8 +0x3f4 addq $8, %rbx +0x3f8 movq 8(%rbx), %rax +0x3fc testq %rax, %rax +0x3ff je useropcd2+0x42e +0x401 addq $24, %rbx +0x405 nopw %cs:(%rax,%rax) +0x410 movq -8(%rbx), %rcx +0x414 movq (%rax), %rdx +0x417 movq 8(%rax), %rax +0x41b movq %rax, 8(%rcx) +0x41f movq %rdx, (%rcx) +0x422 movq (%rbx), %rax +0x425 addq $16, %rbx +0x429 testq %rax, %rax +0x42c jne useropcd2+0x410 +0x42e movq %r15, 2576(%r12) +0x436 cmpq $0, 48(%r14) +0x43b jne useropcd2+0x461 +0x43d movq 8(%r15), %rax +0x441 jmp useropcd2+0x45c +0x443 nopw %cs:(%rax,%rax) +0x450 movq %rax, 2576(%r12) +0x458 movq 8(%rax), %rax +0x45c testq %rax, %rax +0x45f jne useropcd2+0x450 +0x461 xorl %eax, %eax +0x463 popq %rbx +0x464 popq %r12 +0x466 popq %r14 +0x468 popq %r15 +0x46a popq %rbp +0x46b ret +0x46c nopl (%rax) On Fri, Aug 9, 2013 at 3:28 PM, Steven Yi |
Date | 2013-08-09 15:06 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
The difference exists here too, but it is smaller.

cs5:
real 1m35.324s
user 1m30.282s
sys 0m3.417s

cs6:
real 1m59.254s
user 1m57.559s
sys 0m0.148s

I have been looking at the tied notes example and trying to pin down where it slows significantly. So far, it appears that the lines

aLeft = aout * irtl
aRight = aout * irtr

affect the performance more than others. Also, something indicates that using more p-fields slows the code down (when we have a quick sequence of events like this).

On 9 Aug 2013, at 14:28, Steven Yi wrote:

> I've just tried running another example. Running to disk, with Csound6 I get:
>
> real 2m37.509s
> user 2m35.736s
> sys 0m0.452s
>
> with Csound5 I get:
>
> real 1m7.775s
> user 1m4.561s
> sys 0m0.392s
>
> This is with ksmps=1, but still, csound6 for this file take more than
> double the time. I ran using:
>
> time csound clojure-piano-phase.csd -o t.wav
> time csound64 clojure-piano-phase.csd -o t.wav
>
> I'm going to run this in XCode's instruments now. I'm also going to
> recheck if there's a missing optimization flag compared to cs5.
>
> On Fri, Aug 9, 2013 at 1:39 PM, jpff |
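For reference, the slowdown ratios implied by the `time` figures in this message can be worked out with a small helper. This is only an illustrative sketch: `parse_time_field` and `slowdown` are hypothetical names, and the code assumes the `NmS.SSSs` layout that `time` printed above.

```c
#include <stdio.h>

/* Convert a `time`-style field such as "1m35.324s" to seconds.
   Returns a negative value if the text does not match that layout. */
double parse_time_field(const char *text)
{
    int minutes = 0;
    double seconds = 0.0;
    if (sscanf(text, "%dm%lfs", &minutes, &seconds) != 2)
        return -1.0;
    return minutes * 60.0 + seconds;
}

/* Ratio of the Csound 6 time to the Csound 5 time (above 1.0 means slower).
   Both arguments are assumed to be valid `time` fields. */
double slowdown(const char *cs5_real, const char *cs6_real)
{
    return parse_time_field(cs6_real) / parse_time_field(cs5_real);
}
```

With the real times quoted above, `slowdown("1m35.324s", "1m59.254s")` comes to about 1.25 for Victor's run and `slowdown("1m7.775s", "2m37.509s")` to about 2.32 for Steven's.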
Date | 2013-08-09 15:08 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
You have to be careful with using local PDS because I think it will cause wrong jumps in kgotos etc., which refer to CS_PDS. I might be wrong, but it appears so. Victor On 9 Aug 2013, at 15:03, Steven Yi wrote: > Just to note, Instruments shows that useropcd2 is much larger in > instructions than in CS5. It looks like it's due to CS_PDS as John > mentioned it being more expensive. I'm trying out some things with > using a local OPDS assigned from CS_PDS, but the difference in > instructions is below. > > CS6: > > +0x00 pushq %rbp > +0x01 movq %rsp, %rbp > +0x04 pushq %r15 > +0x06 pushq %r14 > +0x08 pushq %r12 > +0x0a pushq %rbx > +0x0b movq %rsi, %r12 > +0x0e movq %rdi, %r14 > +0x11 movq 40(%r12), %rcx > +0x16 movq 48(%r12), %rax > +0x1b movq 192(%rcx), %r15 > +0x22 movq 12224(%r14), %rcx > +0x29 movq %rcx, 280(%rax) > +0x30 movq 48(%r12), %rax > +0x35 movq 12232(%r14), %rcx > +0x3c movq %rcx, 288(%rax) > +0x43 movq 40(%r12), %rax > +0x48 movq 48(%r12), %rcx > +0x4d movq 8(%rcx), %rcx > +0x51 movq %rcx, 192(%rax) > +0x58 testq %rcx, %rcx > +0x5b je useropcd2+0x576 > +0x61 movq 48(%r12), %rax > +0x66 movq 56(%r12), %rcx > +0x6b movb 116(%rcx), %cl > +0x6e movb %cl, 116(%rax) > +0x71 movq 40(%r12), %rcx > +0x76 movq 64(%r12), %rbx > +0x7b movq 24(%rbx), %rax > +0x7f addq $24, %rbx > +0x83 cmpl $1, 156(%rcx) > +0x8a je useropcd2+0x114 > +0x90 testq %rax, %rax > +0x93 je useropcd2+0xd1 > +0x95 movl 2680(%r14), %ecx > +0x9c nopl (%rax) > +0xa0 movq 8(%rbx), %rdx > +0xa4 movl %ecx, %esi > +0xa6 nopw %cs:(%rax,%rax) > +0xb0 movsd (%rax), %xmm0 > +0xb4 movsd %xmm0, (%rdx) > +0xb8 addq $8, %rdx > +0xbc addq $8, %rax > +0xc0 decl %esi > +0xc2 jne useropcd2+0xb0 > +0xc4 movq 16(%rbx), %rax > +0xc8 addq $16, %rbx > +0xcc testq %rax, %rax > +0xcf jne useropcd2+0xa0 > +0xd1 movq 8(%rbx), %rax > +0xd5 testq %rax, %rax > +0xd8 je useropcd2+0x151 > +0xda addq $24, %rbx > +0xde nop > +0xe0 movq -8(%rbx), %rcx > +0xe4 movsd (%rax), %xmm0 > +0xe8 movsd %xmm0, (%rcx) > +0xec movq 
(%rbx), %rax > +0xef addq $16, %rbx > +0xf3 testq %rax, %rax > +0xf6 jne useropcd2+0xe0 > +0xf8 addq $-16, %rbx > +0xfc jmp useropcd2+0x155 > +0xfe nop > +0x100 movq 8(%rbx), %rcx > +0x104 movsd (%rax), %xmm0 > +0x108 movsd %xmm0, (%rcx) > +0x10c movq 16(%rbx), %rax > +0x110 addq $16, %rbx > +0x114 testq %rax, %rax > +0x117 jne useropcd2+0x100 > +0x119 movq 8(%rbx), %rax > +0x11d testq %rax, %rax > +0x120 je useropcd2+0x302 > +0x126 addq $24, %rbx > +0x12a nopw (%rax,%rax) > +0x130 movq -8(%rbx), %rcx > +0x134 movsd (%rax), %xmm0 > +0x138 movsd %xmm0, (%rcx) > +0x13c movq (%rbx), %rax > +0x13f addq $16, %rbx > +0x143 testq %rax, %rax > +0x146 jne useropcd2+0x130 > +0x148 addq $-16, %rbx > +0x14c jmpq useropcd2+0x306 > +0x151 addq $8, %rbx > +0x155 movq 8(%rbx), %rax > +0x159 testq %rax, %rax > +0x15c je useropcd2+0x1c4 > +0x15e addq $24, %rbx > +0x162 nopw %cs:(%rax,%rax) > +0x170 movq -8(%rbx), %rcx > +0x174 movq 24(%rax), %r8 > +0x178 movq 32(%rax), %r9 > +0x17c movq 40(%rax), %rdi > +0x180 movq 48(%rax), %rdx > +0x184 movq 56(%rax), %rsi > +0x188 movq %rsi, 56(%rcx) > +0x18c movq %rdx, 48(%rcx) > +0x190 movq %rdi, 40(%rcx) > +0x194 movq %r9, 32(%rcx) > +0x198 movq %r8, 24(%rcx) > +0x19c movq 16(%rax), %rdx > +0x1a0 movq %rdx, 16(%rcx) > +0x1a4 movq (%rax), %rdx > +0x1a7 movq 8(%rax), %rax > +0x1ab movq %rax, 8(%rcx) > +0x1af movq %rdx, (%rcx) > +0x1b2 movq (%rbx), %rax > +0x1b5 addq $16, %rbx > +0x1b9 testq %rax, %rax > +0x1bc jne useropcd2+0x170 > +0x1be addq $-16, %rbx > +0x1c2 jmp useropcd2+0x1c8 > +0x1c4 addq $8, %rbx > +0x1c8 movq 8(%rbx), %rax > +0x1cc testq %rax, %rax > +0x1cf je useropcd2+0x21c > +0x1d1 addq $24, %rbx > +0x1d5 nopw %cs:(%rax,%rax) > +0x1e0 movq -8(%rbx), %rcx > +0x1e4 movq 24(%rax), %rdx > +0x1e8 movq 32(%rax), %rsi > +0x1ec movq %rsi, 32(%rcx) > +0x1f0 movq %rdx, 24(%rcx) > +0x1f4 movq 16(%rax), %rdx > +0x1f8 movq %rdx, 16(%rcx) > +0x1fc movq (%rax), %rdx > +0x1ff movq 8(%rax), %rax > +0x203 movq %rax, 8(%rcx) > +0x207 movq %rdx, (%rcx) 
> +0x20a movq (%rbx), %rax > +0x20d addq $16, %rbx > +0x211 testq %rax, %rax > +0x214 jne useropcd2+0x1e0 > +0x216 addq $-16, %rbx > +0x21a jmp useropcd2+0x220 > +0x21c addq $8, %rbx > +0x220 movq 40(%r12), %rax > +0x225 movq 192(%rax), %rax > +0x22c movq 40(%rax), %rax > +0x230 movq $0, 192(%rax) > +0x23b nopl (%rax,%rax) > +0x240 movq 40(%r12), %rax > +0x245 movq 192(%rax), %rsi > +0x24c movq %r14, %rdi > +0x24f callq *24(%rsi) > +0x252 movq 40(%r12), %rax > +0x257 movq 192(%rax), %rcx > +0x25e movq 40(%rcx), %rdx > +0x262 movq 192(%rdx), %rdx > +0x269 testq %rdx, %rdx > +0x26c je useropcd2+0x29c > +0x26e movq %rdx, 192(%rax) > +0x275 movq 40(%r12), %rax > +0x27a movq 192(%rax), %rax > +0x281 movq 40(%rax), %rax > +0x285 movq $0, 192(%rax) > +0x290 movq 40(%r12), %rax > +0x295 movq 192(%rax), %rcx > +0x29c movq 8(%rcx), %rcx > +0x2a0 movq %rcx, 192(%rax) > +0x2a7 testq %rcx, %rcx > +0x2aa jne useropcd2+0x240 > +0x2ac movq 8(%rbx), %rax > +0x2b0 testq %rax, %rax > +0x2b3 je useropcd2+0x2f9 > +0x2b5 movl 2680(%r14), %ecx > +0x2bc nopl (%rax) > +0x2c0 movq %rbx, %rdx > +0x2c3 movq 16(%rdx), %rsi > +0x2c7 leaq 16(%rdx), %rbx > +0x2cb movl %ecx, %edi > +0x2cd nopl (%rax) > +0x2d0 movsd (%rax), %xmm0 > +0x2d4 movsd %xmm0, (%rsi) > +0x2d8 addq $8, %rsi > +0x2dc addq $8, %rax > +0x2e0 decl %edi > +0x2e2 jne useropcd2+0x2d0 > +0x2e4 movq 24(%rdx), %rax > +0x2e8 testq %rax, %rax > +0x2eb jne useropcd2+0x2c0 > +0x2ed addq $24, %rdx > +0x2f1 movq %rdx, %rbx > +0x2f4 jmpq useropcd2+0x492 > +0x2f9 addq $8, %rbx > +0x2fd jmpq useropcd2+0x492 > +0x302 addq $8, %rbx > +0x306 movq 8(%rbx), %rax > +0x30a testq %rax, %rax > +0x30d je useropcd2+0x374 > +0x30f addq $24, %rbx > +0x313 nopw %cs:(%rax,%rax) > +0x320 movq -8(%rbx), %rcx > +0x324 movq 24(%rax), %r8 > +0x328 movq 32(%rax), %r9 > +0x32c movq 40(%rax), %rdi > +0x330 movq 48(%rax), %rdx > +0x334 movq 56(%rax), %rsi > +0x338 movq %rsi, 56(%rcx) > +0x33c movq %rdx, 48(%rcx) > +0x340 movq %rdi, 40(%rcx) > +0x344 movq %r9, 
32(%rcx) > +0x348 movq %r8, 24(%rcx) > +0x34c movq 16(%rax), %rdx > +0x350 movq %rdx, 16(%rcx) > +0x354 movq (%rax), %rdx > +0x357 movq 8(%rax), %rax > +0x35b movq %rax, 8(%rcx) > +0x35f movq %rdx, (%rcx) > +0x362 movq (%rbx), %rax > +0x365 addq $16, %rbx > +0x369 testq %rax, %rax > +0x36c jne useropcd2+0x320 > +0x36e addq $-16, %rbx > +0x372 jmp useropcd2+0x378 > +0x374 addq $8, %rbx > +0x378 movq 8(%rbx), %rax > +0x37c testq %rax, %rax > +0x37f je useropcd2+0x3cc > +0x381 addq $24, %rbx > +0x385 nopw %cs:(%rax,%rax) > +0x390 movq -8(%rbx), %rcx > +0x394 movq 24(%rax), %rdx > +0x398 movq 32(%rax), %rsi > +0x39c movq %rsi, 32(%rcx) > +0x3a0 movq %rdx, 24(%rcx) > +0x3a4 movq 16(%rax), %rdx > +0x3a8 movq %rdx, 16(%rcx) > +0x3ac movq (%rax), %rdx > +0x3af movq 8(%rax), %rax > +0x3b3 movq %rax, 8(%rcx) > +0x3b7 movq %rdx, (%rcx) > +0x3ba movq (%rbx), %rax > +0x3bd addq $16, %rbx > +0x3c1 testq %rax, %rax > +0x3c4 jne useropcd2+0x390 > +0x3c6 addq $-16, %rbx > +0x3ca jmp useropcd2+0x3d0 > +0x3cc addq $8, %rbx > +0x3d0 movq 40(%r12), %rax > +0x3d5 movq 192(%rax), %rax > +0x3dc movq 40(%rax), %rax > +0x3e0 movq $0, 192(%rax) > +0x3eb nopl (%rax,%rax) > +0x3f0 movq 40(%r12), %rax > +0x3f5 movq 192(%rax), %rsi > +0x3fc movq %r14, %rdi > +0x3ff callq *24(%rsi) > +0x402 movq 40(%r12), %rax > +0x407 movq 192(%rax), %rcx > +0x40e movq 40(%rcx), %rdx > +0x412 movq 192(%rdx), %rdx > +0x419 testq %rdx, %rdx > +0x41c je useropcd2+0x44c > +0x41e movq %rdx, 192(%rax) > +0x425 movq 40(%r12), %rax > +0x42a movq 192(%rax), %rax > +0x431 movq 40(%rax), %rax > +0x435 movq $0, 192(%rax) > +0x440 movq 40(%r12), %rax > +0x445 movq 192(%rax), %rcx > +0x44c movq 8(%rcx), %rcx > +0x450 movq %rcx, 192(%rax) > +0x457 testq %rcx, %rcx > +0x45a jne useropcd2+0x3f0 > +0x45c movq 8(%rbx), %rax > +0x460 testq %rax, %rax > +0x463 je useropcd2+0x48e > +0x465 addq $24, %rbx > +0x469 nopl (%rax) > +0x470 movq -8(%rbx), %rcx > +0x474 movsd (%rax), %xmm0 > +0x478 movsd %xmm0, (%rcx) > +0x47c movq (%rbx), 
%rax > +0x47f addq $16, %rbx > +0x483 testq %rax, %rax > +0x486 jne useropcd2+0x470 > +0x488 addq $-16, %rbx > +0x48c jmp useropcd2+0x492 > +0x48e addq $8, %rbx > +0x492 movq 8(%rbx), %rax > +0x496 testq %rax, %rax > +0x499 je useropcd2+0x4be > +0x49b addq $24, %rbx > +0x49f nop > +0x4a0 movq -8(%rbx), %rcx > +0x4a4 movsd (%rax), %xmm0 > +0x4a8 movsd %xmm0, (%rcx) > +0x4ac movq (%rbx), %rax > +0x4af addq $16, %rbx > +0x4b3 testq %rax, %rax > +0x4b6 jne useropcd2+0x4a0 > +0x4b8 addq $-16, %rbx > +0x4bc jmp useropcd2+0x4c2 > +0x4be addq $8, %rbx > +0x4c2 movq 8(%rbx), %rax > +0x4c6 testq %rax, %rax > +0x4c9 je useropcd2+0x524 > +0x4cb addq $24, %rbx > +0x4cf nop > +0x4d0 movq -8(%rbx), %rcx > +0x4d4 movq 24(%rax), %r8 > +0x4d8 movq 32(%rax), %r9 > +0x4dc movq 40(%rax), %rdi > +0x4e0 movq 48(%rax), %rdx > +0x4e4 movq 56(%rax), %rsi > +0x4e8 movq %rsi, 56(%rcx) > +0x4ec movq %rdx, 48(%rcx) > +0x4f0 movq %rdi, 40(%rcx) > +0x4f4 movq %r9, 32(%rcx) > +0x4f8 movq %r8, 24(%rcx) > +0x4fc movq 16(%rax), %rdx > +0x500 movq %rdx, 16(%rcx) > +0x504 movq (%rax), %rdx > +0x507 movq 8(%rax), %rax > +0x50b movq %rax, 8(%rcx) > +0x50f movq %rdx, (%rcx) > +0x512 movq (%rbx), %rax > +0x515 addq $16, %rbx > +0x519 testq %rax, %rax > +0x51c jne useropcd2+0x4d0 > +0x51e addq $-16, %rbx > +0x522 jmp useropcd2+0x528 > +0x524 addq $8, %rbx > +0x528 movq 8(%rbx), %rax > +0x52c testq %rax, %rax > +0x52f je useropcd2+0x576 > +0x531 addq $24, %rbx > +0x535 nopw %cs:(%rax,%rax) > +0x540 movq -8(%rbx), %rcx > +0x544 movq 24(%rax), %rdx > +0x548 movq 32(%rax), %rsi > +0x54c movq %rsi, 32(%rcx) > +0x550 movq %rdx, 24(%rcx) > +0x554 movq 16(%rax), %rdx > +0x558 movq %rdx, 16(%rcx) > +0x55c movq (%rax), %rdx > +0x55f movq 8(%rax), %rax > +0x563 movq %rax, 8(%rcx) > +0x567 movq %rdx, (%rcx) > +0x56a movq (%rbx), %rax > +0x56d addq $16, %rbx > +0x571 testq %rax, %rax > +0x574 jne useropcd2+0x540 > +0x576 movq 40(%r12), %rax > +0x57b movq %r15, 192(%rax) > +0x582 cmpq $0, 48(%r12) > +0x588 jne 
useropcd2+0x5ce > +0x58a movq 40(%r12), %rax > +0x58f movq 192(%rax), %rcx > +0x596 movq 8(%rcx), %rcx > +0x59a testq %rcx, %rcx > +0x59d je useropcd2+0x5ce > +0x59f addq $192, %rax > +0x5a5 nopw %cs:(%rax,%rax) > +0x5b0 movq %rcx, (%rax) > +0x5b3 movq 40(%r12), %rax > +0x5b8 movq 192(%rax), %rcx > +0x5bf addq $192, %rax > +0x5c5 movq 8(%rcx), %rcx > +0x5c9 testq %rcx, %rcx > +0x5cc jne useropcd2+0x5b0 > +0x5ce xorl %eax, %eax > +0x5d0 popq %rbx > +0x5d1 popq %r12 > +0x5d3 popq %r14 > +0x5d5 popq %r15 > +0x5d7 popq %rbp > +0x5d8 ret > +0x5d9 nopl (%rax) > > CS5: > > +0x00 pushq %rbp > +0x01 movq %rsp, %rbp > +0x04 pushq %r15 > +0x06 pushq %r14 > +0x08 pushq %r12 > +0x0a pushq %rbx > +0x0b movq %rsi, %r14 > +0x0e movq %rdi, %r12 > +0x11 movq 2576(%r12), %r15 > +0x19 movq 48(%r14), %rax > +0x1d movq 8(%rax), %rax > +0x21 movq %rax, 2576(%r12) > +0x29 testq %rax, %rax > +0x2c je useropcd2+0x42e > +0x32 movq 48(%r14), %rax > +0x36 movq 56(%r14), %rcx > +0x3a movb 102(%rcx), %cl > +0x3d movb %cl, 102(%rax) > +0x40 movq 64(%r14), %rbx > +0x44 movq 24(%rbx), %rax > +0x48 addq $24, %rbx > +0x4c movl 2584(%r12), %ecx > +0x54 cmpl $1, %ecx > +0x57 je useropcd2+0x74 > +0x59 jmp useropcd2+0xb8 > +0x5b nopl (%rax,%rax) > +0x60 movq 8(%rbx), %rcx > +0x64 movsd (%rax), %xmm0 > +0x68 movsd %xmm0, (%rcx) > +0x6c movq 16(%rbx), %rax > +0x70 addq $16, %rbx > +0x74 testq %rax, %rax > +0x77 jne useropcd2+0x60 > +0x79 movq 8(%rbx), %rax > +0x7d testq %rax, %rax > +0x80 je useropcd2+0x11e > +0x86 addq $24, %rbx > +0x8a nopw (%rax,%rax) > +0x90 movq -8(%rbx), %rcx > +0x94 movsd (%rax), %xmm0 > +0x98 movsd %xmm0, (%rcx) > +0x9c movq (%rbx), %rax > +0x9f addq $16, %rbx > +0xa3 testq %rax, %rax > +0xa6 jne useropcd2+0x90 > +0xa8 addq $-16, %rbx > +0xac jmp useropcd2+0x122 > +0xae nop > +0xb0 movq 16(%rbx), %rax > +0xb4 addq $16, %rbx > +0xb8 testq %rax, %rax > +0xbb je useropcd2+0xe6 > +0xbd movq 8(%rbx), %rdx > +0xc1 movl %ecx, %esi > +0xc3 nopw %cs:(%rax,%rax) > +0xd0 movsd (%rax), %xmm0 > 
+0xd4 movsd %xmm0, (%rdx) > +0xd8 addq $8, %rdx > +0xdc addq $8, %rax > +0xe0 decl %esi > +0xe2 jne useropcd2+0xd0 > +0xe4 jmp useropcd2+0xb0 > +0xe6 movq 8(%rbx), %rax > +0xea testq %rax, %rax > +0xed je useropcd2+0x194 > +0xf3 addq $24, %rbx > +0xf7 nopw (%rax,%rax) > +0x100 movq -8(%rbx), %rcx > +0x104 movsd (%rax), %xmm0 > +0x108 movsd %xmm0, (%rcx) > +0x10c movq (%rbx), %rax > +0x10f addq $16, %rbx > +0x113 testq %rax, %rax > +0x116 jne useropcd2+0x100 > +0x118 addq $-16, %rbx > +0x11c jmp useropcd2+0x198 > +0x11e addq $8, %rbx > +0x122 movq 8(%rbx), %rax > +0x126 testq %rax, %rax > +0x129 je useropcd2+0x204 > +0x12f addq $24, %rbx > +0x133 nopw %cs:(%rax,%rax) > +0x140 movq -8(%rbx), %rcx > +0x144 movq 24(%rax), %r8 > +0x148 movq 32(%rax), %r9 > +0x14c movq 40(%rax), %rdi > +0x150 movq 48(%rax), %rdx > +0x154 movq 56(%rax), %rsi > +0x158 movq %rsi, 56(%rcx) > +0x15c movq %rdx, 48(%rcx) > +0x160 movq %rdi, 40(%rcx) > +0x164 movq %r9, 32(%rcx) > +0x168 movq %r8, 24(%rcx) > +0x16c movq 16(%rax), %rdx > +0x170 movq %rdx, 16(%rcx) > +0x174 movq (%rax), %rdx > +0x177 movq 8(%rax), %rax > +0x17b movq %rax, 8(%rcx) > +0x17f movq %rdx, (%rcx) > +0x182 movq (%rbx), %rax > +0x185 addq $16, %rbx > +0x189 testq %rax, %rax > +0x18c jne useropcd2+0x140 > +0x18e addq $-16, %rbx > +0x192 jmp useropcd2+0x208 > +0x194 addq $8, %rbx > +0x198 movq 8(%rbx), %rax > +0x19c testq %rax, %rax > +0x19f je useropcd2+0x244 > +0x1a5 addq $24, %rbx > +0x1a9 nopl (%rax) > +0x1b0 movq -8(%rbx), %rcx > +0x1b4 movq 24(%rax), %r8 > +0x1b8 movq 32(%rax), %r9 > +0x1bc movq 40(%rax), %rdi > +0x1c0 movq 48(%rax), %rdx > +0x1c4 movq 56(%rax), %rsi > +0x1c8 movq %rsi, 56(%rcx) > +0x1cc movq %rdx, 48(%rcx) > +0x1d0 movq %rdi, 40(%rcx) > +0x1d4 movq %r9, 32(%rcx) > +0x1d8 movq %r8, 24(%rcx) > +0x1dc movq 16(%rax), %rdx > +0x1e0 movq %rdx, 16(%rcx) > +0x1e4 movq (%rax), %rdx > +0x1e7 movq 8(%rax), %rax > +0x1eb movq %rax, 8(%rcx) > +0x1ef movq %rdx, (%rcx) > +0x1f2 movq (%rbx), %rax > +0x1f5 addq $16, 
%rbx > +0x1f9 testq %rax, %rax > +0x1fc jne useropcd2+0x1b0 > +0x1fe addq $-16, %rbx > +0x202 jmp useropcd2+0x248 > +0x204 addq $8, %rbx > +0x208 movq 8(%rbx), %rax > +0x20c testq %rax, %rax > +0x20f je useropcd2+0x284 > +0x211 addq $24, %rbx > +0x215 nopw %cs:(%rax,%rax) > +0x220 movq -8(%rbx), %rcx > +0x224 movq (%rax), %rdx > +0x227 movq 8(%rax), %rax > +0x22b movq %rax, 8(%rcx) > +0x22f movq %rdx, (%rcx) > +0x232 movq (%rbx), %rax > +0x235 addq $16, %rbx > +0x239 testq %rax, %rax > +0x23c jne useropcd2+0x220 > +0x23e addq $-16, %rbx > +0x242 jmp useropcd2+0x288 > +0x244 addq $8, %rbx > +0x248 movq 8(%rbx), %rax > +0x24c testq %rax, %rax > +0x24f je useropcd2+0x2e1 > +0x255 addq $24, %rbx > +0x259 nopl (%rax) > +0x260 movq -8(%rbx), %rcx > +0x264 movq (%rax), %rdx > +0x267 movq 8(%rax), %rax > +0x26b movq %rax, 8(%rcx) > +0x26f movq %rdx, (%rcx) > +0x272 movq (%rbx), %rax > +0x275 addq $16, %rbx > +0x279 testq %rax, %rax > +0x27c jne useropcd2+0x260 > +0x27e addq $-16, %rbx > +0x282 jmp useropcd2+0x2e5 > +0x284 addq $8, %rbx > +0x288 movq 2576(%r12), %rsi > +0x290 movq %r12, %rdi > +0x293 callq *24(%rsi) > +0x296 movq 2576(%r12), %rax > +0x29e movq 8(%rax), %rsi > +0x2a2 movq %rsi, 2576(%r12) > +0x2aa testq %rsi, %rsi > +0x2ad jne useropcd2+0x290 > +0x2af movq 8(%rbx), %rax > +0x2b3 testq %rax, %rax > +0x2b6 je useropcd2+0x356 > +0x2bc addq $24, %rbx > +0x2c0 movq -8(%rbx), %rcx > +0x2c4 movsd (%rax), %xmm0 > +0x2c8 movsd %xmm0, (%rcx) > +0x2cc movq (%rbx), %rax > +0x2cf addq $16, %rbx > +0x2d3 testq %rax, %rax > +0x2d6 jne useropcd2+0x2c0 > +0x2d8 addq $-16, %rbx > +0x2dc jmpq useropcd2+0x360 > +0x2e1 addq $8, %rbx > +0x2e5 movq 2576(%r12), %rsi > +0x2ed nopl (%rax) > +0x2f0 movq %r12, %rdi > +0x2f3 callq *24(%rsi) > +0x2f6 movq 2576(%r12), %rax > +0x2fe movq 8(%rax), %rsi > +0x302 movq %rsi, 2576(%r12) > +0x30a testq %rsi, %rsi > +0x30d jne useropcd2+0x2f0 > +0x30f movq 8(%rbx), %rax > +0x313 testq %rax, %rax > +0x316 je useropcd2+0x35c > +0x318 movl 
2584(%r12), %ecx > +0x320 movq %rbx, %rdx > +0x323 movq 16(%rdx), %rsi > +0x327 leaq 16(%rdx), %rbx > +0x32b movl %ecx, %edi > +0x32d nopl (%rax) > +0x330 movsd (%rax), %xmm0 > +0x334 movsd %xmm0, (%rsi) > +0x338 addq $8, %rsi > +0x33c addq $8, %rax > +0x340 decl %edi > +0x342 jne useropcd2+0x330 > +0x344 movq 24(%rdx), %rax > +0x348 testq %rax, %rax > +0x34b jne useropcd2+0x320 > +0x34d addq $24, %rdx > +0x351 movq %rdx, %rbx > +0x354 jmp useropcd2+0x360 > +0x356 addq $8, %rbx > +0x35a jmp useropcd2+0x360 > +0x35c addq $8, %rbx > +0x360 movq 8(%rbx), %rax > +0x364 testq %rax, %rax > +0x367 je useropcd2+0x38e > +0x369 addq $24, %rbx > +0x36d nopl (%rax) > +0x370 movq -8(%rbx), %rcx > +0x374 movsd (%rax), %xmm0 > +0x378 movsd %xmm0, (%rcx) > +0x37c movq (%rbx), %rax > +0x37f addq $16, %rbx > +0x383 testq %rax, %rax > +0x386 jne useropcd2+0x370 > +0x388 addq $-16, %rbx > +0x38c jmp useropcd2+0x392 > +0x38e addq $8, %rbx > +0x392 movq 8(%rbx), %rax > +0x396 testq %rax, %rax > +0x399 je useropcd2+0x3f4 > +0x39b addq $24, %rbx > +0x39f nop > +0x3a0 movq -8(%rbx), %rcx > +0x3a4 movq 24(%rax), %r8 > +0x3a8 movq 32(%rax), %r9 > +0x3ac movq 40(%rax), %rdi > +0x3b0 movq 48(%rax), %rdx > +0x3b4 movq 56(%rax), %rsi > +0x3b8 movq %rsi, 56(%rcx) > +0x3bc movq %rdx, 48(%rcx) > +0x3c0 movq %rdi, 40(%rcx) > +0x3c4 movq %r9, 32(%rcx) > +0x3c8 movq %r8, 24(%rcx) > +0x3cc movq 16(%rax), %rdx > +0x3d0 movq %rdx, 16(%rcx) > +0x3d4 movq (%rax), %rdx > +0x3d7 movq 8(%rax), %rax > +0x3db movq %rax, 8(%rcx) > +0x3df movq %rdx, (%rcx) > +0x3e2 movq (%rbx), %rax > +0x3e5 addq $16, %rbx > +0x3e9 testq %rax, %rax > +0x3ec jne useropcd2+0x3a0 > +0x3ee addq $-16, %rbx > +0x3f2 jmp useropcd2+0x3f8 > +0x3f4 addq $8, %rbx > +0x3f8 movq 8(%rbx), %rax > +0x3fc testq %rax, %rax > +0x3ff je useropcd2+0x42e > +0x401 addq $24, %rbx > +0x405 nopw %cs:(%rax,%rax) > +0x410 movq -8(%rbx), %rcx > +0x414 movq (%rax), %rdx > +0x417 movq 8(%rax), %rax > +0x41b movq %rax, 8(%rcx) > +0x41f movq %rdx, (%rcx) > 
+0x422 movq (%rbx), %rax > +0x425 addq $16, %rbx > +0x429 testq %rax, %rax > +0x42c jne useropcd2+0x410 > +0x42e movq %r15, 2576(%r12) > +0x436 cmpq $0, 48(%r14) > +0x43b jne useropcd2+0x461 > +0x43d movq 8(%r15), %rax > +0x441 jmp useropcd2+0x45c > +0x443 nopw %cs:(%rax,%rax) > +0x450 movq %rax, 2576(%r12) > +0x458 movq 8(%rax), %rax > +0x45c testq %rax, %rax > +0x45f jne useropcd2+0x450 > +0x461 xorl %eax, %eax > +0x463 popq %rbx > +0x464 popq %r12 > +0x466 popq %r14 > +0x468 popq %r15 > +0x46a popq %rbp > +0x46b ret > +0x46c nopl (%rax) > > > > On Fri, Aug 9, 2013 at 3:28 PM, Steven Yi |
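Victor's caution about a local copy can be illustrated with a toy model. The sketch below is not Csound's code: `Op`, `Instance`, `run_shared` and `run_cached` are hypothetical stand-ins. It assumes only what the thread states, namely that the performance loop advances a "current opcode" pointer which goto-style opcodes may rewrite. Re-reading that pointer through the shared field on every step costs extra loads (compare the repeated `40(%r12)` / `192(%rax)` loads in the CS6 listing), while a cached local is cheaper but never sees a jump made by an opcode.

```c
#include <stddef.h>

typedef struct Op Op;
typedef struct Instance Instance;

struct Op {
    Op  *next;                       /* next opcode in the chain */
    void (*perf)(Instance *, Op *);  /* per-pass work */
    Op  *target;                     /* op after which to resume (op_goto only) */
};

struct Instance {
    Op *current;   /* shared cursor: plays the role CS_PDS has in the thread */
    int executed;  /* number of perf calls, for demonstration */
};

/* An ordinary opcode: just counts itself. */
void op_work(Instance *ins, Op *self)
{
    (void)self;
    ins->executed++;
}

/* A goto-style opcode: redirects the shared cursor. */
void op_goto(Instance *ins, Op *self)
{
    ins->executed++;
    ins->current = self->target;
}

/* Re-reads the shared cursor after every call, so jumps are honoured. */
int run_shared(Instance *ins, Op *chain)
{
    ins->executed = 0;
    ins->current = chain;
    while ((ins->current = ins->current->next) != NULL)
        ins->current->perf(ins, ins->current);
    return ins->executed;
}

/* Keeps the cursor in a local variable: fewer loads, but blind to jumps. */
int run_cached(Instance *ins, Op *chain)
{
    Op *local = chain;
    ins->executed = 0;
    while ((local = local->next) != NULL)
        local->perf(ins, local);
    return ins->executed;
}

/* Chain: head -> goto -> a -> b -> c, where the goto resumes after b. */
void demo(int *shared_count, int *cached_count)
{
    Op c = { NULL, op_work, NULL };
    Op b = { &c, op_work, NULL };
    Op a = { &b, op_work, NULL };
    Op g = { &a, op_goto, &b };
    Op head = { &g, NULL, NULL };
    Instance ins = { NULL, 0 };

    *shared_count = run_shared(&ins, &head);
    *cached_count = run_cached(&ins, &head);
}
```

In this toy chain the shared-cursor loop runs 2 opcodes (the jump is honoured), while the cached loop runs all 4, which is the kind of wrong control flow being warned about.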
Date | 2013-08-09 15:14 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Also, I tried this |
Date | 2013-08-09 15:15 |
From | Steven Yi |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
I tried modifying CS_PDS usage but it seemed to have no real effect. Oddly enough, I get a faster render when I'm profiling with Instruments than I do with the Csound version I compiled with make and run on the command line. I'll abandon CS_PDS and keep looking elsewhere. On Fri, Aug 9, 2013 at 4:08 PM, Victor Lazzarini |
Date | 2013-08-09 15:32 |
From | Steven Yi |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Odd, I tried this last example and CS5 gave:

real 0m0.510s
user 0m0.364s
sys 0m0.037s

and CS6 gave:

real 0m1.361s
user 0m1.316s
sys 0m0.041s

I ran both multiple times, and if anything, the CS5 result was on the high side and the CS6 on the lower side of the different runs.

Also to note, I tried compiling with gcc instead of clang, but that made no difference.

On Fri, Aug 9, 2013 at 4:14 PM, Victor Lazzarini |
Date | 2013-08-09 15:39 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
It appears my csound 5 is not as good as yours ;) . Anyway, since you have a fast computer, the cs6 should have been faster there than here. However, your other results were kind of consistent with what I was getting here, albeit with csound 5 a little slower (which you would expect since your computer is newer), but csound 6 was pretty much similar. I am using the release csound64 (5.19). Victor On 9 Aug 2013, at 15:32, Steven Yi wrote: > Odd, I tried this last example and CS5 gave: > > real 0m0.510s > user 0m0.364s > sys 0m0.037s > > and CS6 gave: > > real 0m1.361s > user 0m1.316s > sys 0m0.041s > > > I ran both multiple times, and if anything, the CS5 result was on the > high side and the CS6 on the lower side of the different runs. > > Also to note, I tried compiling with gcc instead of clang, but that > that made no difference. > > On Fri, Aug 9, 2013 at 4:14 PM, Victor Lazzarini > |
Date | 2013-08-09 16:03 |
From | Steven Yi |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Yeah, it's tricky too as we're on different versions of OSX as well. It's crazy that nothing's really sticking out yet. On Fri, Aug 9, 2013 at 4:39 PM, Victor Lazzarini |
Date | 2013-08-09 16:07 |
From | Michael Gogins |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Attachments | None None |
As I'm sure you're aware, you need to control for: (a) physical computer architecture (CPU mode/instruction set/memory speed etc.), (b) compiler target architecture (e.g. -march=native or -march=nocona or whatever), and (c) compiler optimization flags. Obviously you are running on different CPUs/memory speeds, but are you using a common CPU target architecture and a common set of compiler flags? What are they? Regards, Mike
=========================== Michael Gogins Irreducible Productions http://michaelgogins.tumblr.com Michael dot Gogins at gmail dot com On Fri, Aug 9, 2013 at 11:03 AM, Steven Yi <stevenyi@gmail.com> wrote: Yeah, it's tricky too as we're on different versions of OSX as well. |
Date | 2013-08-09 16:26 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Attachments | tied.csd None None |
Here, I've noticed that statements of the kind

a1 = a2 * i2

seem to produce slower code. I've rewritten these in the tied notes example (see attached), and the performance now very closely matches csound 5:

csound64 tied.csd -+skip_seconds=36 -dm128
...
Elapsed time at end of performance: real: 6.551s, CPU: 6.487s

csound tied.csd -+skip_seconds=36 -dm128
...
Elapsed time at end of performance: real: 6.998s, CPU: 6.994s

On 9 Aug 2013, at 16:03, Steven Yi wrote:

> Yeah, it's tricky too as we're on different versions of OSX as well.
> It's crazy that nothing's really sticking out yet.
>
> On Fri, Aug 9, 2013 at 4:39 PM, Victor Lazzarini
> |
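As a point of reference for the `a1 = a2 * i2` observation, an audio-rate signal times an init-time scalar comes down to one multiply per sample in each control period. The sketch below is not the Csound opcode: `scale_block` and `render` are hypothetical, and `ksmps` is used in the thread's sense of samples per control period. It only illustrates why per-call overhead weighs so heavily at ksmps=1, since the same number of samples then needs one call per sample.

```c
#include <stddef.h>

/* out[n] = in[n] * gain for one control period of ksmps samples. */
void scale_block(double *out, const double *in, double gain, size_t ksmps)
{
    for (size_t n = 0; n < ksmps; n++)
        out[n] = in[n] * gain;
}

/* Process nsamples in blocks of ksmps and return how many calls were made.
   Assumes ksmps > 0 and that nsamples is a multiple of ksmps. */
size_t render(double *out, const double *in, double gain,
              size_t nsamples, size_t ksmps)
{
    size_t calls = 0;
    for (size_t pos = 0; pos < nsamples; pos += ksmps) {
        scale_block(out + pos, in + pos, gain, ksmps);
        calls++;
    }
    return calls;
}
```

For 8 samples, `render` makes 8 calls at ksmps=1 but only 2 calls at ksmps=4, so any fixed cost per call is paid four times as often in the first case.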
Date | 2013-08-09 19:06 |
From | jpff@cs.bath.ac.uk |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
So, a compiler issue.

> Here, I've noticed that statements of the kind
>
> a1 = a2 * i2
>
> seem to produce slower code.
> I've rewritten these in the tied notes example (see attached), and the
> performance is
> very closely matching csound 5
>
> csound64 tied.csd -+skip_seconds=36 -dm128
> ...
> Elapsed time at end of performance: real: 6.551s, CPU: 6.487s
>
> csound tied.csd -+skip_seconds=36 -dm128
> ...
> Elapsed time at end of performance: real: 6.998s, CPU: 6.994s
>
> On 9 Aug 2013, at 16:03, Steven Yi wrote:
>
>> Yeah, it's tricky too as we're on different versions of OSX as well.
>> It's crazy that nothing's really sticking out yet.
>>
>> On Fri, Aug 9, 2013 at 4:39 PM, Victor Lazzarini
>> |
Date | 2013-08-09 19:32 |
From | Michael Gogins |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Attachments | None None |
Can you please tell me, what are your compiler flags for these tests? Regards, Mike ===========================
Michael Gogins Irreducible Productions http://michaelgogins.tumblr.com Michael dot Gogins at gmail dot com On Fri, Aug 9, 2013 at 2:06 PM, <jpff@cs.bath.ac.uk> wrote: so a compiler issue |
Date | 2013-08-09 19:53 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
I am just building Csound with the standard build options set by cmake (plus debug symbols). Victor On 9 Aug 2013, at 19:32, Michael Gogins wrote: > Can you please tell me, what are your compiler flags for these tests? > > Regards, > Mike > > > =========================== > Michael Gogins > Irreducible Productions > http://michaelgogins.tumblr.com > Michael dot Gogins at gmail dot com > > > On Fri, Aug 9, 2013 at 2:06 PM, |
Date | 2013-08-09 20:33 |
From | Michael Gogins |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Attachments | None None |
As far as I know there is no architecture assumption in CMake's default C and C++ compiler flags. However, you can see all the toolchain command lines and flags by running "make VERBOSE=1". On Windows, I always build with "-march=nocona -O2 -g" as experience has shown that this gives very good performance but also is old enough to support most users' CPUs. If you are on an Intel Mac you also should be able to use -march=nocona.
The performance difference for the different CPU architectures can be rather significant, because of the vectorizing extensions to later architectures. At the same time, you can't just assume that later -march options are always better. I get the feeling that different -march code generators received varying amounts of engineering by engineers of varying capability.
The default TARGET architecture for your gcc can be obtained by running "gcc -v". In general it seems to be a very minimal instruction set. One reason I am going on about this is that it is possible that the 5.19 binaries had different -march or -mtune than the 6.00 binaries. After all, we used SCons for 5 and we use CMake for 6. This could be a confounding variable.
I know that on Windows I used the same flags for 5.19 as I now use for 6.00. I will run the 2 test csds on 5.19 and 6.00 and let you know what I get. Regards, Mike
=========================== Michael Gogins Irreducible Productions http://michaelgogins.tumblr.com Michael dot Gogins at gmail dot com On Fri, Aug 9, 2013 at 2:53 PM, Victor Lazzarini <Victor.Lazzarini@nuim.ie> wrote: I am just building Csound with the standard build options set by cmake (plus debug symbols). |
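One way to check, from inside a compiled file, which optimisation and instruction-set options were in effect is to test the compiler's predefined macros. The sketch below assumes GCC or Clang, which define `__OPTIMIZE__` when optimising and `__SSE2__`, `__SSE3__` and `__AVX__` according to the target options. `build_flags_summary` is a hypothetical helper, not part of Csound, and it cannot tell -O2 from -O3.

```c
#include <stdio.h>

/* Writes a short description of compile-time settings into buf.
   Returns the number of characters written (as snprintf does). */
int build_flags_summary(char *buf, size_t size)
{
    const char *opt =
#ifdef __OPTIMIZE__
        "optimised";
#else
        "unoptimised";
#endif
    const char *simd =
#if defined(__AVX__)
        "AVX";
#elif defined(__SSE3__)
        "SSE3";
#elif defined(__SSE2__)
        "SSE2";
#else
        "baseline";
#endif
    return snprintf(buf, size, "%s, %s", opt, simd);
}
```

Compiling the same file with the flags each build actually uses (as shown by `make VERBOSE=1`) and comparing the two summaries is a quick way to spot a mismatch between the SCons and CMake builds.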
Date | 2013-08-09 20:36 |
From | Michael Gogins |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Attachments | None None |
I think it's also possible that SCons had -O2 by default but CMake has no optimization by default. This alone could easily explain what you see. =========================== Michael Gogins Irreducible Productions http://michaelgogins.tumblr.com Michael dot Gogins at gmail dot com On Fri, Aug 9, 2013 at 3:33 PM, Michael Gogins <michael.gogins@gmail.com> wrote:
|
Date | 2013-08-09 20:37 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
The architecture is x86_64, and the only difference with Csound 5 is that it is single-arch, not fat (x86_64, i386). I am pretty sure the flags are the correct ones. Victor On 9 Aug 2013, at 20:33, Michael Gogins wrote: > As far as I know there is no architecture assumption in CMake's default C and C++ compiler flags. However, you can see all the toolchain command lines and flags by running "make VERBOSE=1". > > On Windows, I always build with "-march=nocona -O2 -g" as experience has shown that this gives very good performance but also is old enough to support most users' CPUs. If you are on an Intel Mac you also should be able to use -march=nocona. > > The performance difference for the different CPU architectures can be rather significant, because of the vectorizing extensions to later architectures. > > At the same time, you can't just assume that later -march options are always better. I get the feeling that different -march code generators received varying amounts of engineering by engineers of varying capability. > > The default TARGET architecture for your gcc can be obtained by running "gcc -v". In general it seems to be a very minimal instruction set. > > One reason I am going on about this is that it is possible that the 5.19 binaries had different -march or -mtune than the 6.00 binaries. After all, we used SCons for 5 and we use CMake for 6. This could be a confounding variable. > > I know that on Windows I used the same flags for 5.19 as I now use for 6.00. I will run the 2 test csds on 5.19 and 6.00 and let you know what I get. > > Regards, > Mike > > > > =========================== > Michael Gogins > Irreducible Productions > http://michaelgogins.tumblr.com > Michael dot Gogins at gmail dot com > > > On Fri, Aug 9, 2013 at 2:53 PM, Victor Lazzarini |
Date | 2013-08-09 20:47 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
No, the optimisation is O3 as it was in Csound 5. The performance would be much worse if that wasn't the case. Victor On 9 Aug 2013, at 20:36, Michael Gogins wrote: > I think it's also possible that SCons had -O2 by default but CMake has no optimization by default. This alone could easily explain what you see. > > > =========================== > Michael Gogins > Irreducible Productions > http://michaelgogins.tumblr.com > Michael dot Gogins at gmail dot com > > > On Fri, Aug 9, 2013 at 3:33 PM, Michael Gogins |
Date | 2013-08-09 20:56 |
From | Michael Gogins |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Attachments | None None |
OK, thanks. x86_64 is a minimal instruction set. Do you then explicitly set additional vectorization options? BTW, I wouldn't assume that O3 is actually faster than O2 without measuring it. Mike =========================== Michael Gogins Irreducible Productions http://michaelgogins.tumblr.com Michael dot Gogins at gmail dot com On Fri, Aug 9, 2013 at 3:47 PM, Victor Lazzarini <Victor.Lazzarini@nuim.ie> wrote: No, the optimisation is O3 as it was in Csound 5. The performance would be really worse if that wasn't the case. |
Date | 2013-08-09 22:14 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
There are some, but the main point is that both Csound 5 and 6 have been built with similar settings (O3 and all). On 9 Aug 2013, at 20:56, Michael Gogins wrote: > OK, thanks. > > x86_64 is a minimal instruction set. Do you explicitly then set added vectorization options? > > BTW, I wouldn't assume that O3 is actually faster than O2 without actually measuring it. > > Mike > > > =========================== > Michael Gogins > Irreducible Productions > http://michaelgogins.tumblr.com > Michael dot Gogins at gmail dot com > > > On Fri, Aug 9, 2013 at 3:47 PM, Victor Lazzarini |
Date | 2013-08-10 01:04 |
From | Michael Gogins |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Attachments | None None |
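OK, here's my data. What I said about using the same flags on both versions was wrong. I was right about -O2 -g being faster than -O3 -g, however (figures not shown). Also, I mistakenly thought Csound 6 was faster than 5 because the examples for 6 are different from the examples for 5. All tests use exactly the same csds, either from Steven's emails or from the Csound 5 examples.

Build | yi_tied_notes_example.csd -otest.wav | clojure-csound-piano-phase.csd | trapped.csd -otest.wav |
Csound 5.19.02 Mingw32-4.7.1 -mtune=core2 -O3 -g | 17.081 | 96.805 | 3.440 |
Csound 5.19.02 Mingw32-4.7.1 -mtune=core2 -O3 -g | 16.558 | 97.426 | 3.518 |
Csound 5.19.02 Mingw32-4.7.1 -mtune=core2 -O3 -g | 16.756 | 97.362 | 3.473 |
Mean | 16.798 | 97.198 | 3.477 |
Csound 6.00.1 Mingw32-4.8.1 -march=nocona -O2 -g -NDEBUG | 19.970 | 125.588 | 4.968 |
Csound 6.00.1 Mingw32-4.8.1 -march=nocona -O2 -g -NDEBUG | 19.511 | 126.034 | 4.906 |
Csound 6.00.1 Mingw32-4.8.1 -march=nocona -O2 -g -NDEBUG | 18.833 | 125.750 | 4.935 |
Mean | 19.438 | 125.791 | 4.936 |
Csound 5/Csound 6 | 0.864 | 0.773 | 0.704 |

=========================== Michael Gogins Irreducible Productions http://michaelgogins.tumblr.com Michael dot Gogins at gmail dot com On Fri, Aug 9, 2013 at 5:14 PM, Victor Lazzarini <Victor.Lazzarini@nuim.ie> wrote: There are some, but the main point is that both Csound 5 and 6 have been built with similar settings (O3 and all). |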
Date | 2013-08-10 09:54 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Well, here I get a genuine speedup with a lot of examples and ksmps > 1. On 10 Aug 2013, at 01:04, Michael Gogins wrote:
> OK, here's my data. What I said about using the same flags on both versions was wrong. I was right about -O2 -g being faster than -O3 -g, however (figures not shown).
>
> Also, I mistakenly thought Csound 6 was faster than 5 because the examples for 6 are different from the examples for 5. All tests use exactly the same csds, either from Steven's emails or from the Csound 5 examples.
>
> Build | yi_tied_notes_example.csd -otest.wav | clojure-csound-piano-phase.csd | trapped.csd -otest.wav |
> Csound 5.19.02 Mingw32-4.7.1 -mtune=core2 -O3 -g | 17.081 | 96.805 | 3.440 |
> Csound 5.19.02 Mingw32-4.7.1 -mtune=core2 -O3 -g | 16.558 | 97.426 | 3.518 |
> Csound 5.19.02 Mingw32-4.7.1 -mtune=core2 -O3 -g | 16.756 | 97.362 | 3.473 |
> Mean | 16.798 | 97.198 | 3.477 |
> Csound 6.00.1 Mingw32-4.8.1 -march=nocona -O2 -g -NDEBUG | 19.970 | 125.588 | 4.968 |
> Csound 6.00.1 Mingw32-4.8.1 -march=nocona -O2 -g -NDEBUG | 19.511 | 126.034 | 4.906 |
> Csound 6.00.1 Mingw32-4.8.1 -march=nocona -O2 -g -NDEBUG | 18.833 | 125.750 | 4.935 |
> Mean | 19.438 | 125.791 | 4.936 |
> Csound 5/Csound 6 | 0.864 | 0.773 | 0.704 |
>
> ===========================
> Michael Gogins
> Irreducible Productions
> http://michaelgogins.tumblr.com
> Michael dot Gogins at gmail dot com
>
> On Fri, Aug 9, 2013 at 5:14 PM, Victor Lazzarini |
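The means and the Csound 5/Csound 6 ratios in Michael's figures quoted above can be checked with a few lines of C. This is only a sketch: the function names mean_time and time_ratio are made up for illustration and are not part of Csound or of any benchmark script used in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n render timings. */
double mean_time(const double *t, size_t n)
{
    double sum = 0.0;
    assert(n > 0);                      /* a mean of zero runs is undefined */
    for (size_t i = 0; i < n; i++)
        sum += t[i];
    return sum / (double)n;
}

/* Ratio of the mean of the first set of timings to the mean of the second.
   A value below 1.0 means the first build rendered faster on average. */
double time_ratio(const double *t_first, const double *t_second, size_t n)
{
    return mean_time(t_first, n) / mean_time(t_second, n);
}
```

Feeding it the three Csound 5 timings for yi_tied_notes_example.csd (17.081, 16.558, 16.756) and the three Csound 6 timings (19.970, 19.511, 18.833) gives means of about 16.798 and 19.438 and a ratio of about 0.864, which agrees with the figures in the message.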
Date | 2013-08-10 18:26 |
From | Steven Yi |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Hi Victor, With the tied.csd example, I get with CS6 (two renders): Elapsed time at end of performance: real: 9.426s, CPU: 9.399s Elapsed time at end of performance: real: 8.772s, CPU: 8.768s and with CS5 I get: Elapsed time at end of performance: real: 4.778s, CPU: 4.755s I just tried rebuilding Csound with -O2 and looks like I've gotten a bit of an improvement: Elapsed time at end of performance: real: 8.642s, CPU: 8.635s but it looks within the degree of variation that I'm seeing between different runs. I tried replacing the AK macro in aops.c with the one from cs5, but that didn't seem to produce much of a difference. I keep thinking there has got to be something else. The code for sample-accurate really doesn't look like much overhead, at least within opcodes. I'll keep trying out things here. On Sat, Aug 10, 2013 at 10:54 AM, Victor Lazzarini |
Date | 2013-08-10 21:15 |
From | Michael Gogins |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Attachments | None None |
Has anyone tried callgrind? Or gprof? Regards, Mike =========================== Michael Gogins Irreducible Productions http://michaelgogins.tumblr.com Michael dot Gogins at gmail dot com On Sat, Aug 10, 2013 at 1:26 PM, Steven Yi <stevenyi@gmail.com> wrote: Hi Victor, |
Date | 2013-08-10 21:41 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Here, -O3 is better than -O2. Your Csound 6 performance is much worse than what I have here. I wonder why. I think it must be something else in your system. Even the clojure example you showed here was giving me 120s vs 95s (cs6 vs. cs5). The timings you are getting for Csound 5 are on par with what I have here, considering you have a faster machine anyway. Victor On 10 Aug 2013, at 18:26, Steven Yi wrote: > Hi Victor, > > With the tied.csd example, I get with CS6 (two renders): > > Elapsed time at end of performance: real: 9.426s, CPU: 9.399s > Elapsed time at end of performance: real: 8.772s, CPU: 8.768s > > and with CS5 I get: > > Elapsed time at end of performance: real: 4.778s, CPU: 4.755s > > I just tried rebuilding Csound with -O2 and looks like I've gotten a > bit of an improvement: > > Elapsed time at end of performance: real: 8.642s, CPU: 8.635s > > but it looks within the degree of variation that I'm seeing between > different runs. I tried replacing the AK macro in aops.c with the one > from cs5, but that didn't seem to produce much of a difference. > > I keep thinking there has got to be something else. The code for > sample-accurate really doesn't look like much overhead, at least > within opcodes. I'll keep trying out things here. > > On Sat, Aug 10, 2013 at 10:54 AM, Victor Lazzarini > |
Date | 2013-08-11 00:26 |
From | Steven Yi |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Well, I'm still unsure what is going on. However, with tied.csd, I retried replacing the KA, AK, and AA macros in aops.c with the ones from CS5 and I got a consistently better speed. With the sample accurate stuff removed, I got:

real 0m7.796s
user 0m7.761s
sys 0m0.031s

(fluctuated between 7.6 and 7.8) and then recompiled with the sample accurate versions of those macros and got:

real 0m8.464s
user 0m8.431s
sys 0m0.030s

(fluctuated between 8.4 and 8.5). I think earlier I had just replaced one of the macros, but really should have tried replacing all of them to get a better test.

With the clojure piano phase example, I got:

real 2m27.453s
user 2m26.607s
sys 0m0.238s

which is 10 seconds less than the previous time I posted.

If I instead modify the AA, AK, and KA macros to check if sampleAccurate is on, similarly to how aassign does it, i.e.:

#define AA(OPNAME,OP) \
int OPNAME(CSOUND *csound, AOP *p) { \
  MYFLT *r, *a, *b; \
  uint32_t n, nsmps = CS_KSMPS; \
  r = p->r; \
  a = p->a; \
  b = p->b; \
  if (UNLIKELY(csound->oparms->sampleAccurate)) { \
    uint32_t offset = p->h.insdshead->ksmps_offset; \
    uint32_t early = p->h.insdshead->ksmps_no_end; \
    if (UNLIKELY(offset)) memset(r, '\0', offset*sizeof(MYFLT)); \
    if (UNLIKELY(early)) { \
      nsmps -= early; \
      memset(&r[nsmps], '\0', early*sizeof(MYFLT)); \
    } \
    for (n=offset; n |
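For readers who want to experiment with the idea outside of Csound, here is a minimal stand-alone sketch of the branch Steven describes: the offset and early-end handling only runs when a sample-accurate flag is set, otherwise a plain loop over the whole block is used. Everything in it is an assumption made for illustration. add_aa, its parameters and the MYFLT typedef are hypothetical stand-ins rather than the code in aops.c, and the loop bound n < nsmps is a guess, since the macro in the message is cut off at that point.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef double MYFLT;   /* assumption: a double-precision build */

/* Hypothetical stand-alone a-rate add: r = a + b over one block of nsmps
   samples. When sample_accurate is non-zero, the first `offset` samples and
   the last `early` samples of r are zeroed and skipped by the loop. */
void add_aa(MYFLT *r, const MYFLT *a, const MYFLT *b,
            uint32_t nsmps, uint32_t offset, uint32_t early,
            int sample_accurate)
{
    uint32_t n;
    if (sample_accurate) {
        assert(offset + early <= nsmps);    /* the active region must fit */
        if (offset)
            memset(r, 0, offset * sizeof(MYFLT));
        if (early) {
            nsmps -= early;
            memset(&r[nsmps], 0, early * sizeof(MYFLT));
        }
        /* assumed loop bound: the source message is truncated here */
        for (n = offset; n < nsmps; n++)
            r[n] = a[n] + b[n];
    } else {
        /* fast path: no per-block offset bookkeeping at all */
        for (n = 0; n < nsmps; n++)
            r[n] = a[n] + b[n];
    }
}
```

The design point is that the flag is tested once per block, so when sample accuracy is off the inner loop is the same as the Csound 5 style loop. Whether that recovers the measured difference is something only timing the real build can tell.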
Date | 2013-08-11 00:51 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Steven, your results with the clojure code are way worse than the ones I am getting real 1m58.673s user 1m57.640s sys 0m0.323s so I am not really sure what is the matter with your build. Victor On 11 Aug 2013, at 00:26, Steven Yi wrote: > Well, I'm still unsure what is going on. However, with tied.csd, I > retried replacing the KA, AK, and AA macro's in aops.c with the ones > from CS5 and I got a consistently better speed. With the sample > accurate stuff removed, I got: > > real 0m7.796s > user 0m7.761s > sys 0m0.031s > > (fluctuated between 7.6 and 7.8) and then recompiled with the sample > accurate versions of those macros and got: > > real 0m8.464s > user 0m8.431s > sys 0m0.030s > > (fluctuated between 8.4 and 8.5). I think earlier I had just replaced > one of the macros, but really should have tried replacing all of them > to get a better test. > > With the clojure piano phase example, I got: > > real 2m27.453s > user 2m26.607s > sys 0m0.238s > > which is 10 seconds less than the previous time I posted. > > If I instead modify the AA, AK, and KA macros to check if > sampleAccurate is on, similarly to how aassign does it, i.e.: > > #define AA(OPNAME,OP) \ > int OPNAME(CSOUND *csound, AOP *p) { \ > MYFLT *r, *a, *b; \ > uint32_t n, nsmps = CS_KSMPS; \ > r = p->r; \ > a = p->a; \ > b = p->b; \ > if (UNLIKELY(csound->oparms->sampleAccurate)) { \ > uint32_t offset = p->h.insdshead->ksmps_offset; \ > uint32_t early = p->h.insdshead->ksmps_no_end; \ > if (UNLIKELY(offset)) memset(r, '\0', offset*sizeof(MYFLT)); \ > if (UNLIKELY(early)) { \ > nsmps -= early; \ > memset(&r[nsmps], '\0', early*sizeof(MYFLT)); \ > } \ > for (n=offset; n |
Date | 2013-08-11 01:21 |
From | Michael Gogins |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Attachments | None None |
What compiler versions are you using? =========================== Michael Gogins Irreducible Productions http://michaelgogins.tumblr.com Michael dot Gogins at gmail dot com On Sat, Aug 10, 2013 at 7:51 PM, Victor Lazzarini <Victor.Lazzarini@nuim.ie> wrote: Steven, |
Date | 2013-08-11 07:59 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
-O3 -fPIC -ftree-vectorize -fvisibility=hidden -ffast-math -mfpmath=sse -fomit-frame-pointer On 11 Aug 2013, at 01:21, Michael Gogins wrote: > What compiler versions are you using? > > > =========================== > Michael Gogins > Irreducible Productions > http://michaelgogins.tumblr.com > Michael dot Gogins at gmail dot com > > > On Sat, Aug 10, 2013 at 7:51 PM, Victor Lazzarini |
Date | 2013-08-11 09:59 |
From | Steven Yi |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
As mentioned earlier, there are definitely odd things going on. The debug build in Xcode was running faster than the release version I had done on the command line. However... I definitely found something! So the flags that are added in the cmake file with:

if(CMAKE_C_COMPILER MATCHES "gcc")
  add_compiler_flags(${libcsound_CFLAGS} -fvisibility=hidden -ffast-math -mfpmath=sse -fomit-frame-pointer TARGETS ${CSOUNDLIB})

are not applied on this system, as CMAKE_C_COMPILER is listed as "/usr/bin/cc" (which in turn calls clang). My guess now is that Xcode itself added some optimizations that the command-line build did not.

Some notes:

- We should use CMAKE_COMPILER_IS_GNUCC instead of the match above, or maybe better, use CMAKE_C_COMPILER_ID, as CMAKE_COMPILER_IS_GNUCC fails for clang
- We should really just check if the compile flag is supported using a test. The Custom.cmake.ex has some examples of testing C compiler flags.

This is great, as at least now there's a big reason showing up as to why things were so slow. Now, I did a manual setting of flags. It turns out that the Release flags were not really being used even though I specified Release for the CMAKE_BUILD_TYPE. I manually added -O3 and others, and for tied.csd with CS6 I got:

real 0m4.990s
user 0m4.956s
sys 0m0.030s

and with CS5 I got:

real 0m4.297s
user 0m4.021s
sys 0m0.030s

much, much better. I see that the Custom.cmake.ex has set(CMAKE_BUILD_TYPE "Debug") in it. I'm going to look at revising the way compiler flags are checked and set. This should fix the issues reported for Raspberry Pi and SSE, as well as fix up the issues I'm having here.

Question: were you all setting flags manually in Custom.cmake, or perhaps something else? I'll start a new thread about compiler flags once I finish changes. On Sun, Aug 11, 2013 at 1:51 AM, Victor Lazzarini |
Date | 2013-08-11 11:56 |
From | Michael Gogins |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Attachments | None None |
In my results I got 3.5 for Trapped for cs5, faster than you. Cs6 is the same. Flags and compiler versions were given. See my earlier email. On Aug 11, 2013 5:00 AM, "Steven Yi" <stevenyi@gmail.com> wrote:
As mentioned earlier, there's definitely odd things going on. The |
Date | 2013-08-11 12:00 |
From | Steven Yi |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Sorry, not sure what this email means, as we haven't been using Trapped in this thread for testing. On Sun, Aug 11, 2013 at 12:56 PM, Michael Gogins |
Date | 2013-08-11 12:50 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
I have some flags in Custom.cmake

#### NOTE the processor type needs setting
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -W -Wall -O3 -Wno-missing-field-initializers -Wno-unused-parameter")
include(CheckCCompilerFlag)
check_c_compiler_flag(-ftree-vectorize HAS_TREE_VECTORISE)
if (HAS_TREE_VECTORISE)
  set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -ftree-vectorize")
endif()
check_c_compiler_flag(-ffast-math HAS_FAST_MATH)
if (HAS_FAST_MATH)
  set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -ffast-math")
endif()

On 11 Aug 2013, at 09:59, Steven Yi wrote: > Question: were you all setting flags manually in Custom.cmake, or > perhaps something else? Dr Victor Lazzarini Senior Lecturer Dept. of Music NUI Maynooth Ireland tel.: +353 1 708 3545 Victor dot Lazzarini AT nuim dot ie |
Date | 2013-08-11 13:03 |
From | Steven Yi |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Hi Victor, Could you try removing those and using "-DCMAKE_BUILD_TYPE=Release"? This should now build with all of those flags and with -O3 correctly. Thanks! steven On Sun, Aug 11, 2013 at 1:50 PM, Victor Lazzarini |
Date | 2013-08-11 13:15 |
From | Steven Yi |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
BTW: I thought it worth summarizing what's happened here:

1. I had Debug set in my Custom.cmake without realizing it. I had made modifications to compiler flags, but in CMakeCache.txt, so they were not actually being used. Any of the differences I reported for -O2/-O3 were therefore just run-to-run variation and not anything real.
2. With Debug removed and the flags I entered on the command line honored, the -O3 build now shows results similar to what was reported by Victor, Michael, and John. That is, between cs5 and cs6, the result is just slightly slower with ksmps=1, rather than the 200%+ I had been getting.
3. CS6 still looks to have room for improvement, as shown by what John found when recoding areas for sample accuracy and by what I saw when compiling debug. The good news is that the sample accurate code does not have that bad of an impact, and even better, there are areas where it can be improved further.
4. Our build files had some issues related to the setting of compiler flags, as reported for Raspberry Pi. I think the updated CMake files work better now and hopefully fix it; if there are still issues, at least it should be easier to modify and fix now.

If I've missed anything, please add on here! Thanks! steven On Sun, Aug 11, 2013 at 2:03 PM, Steven Yi |
Date | 2013-08-11 13:17 |
From | Steven Yi |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Oh, and thanks all for enduring this very long thread! It was certainly driving me a bit mad to try so many things and get such poor results, and I appreciate all the consideration everyone put into this. On Sun, Aug 11, 2013 at 2:15 PM, Steven Yi |
Date | 2013-08-11 13:22 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
I tried that, but somehow the performance is a little worse than before. Victor On 11 Aug 2013, at 13:03, Steven Yi wrote: > Hi Victor, > > Could you try removing those and using "-DCMAKE_BUILD_TYPE=Release"? > This should now build with all of those flags and with -O3 correctly. > > Thanks! > steven > > On Sun, Aug 11, 2013 at 1:50 PM, Victor Lazzarini > |
Date | 2013-08-11 13:26 |
From | Steven Yi |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
That's odd as the flags should be the same either way. If you have verbose makefiles on, are you seeing different flags listed when compiling between the two methods? On Sun, Aug 11, 2013 at 2:22 PM, Victor Lazzarini |
Date | 2013-08-11 13:34 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
yes, I need to compare the two, but it's faster with my older settings (debug, plus optimisations) On 11 Aug 2013, at 13:26, Steven Yi wrote: > That's odd as the flags should be the same either way. If you have > verbose makefiles on, are you seeing different flags listed when > compiling between the two methods? > > On Sun, Aug 11, 2013 at 2:22 PM, Victor Lazzarini > |
Date | 2013-08-11 14:00 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
OK, here is a sample of the flags:

My usual build (Debug, plus the optimisations), the fastest:
/Users/victor/bin/gcc -DCS_DEFAULT_PLUGINDIR=\"/Users/victor/Library/Frameworks/CsoundLib64.framework/Versions/6.0/Resources/Opcodes64\" -DCsoundLib64_EXPORTS -DHAVE_SOCKETS -DHAVE_STRTOD_L -DHAVE_STRTOK_R -DMACOSX -DNO_FLTK_THREADS -DPIPES -DUSE_DOUBLE -DUSE_LRINT -D_CSOUND_RELEASE_ -ftree-vectorize -ffast-math -fvisibility=hidden -mfpmath=sse -fomit-frame-pointer -W -Wall -O3 -Wno-missing-field-initializers -Wno-unused-parameter -ftree-vectorize -ffast-math -g -isysroot /Developer/SDKs/MacOSX10.6.sdk -fPIC -I/usr/local/include -I/Users/victor/src/csound6/./H -I/Users/victor/src/csound6/./include -I/Users/victor/src/csound6/./Engine -I/Users/victor/src/csound6/. -I/Users/victor/src/csound6/debug -Wno-format -D__BUILDING_LIBCSOUND -DPARCS -DHAVE_DIRENT_H -DHAVE_FCNTL_H -DHAVE_UNISTD_H -DHAVE_STDINT_H -DHAVE_SYS_TIME_H -DHAVE_SYS_TYPES_H -DHAVE_TERMIOS_H -o CMakeFiles/CsoundLib64.dir/Opcodes/vbap_zak.c.o -c /Users/victor/src/csound6/Opcodes/vbap_zak.c

Build with Release (plus optimisations), slightly slower:
/Users/victor/bin/gcc -DCS_DEFAULT_PLUGINDIR=\"/Users/victor/Library/Frameworks/CsoundLib64.framework/Versions/6.0/Resources/Opcodes64\" -DCsoundLib64_EXPORTS -DHAVE_SOCKETS -DHAVE_STRTOD_L -DHAVE_STRTOK_R -DMACOSX -DNO_FLTK_THREADS -DPIPES -DUSE_DOUBLE -DUSE_LRINT -D_CSOUND_RELEASE_ -ftree-vectorize -ffast-math -fvisibility=hidden -mfpmath=sse -fomit-frame-pointer -W -Wall -O3 -Wno-missing-field-initializers -Wno-unused-parameter -ftree-vectorize -ffast-math -O3 -DNDEBUG -isysroot /Developer/SDKs/MacOSX10.6.sdk -fPIC -I/usr/local/include -I/Users/victor/src/csound6/./H -I/Users/victor/src/csound6/./include -I/Users/victor/src/csound6/./Engine -I/Users/victor/src/csound6/. -I/Users/victor/src/csound6/debug -Wno-format -D__BUILDING_LIBCSOUND -DPARCS -DHAVE_DIRENT_H -DHAVE_FCNTL_H -DHAVE_UNISTD_H -DHAVE_STDINT_H -DHAVE_SYS_TIME_H -DHAVE_SYS_TYPES_H -DHAVE_TERMIOS_H -o CMakeFiles/CsoundLib64.dir/Opcodes/vbap_zak.c.o -c /Users/victor/src/csound6/Opcodes/vbap_zak.c

The difference is the presence of -g and the absence of -DNDEBUG in the faster build. I tested and -g does not change anything (it does not make it any faster), but -DNDEBUG seems to be the cause of the slowdown. Victor On 11 Aug 2013, at 13:34, Victor Lazzarini wrote: > yes, I need to compare the two, but it's faster with my older settings (debug, plus optimisations) > On 11 Aug 2013, at 13:26, Steven Yi wrote: > >> That's odd as the flags should be the same either way. If you have >> verbose makefiles on, are you seeing different flags listed when >> compiling between the two methods? >> >> On Sun, Aug 11, 2013 at 2:22 PM, Victor Lazzarini >> |
Date | 2013-08-11 14:11 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Yes, by setting

set(CMAKE_CXX_FLAGS_RELEASE "-O3 ")
set(CMAKE_C_FLAGS_RELEASE "-O3 ")

in my Custom.cmake, I can make my release build as fast as before. That overrides the settings that include -DNDEBUG. Victor On 11 Aug 2013, at 14:00, Victor Lazzarini wrote:
> OK, here is a sample of the flags:
>
> My usual build (Debug, plus the optimisations), the fastest:
> /Users/victor/bin/gcc -DCS_DEFAULT_PLUGINDIR=\"/Users/victor/Library/Frameworks/CsoundLib64.framework/Versions/6.0/Resources/Opcodes64\" -DCsoundLib64_EXPORTS -DHAVE_SOCKETS -DHAVE_STRTOD_L -DHAVE_STRTOK_R -DMACOSX -DNO_FLTK_THREADS -DPIPES -DUSE_DOUBLE -DUSE_LRINT -D_CSOUND_RELEASE_ -ftree-vectorize -ffast-math -fvisibility=hidden -mfpmath=sse -fomit-frame-pointer -W -Wall -O3 -Wno-missing-field-initializers -Wno-unused-parameter -ftree-vectorize -ffast-math -g -isysroot /Developer/SDKs/MacOSX10.6.sdk -fPIC -I/usr/local/include -I/Users/victor/src/csound6/./H -I/Users/victor/src/csound6/./include -I/Users/victor/src/csound6/./Engine -I/Users/victor/src/csound6/. -I/Users/victor/src/csound6/debug -Wno-format -D__BUILDING_LIBCSOUND -DPARCS -DHAVE_DIRENT_H -DHAVE_FCNTL_H -DHAVE_UNISTD_H -DHAVE_STDINT_H -DHAVE_SYS_TIME_H -DHAVE_SYS_TYPES_H -DHAVE_TERMIOS_H -o CMakeFiles/CsoundLib64.dir/Opcodes/vbap_zak.c.o -c /Users/victor/src/csound6/Opcodes/vbap_zak.c
>
> Build with Release (plus optimisations), slightly slower:
> /Users/victor/bin/gcc -DCS_DEFAULT_PLUGINDIR=\"/Users/victor/Library/Frameworks/CsoundLib64.framework/Versions/6.0/Resources/Opcodes64\" -DCsoundLib64_EXPORTS -DHAVE_SOCKETS -DHAVE_STRTOD_L -DHAVE_STRTOK_R -DMACOSX -DNO_FLTK_THREADS -DPIPES -DUSE_DOUBLE -DUSE_LRINT -D_CSOUND_RELEASE_ -ftree-vectorize -ffast-math -fvisibility=hidden -mfpmath=sse -fomit-frame-pointer -W -Wall -O3 -Wno-missing-field-initializers -Wno-unused-parameter -ftree-vectorize -ffast-math -O3 -DNDEBUG -isysroot /Developer/SDKs/MacOSX10.6.sdk -fPIC -I/usr/local/include -I/Users/victor/src/csound6/./H -I/Users/victor/src/csound6/./include -I/Users/victor/src/csound6/./Engine -I/Users/victor/src/csound6/. -I/Users/victor/src/csound6/debug -Wno-format -D__BUILDING_LIBCSOUND -DPARCS -DHAVE_DIRENT_H -DHAVE_FCNTL_H -DHAVE_UNISTD_H -DHAVE_STDINT_H -DHAVE_SYS_TIME_H -DHAVE_SYS_TYPES_H -DHAVE_TERMIOS_H -o CMakeFiles/CsoundLib64.dir/Opcodes/vbap_zak.c.o -c /Users/victor/src/csound6/Opcodes/vbap_zak.c
>
> The difference is the presence of -g and the absence of -DNDEBUG in the faster build. I tested and -g does not change anything (it does not make it any faster), but
> -DNDEBUG seems to be the cause of the slowdown.
>
> Victor
> On 11 Aug 2013, at 13:34, Victor Lazzarini wrote:
>
>> yes, I need to compare the two, but it's faster with my older settings (debug, plus optimisations)
>> On 11 Aug 2013, at 13:26, Steven Yi wrote:
>>
>>> That's odd as the flags should be the same either way. If you have
>>> verbose makefiles on, are you seeing different flags listed when
>>> compiling between the two methods?
>>>
>>> On Sun, Aug 11, 2013 at 2:22 PM, Victor Lazzarini
>>> |
Date | 2013-08-11 14:29 |
From | Steven Yi |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Nice find! This seems highly odd though. I wasn't all that familiar with NDEBUG but it seems to be used to turn off all assertions and debug messages (I guess for various system libraries). That it's actually making performance worse seems odd. On Sun, Aug 11, 2013 at 3:11 PM, Victor Lazzarini |
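For reference, the standard C behaviour of NDEBUG is narrow: if the macro is defined at the point where <assert.h> is included, assert(expr) expands to ((void)0) and expr is never evaluated. The sketch below makes that effect visible by counting how often the checked expression runs. The names in_range, sum_checked and checks_performed are hypothetical, and the example says nothing about why the -DNDEBUG build measured slower on Victor's machine.

```c
#define NDEBUG              /* remove this line to re-enable the checks */
#include <assert.h>
#include <stdlib.h>         /* size_t */

static unsigned long check_calls = 0;   /* how many times the check ran */

/* Bounds check with a visible side effect, so we can tell if it was run. */
int in_range(size_t i, size_t n)
{
    check_calls++;
    return i < n;
}

/* Sum a buffer, asserting the index on every iteration. With NDEBUG defined
   the assert line compiles to nothing and in_range is never called. */
double sum_checked(const double *buf, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(in_range(i, n));
        acc += buf[i];
    }
    return acc;
}

/* Number of checks actually executed so far. */
unsigned long checks_performed(void)
{
    return check_calls;
}
```

With the #define in place, summing a four-element buffer leaves the counter at zero; without it the counter ends at four. On its own this would be expected to make an NDEBUG build no slower, which is why the measured slowdown looks odd.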
Date | 2013-08-11 15:02 |
From | Steven Yi |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Hi Victor, I took a look on the web and didn't find any concrete information. I tried it here and the results were a bit inconclusive (the times were varying quite a bit run-to-run). Maybe we should just add what you mentioned to the top of the main CMakeLists.txt. That way it'll be before the inclusion of Custom.cmake and people can override. steven On Sun, Aug 11, 2013 at 3:29 PM, Steven Yi |
Date | 2013-08-11 18:19 |
From | Victor Lazzarini |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
There is a definite (yet small) improvement, averaged over several runs. I wondered if that is my system, or if that is across the board. That's why I did not add to CMakeLists.txt. If you think I should, I'll do it. On 11 Aug 2013, at 15:02, Steven Yi wrote: > Hi Victor, > > I took a look on the web and didn't find any concrete information. I > tried it here and the results were a bit inconclusive (the times were > varying quite a bit run-to-run). Maybe we should just add what you > mentioned to the top of the main CMakeLists.txt. That way it'll be > before the inclusion of Custom.cmake and people can override. > > steven > > On Sun, Aug 11, 2013 at 3:29 PM, Steven Yi |
Date | 2013-08-11 20:24 |
From | Steven Yi |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Hi Victor, Just got back to the computer and saw that you committed a change. I think that it's fine to go with it. I seem to remember seeing other projects online that took it out too. If it ends up being an issue we can always change it back. Thanks! steven On Sun, Aug 11, 2013 at 7:19 PM, Victor Lazzarini |
Date | 2013-08-12 02:25 |
From | jpff@cs.bath.ac.uk |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Yesterday I got cs6 to run fewer instructions than cs5 for Steven's example. Just checked it into git -- sorry, I had no net access yesterday ==John ff > There is a definite (yet small) improvement, averaged over several runs. I > wondered if that is my system, or if that is > across the board. That's why I did not add to CMakeLists.txt. If you think > I should, I'll do it. > On 11 Aug 2013, at 15:02, Steven Yi wrote: > >> Hi Victor, >> >> I took a look on the web and didn't find any concrete information. I >> tried it here and the results were a bit inconclusive (the times were >> varying quite a bit run-to-run). Maybe we should just add what you >> mentioned to the top of the main CMakeLists.txt. That way it'll be >> before the inclusion of Custom.cmake and people can override. >> >> steven >> >> On Sun, Aug 11, 2013 at 3:29 PM, Steven Yi |
Date | 2013-08-12 02:49 |
From | jpff@cs.bath.ac.uk |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
> Has anyone tried callgrind? Or gprof? > > only using callgrind |
Date | 2013-08-12 12:14 |
From | Michael Gogins |
Subject | Re: [Cs-dev] Performance Issues with Csound6 |
Attachments | None None |
There is no significant change in performance of the 3 examples I provided in my earlier email, here, with the latest git develop built just now, on Windows.
Regards, Mike =========================== Michael Gogins Irreducible Productions http://michaelgogins.tumblr.com Michael dot Gogins at gmail dot com On Sun, Aug 11, 2013 at 9:25 PM, <jpff@cs.bath.ac.uk> wrote: Yesterday I got cs6 to run fewer instructions than cs5 for Steven's example. |