[Csnd-dev] SIMD use in Csound

Date: 2017-02-10 20:48
From: jpff
Subject: [Csnd-dev] SIMD use in Csound
Following up on the discussion about speeding up out* opcodes, Victor
and I have been looking into use of SIMD (SSE) components of the
hardware.

I have 16-byte alignment working for a-vars and for spout etc., and it
seems to run OK; this is a prerequisite for some SSE instructions.
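
As an illustration of the alignment requirement, here is a minimal sketch of allocating a 16-byte-aligned buffer of doubles. It assumes a C11 library that provides aligned_alloc; the helper names alloc_aligned_doubles and is_aligned16 are hypothetical and are not taken from the Csound sources.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate n zeroed doubles on a 16-byte boundary.
   C11 aligned_alloc expects the size to be a multiple of the alignment,
   so the byte count is rounded up to the next multiple of 16. */
static double *alloc_aligned_doubles(size_t n)
{
    size_t bytes  = n * sizeof(double);
    size_t padded = (bytes + 15u) & ~(size_t)15u;
    if (padded == 0) padded = 16;
    double *p = aligned_alloc(16, padded);
    if (p != NULL) memset(p, 0, padded);
    return p;
}

/* Returns 1 if p sits on a 16-byte boundary, 0 otherwise. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```

A buffer obtained this way is released with free(). Aligned SSE loads and stores such as _mm_load_pd and _mm_store_pd fault on addresses that fail the is_aligned16 check.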

I have only looked at doubles, where SSE goes two-by-two, and with
arithmetic on a-vars there is no speed advantage.  The same is true for out*.
This seems to be because the gcc and clang compilers can detect these
simple cases and auto-vectorise -- I have been looking at the generated
assembler. 
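
For reference, a loop of the kind being described looks roughly like the sketch below. The function name avar_add is hypothetical, and the restrict qualifiers are an assumption that the three buffers do not overlap; without such a guarantee the compiler may have to add a runtime overlap check before using the vector path.

```c
#include <stddef.h>

/* Element-wise add of two audio-rate vectors into a third.
   restrict promises the compiler that the buffers do not alias. */
static void avar_add(double *restrict out, const double *restrict a,
                     const double *restrict b, size_t nsmps)
{
    for (size_t n = 0; n < nsmps; n++)
        out[n] = a[n] + b[n];
}
```

Compiling with something like gcc -O3 -S (or clang -O3 -S) and looking for packed instructions such as addpd in the output is one way to check whether the loop was vectorised. gcc's -fopt-info-vec and clang's -Rpass=loop-vectorize report the same decision without reading the assembler.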
It was suggested that the interleaving code might be attacked but it is
intrinsically non vectorisable in SSE style as far as I can understand
(*)

Where should we go from here?  Should I just revert the alignment changes?
Or does someone have a suggestion?
==John ffitch

Date: 2017-02-10 20:54
From: Victor Lazzarini
Subject: Re: [Csnd-dev] SIMD use in Csound
I have a code fragment for the interleaving code that might (or might not) speed things up.
This is how it goes:

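 __m128d *spvec, invec;
 double *tmp = (double *) &invec;   /* view invec as two doubles */
 /* i indexes the interleaved spout, j the mono inputs; if nsmps is the
    per-channel frame count, the bound would need to be 2*nsmps */
 for(i=j=0; i < nsmps; i+=2, j++){
   spvec = (__m128d *) &spout[i];   /* spout must be 16-byte aligned */
   tmp[0] = in1[j];
   tmp[1] = in2[j];
   *spvec = _mm_add_pd(*spvec, invec);
 }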

It shares pointers (tmp aliases invec) and needs aligned memory.
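
A self-contained variant of the same idea is sketched below. It builds the vector with _mm_set_pd instead of writing through an aliased pointer. The function name mix_stereo_add and the scalar fallback for builds without SSE2 are assumptions added for illustration, and nframes here means the per-channel frame count, so spout must hold 2*nframes doubles.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Adds two mono channels into an interleaved stereo buffer:
   spout[2j] += in1[j], spout[2j+1] += in2[j].
   On the SSE2 path spout must be 16-byte aligned. */
static void mix_stereo_add(double *spout, const double *in1,
                           const double *in2, size_t nframes)
{
    for (size_t j = 0; j < nframes; j++) {
#if defined(__SSE2__)
        /* _mm_set_pd takes the high lane first: low = in1[j], high = in2[j] */
        __m128d frame = _mm_set_pd(in2[j], in1[j]);
        __m128d acc   = _mm_load_pd(&spout[2 * j]);   /* aligned load */
        _mm_store_pd(&spout[2 * j], _mm_add_pd(acc, frame));
#else
        spout[2 * j]     += in1[j];
        spout[2 * j + 1] += in2[j];
#endif
    }
}
```

Whether this is any faster than the plain scalar loop would have to be measured, since the compiler may already produce equivalent code.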
========================
Prof. Victor Lazzarini
Dean of Arts, Celtic Studies, and Philosophy,
Maynooth University,
Maynooth, Co Kildare, Ireland
Tel: 00 353 7086936
Fax: 00 353 1 7086952 

> On 10 Feb 2017, at 20:48, jpff wrote:
> (*) not done this since circa 1980 so a little ru

Date: 2017-02-12 17:04
From: jpff
Subject: Re: [Csnd-dev] SIMD use in Csound
But the interleave code has no add, just a copy, so this is not the way to go.
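
For the copy-only case with two channels of doubles, one approach that could be tried is the SSE2 unpack pair, which shuffles two loaded vectors into interleaved order. The sketch below is an assumption-laden illustration, not the Csound code: the name interleave_stereo is hypothetical, nframes is the per-channel frame count, and on the SSE2 path all three buffers are assumed to be 16-byte aligned. It does not generalise directly to other channel counts.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy-only interleave: spout[2j] = in1[j], spout[2j+1] = in2[j]. */
static void interleave_stereo(double *spout, const double *in1,
                              const double *in2, size_t nframes)
{
    size_t j = 0;
#if defined(__SSE2__)
    /* Two frames per iteration; needs 16-byte aligned buffers. */
    for (; j + 2 <= nframes; j += 2) {
        __m128d a = _mm_load_pd(&in1[j]);   /* {in1[j], in1[j+1]} */
        __m128d b = _mm_load_pd(&in2[j]);   /* {in2[j], in2[j+1]} */
        _mm_store_pd(&spout[2 * j],     _mm_unpacklo_pd(a, b)); /* {in1[j],   in2[j]}   */
        _mm_store_pd(&spout[2 * j + 2], _mm_unpackhi_pd(a, b)); /* {in1[j+1], in2[j+1]} */
    }
#endif
    /* Scalar tail, and the whole loop when SSE2 is unavailable. */
    for (; j < nframes; j++) {
        spout[2 * j]     = in1[j];
        spout[2 * j + 1] = in2[j];
    }
}
```

This does the same number of loads and stores per sample as the scalar copy, so any gain is uncertain and would need timing against the compiler's own output.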


Date: 2017-02-12 20:14
From: Victor Lazzarini
Subject: Re: [Csnd-dev] SIMD use in Csound
I forgot that interleave is not adding anymore. Probably no gain then.
