[Csnd-dev] SIMD use in Csound

Date	2017-02-10 20:48
From	jpff
Subject	[Csnd-dev] SIMD use in Csound
Following up on the discussion about speeding up the out* opcodes, Victor and I have been looking into using the SIMD (SSE) components of the hardware.

I have 16-byte alignment working for a-vars and for spout etc., and it seems to run OK; this is a prerequisite for some SSE instructions.

I have only looked at doubles, where SSE goes two-by-two, and with arithmetic on a-vars there is no speed advantage. The same is true for out. This seems to be because the gcc and clang compilers can detect these simple cases and auto-vectorise them -- I have been looking at the generated assembler.

It was suggested that the interleaving code might be attacked, but it is intrinsically non-vectorisable in SSE style as far as I can understand (*).

Where should we go from here? Should I just revert the alignment changes? Or does someone have a suggestion?

==John ffitch
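
To make the comparison in this message concrete, here is a minimal sketch of the two variants being weighed: a plain loop that gcc and clang will typically auto-vectorise when optimisation is enabled, and a hand-written SSE2 loop that processes two doubles per step and relies on 16-byte alignment. The names alloc_avar, add_scalar and add_sse2 are hypothetical and are not taken from the Csound sources; the allocation uses C11 aligned_alloc as an assumed stand-in for however Csound actually aligns its buffers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2: __m128d, _mm_load_pd, _mm_add_pd, _mm_store_pd */
#endif

/* Hypothetical helper: allocate n doubles on a 16-byte boundary.
   C11 aligned_alloc wants the size to be a multiple of the alignment,
   so the byte count is rounded up to the next multiple of 16. */
static double *alloc_avar(size_t n)
{
    size_t bytes = n * sizeof(double);
    bytes = (bytes + 15u) & ~(size_t)15u;
    return aligned_alloc(16, bytes);
}

/* Plain loop: the kind of simple case a compiler can vectorise itself. */
static void add_scalar(double *restrict out, const double *restrict a,
                       const double *restrict b, size_t nsmps)
{
    for (size_t i = 0; i < nsmps; i++)
        out[i] = a[i] + b[i];
}

/* Explicit SSE2: two doubles per step. _mm_load_pd and _mm_store_pd
   require 16-byte aligned addresses, hence the alignment prerequisite.
   Falls back to the scalar loop when SSE2 is not available. */
static void add_sse2(double *out, const double *a, const double *b,
                     size_t nsmps)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 2 <= nsmps; i += 2) {
        __m128d va = _mm_load_pd(a + i);
        __m128d vb = _mm_load_pd(b + i);
        _mm_store_pd(out + i, _mm_add_pd(va, vb));
    }
#endif
    for (; i < nsmps; i++)  /* odd tail, or the whole loop without SSE2 */
        out[i] = a[i] + b[i];
}
```

Compiling both functions with optimisation enabled and inspecting the assembler (for example with the -S option) is one way to check whether the compiler has already produced packed SSE instructions for add_scalar, which would be consistent with the lack of speed-up reported above.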

Date	2017-02-10 20:54
From	Victor Lazzarini
Subject	Re: [Csnd-dev] SIMD use in Csound
I have a code fragment for interleaved code that might (or might not) speed things up. This is how it goes:

    __m128d *spvec, invec;
    double *tmp = (double *) &invec;
    for(i=j=0; i < nsmps; i+=2, j++){
      spvec = (__m128d *) &spout[i];
      tmp[0] = in1[j];
      tmp[1] = in2[j];
      *spvec = _mm_add_pd(*spvec, invec);
    }

It’s sharing pointers and needs aligned memory.

========================
Prof. Victor Lazzarini
Dean of Arts, Celtic Studies, and Philosophy,
Maynooth University,
Maynooth, Co Kildare, Ireland
Tel: 00 353 7086936
Fax: 00 353 1 7086952

> On 10 Feb 2017, at 20:48, jpff wrote:
>
> Following up on the discussion about speeding up out* opcodes Victor
> and I have been looking into use of SIMD (SSE) components of the
> hardware.
>
> I have 16byte alignment for a-vars working and for spout etc and it
> seems to run OK; this is a prerequisite for some SSE instructions.
>
> Only looked at doubles where SSE goes two-by-two and with arithmetic for
> a-var there is no speed advantage. The same is true for out.
> This seems to be because the gcc and clang compilers can detect these
> simple cases and auto-vectorise -- I have been looking at the generated
> assembler.
>
> It was suggested that the interleaving code might be attacked but it is
> intrinsically non vectorisable in SSE style as far as I can understand
> (*)
>
> Where should we go from here? Should I just revert the alignment changes?
> Or does someone have a suggestion?
> ==John ffitch
>
> (*) not done this since circa 1980 so a little ru
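
The fragment above can be turned into a self-contained function roughly as follows. This is a sketch under stated assumptions rather than Victor's or Csound's actual code: the function name mix_stereo_into_spout is hypothetical, the loop counts frames (so spout is assumed to hold 2 * nframes interleaved doubles), and the pointer sharing through tmp is replaced by _mm_set_pd, which builds the same two-element vector without aliasing a __m128d through a double pointer. spout must still be 16-byte aligned for the aligned load and store.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2: _mm_set_pd, _mm_load_pd, _mm_add_pd, _mm_store_pd */
#endif

/* Hypothetical name. Adds one stereo frame per iteration into the
   interleaved buffer: spout[2j] += in1[j], spout[2j+1] += in2[j].
   Assumes spout is 16-byte aligned and holds 2 * nframes doubles. */
static void mix_stereo_into_spout(double *spout, const double *in1,
                                  const double *in2, size_t nframes)
{
    for (size_t j = 0; j < nframes; j++) {
#if defined(__SSE2__)
        /* _mm_set_pd takes the high element first, so in1[j] ends up
           in the low lane and in2[j] in the high lane. */
        __m128d invec = _mm_set_pd(in2[j], in1[j]);
        __m128d spvec = _mm_load_pd(&spout[2 * j]);
        _mm_store_pd(&spout[2 * j], _mm_add_pd(spvec, invec));
#else
        /* Scalar fallback with the same result. */
        spout[2 * j]     += in1[j];
        spout[2 * j + 1] += in2[j];
#endif
    }
}
```

Each iteration still gathers two scalars from separate input buffers before the single packed add, so only the accumulate step is vectorised.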

Date	2017-02-12 17:04
From	jpff
Subject	Re: [Csnd-dev] SIMD use in Csound
But the interleave code has no add, just a copy, so this is not the way to go.
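
For reference, a copy-only interleave of two channels looks like the scalar loop below. The second function is a tentative SSE2 variant for exactly two channels of doubles, using the unpack intrinsics to shuffle two frames at a time. Both function names are hypothetical and all three buffers are assumed to be 16-byte aligned. The sketch makes no claim about speed: the loop is dominated by loads and stores, so it is offered as an illustration of the operation under discussion, not as evidence of a gain.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2: _mm_load_pd, _mm_store_pd, _mm_unpacklo_pd, _mm_unpackhi_pd */
#endif

/* Copy-only interleave: no arithmetic, just placement. */
static void interleave_scalar(double *spout, const double *in1,
                              const double *in2, size_t nframes)
{
    for (size_t j = 0; j < nframes; j++) {
        spout[2 * j]     = in1[j];
        spout[2 * j + 1] = in2[j];
    }
}

/* Tentative SSE2 variant for two channels: two frames per step.
   Assumes in1, in2 and spout are all 16-byte aligned. */
static void interleave_sse2(double *spout, const double *in1,
                            const double *in2, size_t nframes)
{
    size_t j = 0;
#if defined(__SSE2__)
    for (; j + 2 <= nframes; j += 2) {
        __m128d a = _mm_load_pd(in1 + j);                         /* a0 a1 */
        __m128d b = _mm_load_pd(in2 + j);                         /* b0 b1 */
        _mm_store_pd(spout + 2 * j,     _mm_unpacklo_pd(a, b));   /* a0 b0 */
        _mm_store_pd(spout + 2 * j + 2, _mm_unpackhi_pd(a, b));   /* a1 b1 */
    }
#endif
    for (; j < nframes; j++) {  /* odd tail, or everything without SSE2 */
        spout[2 * j]     = in1[j];
        spout[2 * j + 1] = in2[j];
    }
}
```

The unpack approach is specific to the two-channel case; other channel counts would need different shuffles or a scalar loop.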

Date	2017-02-12 20:14
From	Victor Lazzarini
Subject	Re: [Csnd-dev] SIMD use in Csound
I forgot that interleave is not adding anymore. Probably no gain then.