| Thanks. These instructions are already present in the disassembled CsoundLib64, so I expect that your code will use the optimisations (I have not built it yet).
========================
Prof. Victor Lazzarini
Maynooth University
Ireland
> On 11 Jul 2021, at 21:54, Eduardo Moguillansky wrote:
>
> I took a brief look but didn't see clear candidates for this in aops.c. I did apply it to the "sum" opcode, which gave the same kind of speedup when summing 4 or more signals.
>
> As a side note, to check whether the optimizations are taking effect on macOS, look for instructions like "movupd" and "addpd" in the disassembled code.
>
> On 11.07.21 13:10, Victor Lazzarini wrote:
>> Thanks for this. I think I have the right flags for SIMD on the macOS build that I use for releases (I’ve played with these a few years back), but
>> I will check.
>>
>> Not sure if there’s any gain to be had, but if you have the chance, could you look at the output mixing functions (aops.c) to see if your
>> approach for arrays can also be applied there?
>>
>> best
>> ========================
>> Prof. Victor Lazzarini
>> Maynooth University
>> Ireland
>>
>>> On 11 Jul 2021, at 11:55, Eduardo Moguillansky wrote:
>>>
>>> Hi,
>>>
>>> I just sent a PR which makes sumarray for audio arrays friendlier to
>>> SIMD compilation. It copies SuperCollider's approach, where audio
>>> signals are summed in groups of 4. With properly configured flags this
>>> results in very efficient horizontal SIMD instructions. On my system
>>> (Linux, Sandy Bridge), with a properly configured CMake build I see
>>> execution up to 9x faster, which is a great result for only a very
>>> small modification.
>>>
>>> cheers,
>>>
>>> eduardo
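
For readers following the thread, the "sum in groups of 4" idea can be sketched roughly as below. This is only an illustration and not the code from the PR: the function name sum_signals, its signature, and the array-of-pointers layout are assumptions made for this example. The inner loop adds four input signals sample by sample, which gives an optimising compiler (for example GCC or Clang with optimisation enabled) a simple loop it can turn into packed adds such as "addpd".

```c
#include <stddef.h>

/* Hypothetical helper, not taken from the Csound sources: sums `numsigs`
   signals of `nsmps` samples each into `out`. `out` must not overlap any
   of the inputs (hence `restrict`). */
void sum_signals(double *restrict out, const double *const *in,
                 size_t numsigs, size_t nsmps)
{
    size_t i, k;

    /* Start from silence. */
    for (k = 0; k < nsmps; k++)
        out[k] = 0.0;

    /* Take the inputs four at a time; each sample of the output receives
       the sum of the four corresponding input samples. */
    for (i = 0; i + 4 <= numsigs; i += 4) {
        const double *a = in[i], *b = in[i + 1];
        const double *c = in[i + 2], *d = in[i + 3];
        for (k = 0; k < nsmps; k++)
            out[k] += a[k] + b[k] + c[k] + d[k];
    }

    /* Handle the remaining 0 to 3 signals one at a time. */
    for (; i < numsigs; i++) {
        const double *a = in[i];
        for (k = 0; k < nsmps; k++)
            out[k] += a[k];
    }
}
```

Whether the compiler actually vectorises the loop depends on the build flags and target, so checking the disassembly as suggested above is still the way to confirm it.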
|