
Re: [Csnd] csound & cuda

Date: 2014-01-16 13:16
From: Mjk From Lenovo
Subject: Re: [Csnd] csound & cuda
Hi,

I got the CUDA .so built on Linux. It just needs renaming the .dylib to .so, adding -Xcompiler -fPIC, and adding -L with the correct CUDA lib directory.

The pvs-csd runs correctly after I changed the soundfile names to files I actually have. No errors, but the output is only NaNs.

It's pretty quick, though, even though I tested with a low-end GT420 in an old Linux computer.

But I faced another problem with PortAudio. The old box has Ubuntu 12.04, and its PortAudio is too old. I installed the latest, and it's probably too new: compilation goes without errors, but a warning about the undefined symbol paHostApiInitializers is issued when running Csound. What's the correct version of PortAudio? It's always v19; only the date tells you the true version.

Tnx

-m

Date: 2014-01-16 13:40
From: Victor Lazzarini
Subject: Re: [Csnd] csound & cuda
Why don’t you just use the alsa or jack output modules (-+rtaudio=alsa or -+rtaudio=jack)?

Regarding NaNs: yes, I got those from time to time. Something appears to go funny in the GPU, but
I need to do more tests. Otherwise, things run smoothly.

You also need to check the compute capability/architecture of your device, because that must
be set correctly when compiling (I use sm_30).

The code has also been hardwired to make blocks of up to 1024 threads, so if your device allows
fewer threads per block, the maximum block size needs to be smaller. That could be a reason for the NaNs.

Victor
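
A minimal sketch of the block-size arithmetic described above, written as plain host-side C++ (it would compile unchanged in a .cu file). A kernel covering N items, e.g. the bins of an N-point frame, is launched with threads = min(N, limit) and blocks = ceil(N / threads); the launch is only legal if threads does not exceed the device's maximum threads per block, which CUDA reports in cudaDeviceProp::maxThreadsPerBlock (compute capability 1.x devices allow 512). The names LaunchConfig, make_launch_config and fits_device are hypothetical, not taken from the Csound sources.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical launch configuration: pvsops.cu is not reproduced here,
// so these names are illustrative only.
struct LaunchConfig {
    int threads;  // threads per block
    int blocks;   // number of blocks in the grid
};

// Cover n work items (e.g. FFT bins) with blocks of at most max_threads threads.
LaunchConfig make_launch_config(int n, int max_threads) {
    assert(n > 0 && max_threads > 0);
    LaunchConfig cfg;
    cfg.threads = std::min(n, max_threads);
    // Ceiling division so a partially filled last block still covers the tail.
    cfg.blocks = (n + cfg.threads - 1) / cfg.threads;
    return cfg;
}

// A launch is only valid if the block size does not exceed the device limit
// (on real hardware: cudaDeviceProp::maxThreadsPerBlock).
bool fits_device(const LaunchConfig& cfg, int device_max_threads) {
    return cfg.threads <= device_max_threads;
}
```

For example, make_launch_config(4096, 1024) gives 4 blocks of 1024 threads, which fits_device rejects for a 512-thread device. A launch like that does not run at all, and the error only shows up if the host checks cudaGetLastError(); an unchecked failure leaves whatever was already in the output buffer. That is one plausible route to NaNs, not a confirmed diagnosis.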

> Send bugs reports to the Sourceforge bug trackers
> csound6:
>            https://sourceforge.net/p/csound/tickets/
> csound5:
>            https://sourceforge.net/p/csound/bugs/
> Discussions of bugs and features can be posted here
> To unsubscribe, send email sympa@lists.bath.ac.uk with body "unsubscribe csound"
> 
> 



Date: 2014-01-16 13:49
From: Victor Lazzarini
Subject: Re: [Csnd] csound & cuda
I’ve updated the code to have a THREADS_PER_BLOCK constant that can be set to the maximum
number of threads per block available on your device. I’ll look into making this automatic at compile time.
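
One way to make such a constant adjustable without editing the file is an #ifndef guard, so a build can pass a different value with -D (nvcc accepts -D like gcc, e.g. -DTHREADS_PER_BLOCK=512). This is a hypothetical sketch, not the code in the Csound repository; blocks_for is an illustrative helper, and the multiple-of-32 check is a design choice (the warp size), not a CUDA requirement.

```cpp
#include <cassert>

// Hypothetical sketch: a block-size constant that defaults to 1024 but can be
// overridden at compile time, e.g. nvcc -DTHREADS_PER_BLOCK=512 ...
#ifndef THREADS_PER_BLOCK
#define THREADS_PER_BLOCK 1024
#endif

// On current CUDA devices the per-block maximum is 1024 threads.
static_assert(THREADS_PER_BLOCK > 0 && THREADS_PER_BLOCK <= 1024,
              "THREADS_PER_BLOCK must be in 1..1024");
// Design choice: keep the block a whole number of 32-thread warps.
static_assert(THREADS_PER_BLOCK % 32 == 0,
              "THREADS_PER_BLOCK should be a multiple of the warp size (32)");

// Number of blocks needed to cover n items with THREADS_PER_BLOCK threads each.
inline int blocks_for(int n) {
    assert(n > 0);
    return (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
}
```

Fully automatic selection would mean querying the device (for example with cudaGetDeviceProperties) at build or run time, which this sketch does not attempt.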




Date: 2014-01-17 14:01
From: Matti Koskinen
Subject: Re: [Csnd] csound & cuda
On 01/16/2014 03:49 PM, Victor Lazzarini wrote:
> I’ve updated the code to have a THREADS_PER_BLOCK constant that can be set to the available
> max number of threads. I’ll look into making this automatic when compiling.
>
I cloned the git repo and compiled the CUDA opcodes again, and got no NaNs.
I tried a long soundfile, changing the score lines to i1 0 15 and i2 15 5,
but I get only a 5-second output file, and it's all silence.

I tried to get the CUDA package for Ubuntu 12.10 working on 13.04, but no
go. The gcc is a different version, and CUDA refuses to install. I
remember facing this same problem long ago and trying all the hacks I
could find. The only option would be to compile a suitable gcc; the .debs
won't install.

I had a glimpse at OpenCL. It would be more practical, as both AMD and
NVIDIA support it, but there are no built-in FFTs; it's more computer-vision
oriented. Both AMD and NVIDIA have BLAS and FFT libraries, but I have only
used them through OpenCV, without knowing how they are coded into OpenCV.
But maybe sometime ...

tnx

-m



Date: 2014-01-17 14:37
From: Edward Costello
Subject: Re: [Csnd] csound & cuda
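The OpenCL API is pretty minimal; I haven’t come across any decent libraries for it either. Apple have an FFT implementation here: https://developer.apple.com/library/mac/samplecode/OpenCL_FFT/Introduction/Intro.html.
Also, although there are image primitives in it, it works fine for audio. The language is pretty much C99, with some extra things for different memory stores, but it’s not hard to learn.

-- 
Edward Costello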

Date: 2014-01-17 15:40
From: Matti Koskinen
Subject: Re: [Csnd] csound & cuda
On 01/17/2014 04:37 PM, Edward Costello wrote:
> The OpenCL API is pretty minimal, I haven’t come across any decent 
> libraries for it either. Apple have an FFT implementation 
> here https://developer.apple.com/library/mac/samplecode/OpenCL_FFT/Introduction/Intro.html. 
>
> Also, although there are image primitives in it, it works fine for 
> audio. The language is pretty much C99, with some extra things for 
> different memory stores but it’s not hard to learn.
>
> -- 
> Edward Costello
>
thanks, I'll try it.

-m



Date: 2014-01-17 15:54
From: Victor Lazzarini
Subject: Re: [Csnd] csound & cuda
Note that you need to set the threads per block manually at the moment.
Did you do this?

My impression is that CUDA is a very good option for GPU programming. I got results very quickly with it.
Not only does it have its own FFTs, it also seems to be well tuned to the NVIDIA hardware.

Victor




Date: 2014-01-17 16:38
From: Matti Koskinen
Subject: Re: [Csnd] csound & cuda
On 01/17/2014 05:54 PM, Victor Lazzarini wrote:
> Note that you need to set the threads per block manually at the moment.
> Did you do this?
No. Looking at the code, there's threadIdx.x and blocks. Blocks won't
exceed 1024, but where is threadIdx set, and how do I adjust threads per block?
>
> My impression is that CUDA is a very good option for GPU programming. I got results very quickly with it.
> Not only does it have its own FFTs, it also seems to be well tuned to the NVIDIA hardware.
Yes, it is. The problem is that CUDA is so dependent on the C compiler.
gcc 4.7 can't compile CUDA for Ubuntu 12.10; some incompatibilities give
errors. On Win7, too, it really needs a full Visual Studio; Express can be
used to compile the code only after some hacking (vcvars64.bat). AMD code
can be compiled with MinGW. There have been many requests to NVIDIA to
change this. It shouldn't even be hard, since OS X and Linux use gcc, but
probably they have some secret agreement with M$.

CUDA is widely used in supercomputing. I think at least one Chinese
supercomputer uses a great many NVIDIA GPUs for number-crunching.

The problem with AMD is buggy drivers. Using an HD7750 with Photoshop
usually caused a BSOD when clicking Print. I put the NVIDIA card back in,
and I haven't had a single BSOD since.

-m



Date: 2014-01-17 16:55
From: Victor Lazzarini
Subject: Re: [Csnd] csound & cuda
THREADS_PER_BLOCK needs to be set to the max threads per block in your device.

Victor



Date: 2014-01-17 17:15
From: Matti Koskinen
Subject: Re: [Csnd] csound & cuda
On 01/17/2014 06:55 PM, Victor Lazzarini wrote:
> THREADS_PER_BLOCK needs to be set to the max threads per block in your device.
>
> Victor
>
Sorry, me again. Where's this defined? I found that pvsops.cu prints
threads and blocks, which are N (the FFT size) and N/1024.

Csound prints the amps, samples out of range, and timing normally. The output
is just silence and only 5 seconds long, although it should be 20 seconds.

-m


Date: 2014-01-17 17:33
From: Victor Lazzarini
Subject: Re: [Csnd] csound & cuda
Right at the top of the *.cu files.

Victor



Date: 2014-01-17 17:51
From: Matti Koskinen
Subject: Re: [Csnd] csound & cuda
On 01/17/2014 07:33 PM, Victor Lazzarini wrote:
> right at the top of the *.cu files.
>
Mine look like this:

// -*- c++ -*-
// pvsops.cu
// experimental cuda opcodes
//
// V Lazzarini, 2013

#include 
#include 
#define VSAMPS 16
#define MAXBLOCK 1024 //was 8912
#include 

/* kernel to convert from pvs to rectangular frame */
__global__ void frompvs(float* inframe, double* lastph, int blocks,

-m



Date: 2014-01-17 19:22
From: Victor Lazzarini
Subject: Re: [Csnd] csound & cuda
So you don’t have the latest version. It’s in the develop branch
(https://sourceforge.net/p/csound/csound6-git/ci/develop/tree/Opcodes/cuda/pvsops.cu)
Don’t change MAXBLOCK; change the threads constant (if you need to) to match the maximum number of threads per block on your device.
