Let us throw some light on how programming is done for CUDA. CUDA extends C by allowing programmers to define C functions known as 'kernels'. When a kernel is called, it executes n times in parallel, in n different threads. Here is the code snippet to define a kernel:
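A minimal sketch of such a kernel and its launch, assuming element-wise addition of two arrays as the task. The names vecAdd, runVecAdd and N are illustrative and do not come from the SDK, and error checking on the CUDA calls is left out to keep the listing short.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define N 256  /* illustrative array length: one thread per element */

/* __global__ marks vecAdd as a kernel: it runs on the GPU and is
   launched from host code. */
__global__ void vecAdd(const float *a, const float *b, float *c)
{
    int i = threadIdx.x;          /* index of this thread within its block */
    c[i] = a[i] + b[i];
}

/* Hypothetical host helper: returns the number of wrong elements. */
int runVecAdd(void)
{
    float hA[N], hB[N], hC[N];
    for (int i = 0; i < N; ++i) {
        hA[i] = (float)i;
        hB[i] = 2.0f * (float)i;
        hC[i] = 0.0f;
    }

    /* Allocate device (global) memory and copy the inputs to it. */
    float *dA, *dB, *dC;
    size_t bytes = N * sizeof(float);
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    /* Launch the kernel: 1 block of N threads. */
    vecAdd<<<1, N>>>(dA, dB, dC);

    /* This copy waits for the kernel to finish before reading the result. */
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    int errors = 0;
    for (int i = 0; i < N; ++i)
        if (hC[i] != hA[i] + hB[i])
            ++errors;

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return errors;
}
```

Saved in a .cu file and compiled with nvcc, a main function that calls runVecAdd() should get 0 back on a machine with a CUDA-capable GPU.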
Here the kernel is defined using the '__global__' qualifier, and the number of threads is specified at the call site inside a new syntax, '<<< ... >>>'. Each thread that executes a kernel is given a unique thread ID that is accessible within the kernel through the built-in 'threadIdx' variable. 'threadIdx' is a 3-component vector, so a thread can be identified using a one-dimensional, two-dimensional or three-dimensional index, forming one-, two- or three-dimensional thread blocks.
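To illustrate the two-dimensional case, here is a hedged sketch that adds two small square matrices with a single 16 x 16 thread block, using threadIdx.x as the column and threadIdx.y as the row. The names matAdd, runMatAdd and DIM are made up for this example, and the matrices are assumed to be stored row by row in flat arrays.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define DIM 16                 /* illustrative matrix width and height */
#define COUNT (DIM * DIM)      /* total number of elements */

/* Each thread handles one matrix element, located by its 2D thread index. */
__global__ void matAdd(const float *a, const float *b, float *c)
{
    int col = threadIdx.x;
    int row = threadIdx.y;
    int idx = row * DIM + col; /* row-major position in the flat array */
    c[idx] = a[idx] + b[idx];
}

/* Hypothetical host helper: returns the number of wrong elements. */
int runMatAdd(void)
{
    float hA[COUNT], hB[COUNT], hC[COUNT];
    for (int i = 0; i < COUNT; ++i) {
        hA[i] = (float)i;
        hB[i] = 1.0f;
        hC[i] = 0.0f;
    }

    float *dA, *dB, *dC;
    size_t bytes = COUNT * sizeof(float);
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    /* dim3 describes the shape of the block: DIM x DIM threads. */
    dim3 threadsPerBlock(DIM, DIM);
    matAdd<<<1, threadsPerBlock>>>(dA, dB, dC);

    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    int errors = 0;
    for (int i = 0; i < COUNT; ++i)
        if (hC[i] != hA[i] + hB[i])
            ++errors;

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return errors;
}
```

A one-dimensional launch would work for this data too; the 2D block simply lets the thread index mirror the row and column layout of the matrix.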
While executing, threads can access memory from three different places: the private memory of each thread, block memory shared by all threads present in a block, and global memory. A lot of examples are present in 'C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\projects'. Compile these examples and run them. One can also customize these projects. Before writing code, programmers should analyse their problem so that the data can be split into small chunks that can be distributed among threads. Also keep in mind that you need to create a sufficient number of threads to utilize the GPU's power optimally.
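The three memory spaces, and the advice about splitting data into chunks, can be sketched with a block-wise sum. This is only an illustration under stated assumptions: the names blockSum, runBlockSum, N_ELEMS and THREADS_PER_BLOCK are hypothetical, the input is an array of ones, and the built-in variables blockIdx and blockDim (which give a block's index and size) are used although the article has not covered them.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 128
#define N_ELEMS 1000
/* Round up so that every element gets a thread (8 blocks here). */
#define N_BLOCKS ((N_ELEMS + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK)

__global__ void blockSum(const int *in, int *out, int n)
{
    /* Block memory: one copy per block, visible to all its threads. */
    __shared__ int cache[THREADS_PER_BLOCK];

    /* Ordinary local variables are private to each thread. */
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int value = (gid < n) ? in[gid] : 0;   /* 'in' lives in global memory */

    cache[threadIdx.x] = value;
    __syncthreads();                       /* wait until the block's chunk is loaded */

    /* Tree reduction inside the block: halve the active threads each step. */
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            cache[threadIdx.x] += cache[threadIdx.x + stride];
        __syncthreads();
    }

    /* Thread 0 writes the block's partial sum back to global memory. */
    if (threadIdx.x == 0)
        out[blockIdx.x] = cache[0];
}

/* Hypothetical host helper: sums N_ELEMS ones and returns the total. */
int runBlockSum(void)
{
    int hIn[N_ELEMS], hOut[N_BLOCKS];
    for (int i = 0; i < N_ELEMS; ++i)
        hIn[i] = 1;
    for (int i = 0; i < N_BLOCKS; ++i)
        hOut[i] = 0;

    int *dIn, *dOut;
    cudaMalloc((void **)&dIn, N_ELEMS * sizeof(int));
    cudaMalloc((void **)&dOut, N_BLOCKS * sizeof(int));
    cudaMemcpy(dIn, hIn, N_ELEMS * sizeof(int), cudaMemcpyHostToDevice);

    blockSum<<<N_BLOCKS, THREADS_PER_BLOCK>>>(dIn, dOut, N_ELEMS);

    cudaMemcpy(hOut, dOut, N_BLOCKS * sizeof(int), cudaMemcpyDeviceToHost);

    /* Add the per-block partial sums on the host. */
    int total = 0;
    for (int i = 0; i < N_BLOCKS; ++i)
        total += hOut[i];

    cudaFree(dIn);
    cudaFree(dOut);
    return total;
}
```

The block count is rounded up so that all 1000 elements are covered, and the bounds check in the kernel keeps the spare threads in the last block from reading past the end of the input.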
NVIDIA is not the only vendor to provide a programming interface to harness the parallel processing power of a GPU. ATI has also joined in with the release of 'ATI Stream Technology', which runs on ATI graphics cards. We shall be providing more information on this in the near future. So, watch this space in the coming issues.