It’s been a while since I last looked at processing with CUDA, and I’ve forgotten everything I learned. I went a-Googling again for information, and this time I came across a wonderful course on Udacity called Intro to Parallel Programming. It uses CUDA and is the best explanation of the topic I’ve seen anywhere.
It’s also free, so what do you have to lose? With CUDA 6 adding unified memory support, there has never been a better time to jump in.