Using GPU for the ML1 course

(Adrien Lemaire) #1

I have set up my environment on a GCP GPU instance and am working my way through the ML course. Only using the CPU while I'm paying for the GPU is annoying. Some preprocessing, like applying pd.Series to JSON fields and joining the results back to the dataset, is also very slow on the CPU.
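The slow step described above usually looks like the first function in this sketch. A common alternative is pandas.json_normalize, which flattens a whole list of dicts in one call instead of building a separate Series for every row, and on large frames it is typically much faster. The toy `df`, its `meta` column and both function names are hypothetical, and the sketch assumes the JSON has already been parsed into dicts (raw JSON strings would need json.loads first):

```python
import pandas as pd

# Hypothetical toy data: a column holding already-parsed JSON objects (dicts).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "meta": [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}, {"a": 3}],
})


def expand_with_apply(frame, col):
    # Slow pattern: builds one pd.Series per row, then joins the result back.
    expanded = frame[col].apply(pd.Series)
    return frame.drop(columns=col).join(expanded)


def expand_with_json_normalize(frame, col):
    # Faster pattern: flattens all dicts in one call, then realigns to the original index.
    expanded = pd.json_normalize(frame[col].tolist())
    expanded.index = frame.index
    return frame.drop(columns=col).join(expanded)


slow = expand_with_apply(df, "meta")
fast = expand_with_json_normalize(df, "meta")
print(fast)
```

Keys missing from some rows (like "b" in the third row) come out as NaN in both versions.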

Is it possible to run the ML notebooks on the GPU? If so, what settings do I need to tweak?

If not, can we at least use multiple cores? I can see that only 1 of my 4 cores is spinning at 100%.
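One way to spread a slow, row-by-row pandas step across several cores (not something suggested in this thread) is to split the data into chunks and process them with joblib, the library scikit-learn itself uses for its n_jobs parallelism. This is a minimal sketch: `slow_row_function`, `split_series` and `parallel_apply` are hypothetical names, and because starting worker processes and copying data to them has a cost, it only pays off when each chunk does substantial work.

```python
import pandas as pd
from joblib import Parallel, delayed


def slow_row_function(value):
    # Hypothetical stand-in for an expensive pure-Python per-row transformation.
    return sum(i * i for i in range(value % 100))


def process_chunk(chunk):
    # Runs inside a worker process: applies the row function to one slice.
    return chunk.apply(slow_row_function)


def split_series(series, n_chunks):
    # Contiguous, order-preserving slices (ceiling division so no rows are dropped).
    size = max(1, -(-len(series) // n_chunks))
    return [series.iloc[i:i + size] for i in range(0, len(series), size)]


def parallel_apply(series, n_jobs=-1, n_chunks=8):
    if len(series) == 0:
        return series.copy()
    # n_jobs=-1 asks joblib for all available cores, the same convention sklearn uses.
    chunks = split_series(series, n_chunks)
    results = Parallel(n_jobs=n_jobs)(delayed(process_chunk)(c) for c in chunks)
    # joblib returns results in submission order, so concatenating restores the row order.
    return pd.concat(results)
```

Called as `parallel_apply(some_series)`, it should return the same values as `some_series.apply(slow_row_function)`, just computed on several cores.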

Thanks in advance !

(Will) #2

Jeremy progressively walks you through how to speed things up as you go through the course, so just be patient. If you don't want to wait, check out set_rf_samples(). There's honestly no reason to go through the trouble of setting up a GPU, and in most cases it would probably be slower than the CPU anyway. When training random forests or many other sklearn models, you can pass n_jobs=-1 to use all available cores, if that helps your need for speed at all 🙂
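For reference, passing n_jobs=-1 to a random forest looks like the sketch below. set_rf_samples() itself comes from the older fastai library used in the course and works by patching scikit-learn's internals; as a swapped-in stand-in, this sketch uses scikit-learn's own max_samples parameter (available since version 0.22), which likewise limits how many rows each tree is trained on. The synthetic data and all sizes are arbitrary assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical synthetic data standing in for a course dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(5_000, 10))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=5_000)

# n_jobs=-1 builds the trees in parallel on every available CPU core.
# max_samples draws only this many bootstrap rows per tree, similar in spirit
# to set_rf_samples() from the course's fastai library (not the same code).
model = RandomForestRegressor(
    n_estimators=40, max_samples=1_000, n_jobs=-1, random_state=0
)
model.fit(X, y)
print(model.score(X, y))
```

Smaller max_samples values make each tree faster to build at some cost in accuracy, which is the same trade-off the course explores with set_rf_samples().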

(Adrien Lemaire) #3

I see 🙂 I'll study the code to understand how n_jobs=-1 is used and see if I can reuse it in the slowest parts of my notebook, like:


Thanks @whamp

(Will) #4

Sorry, I don't think I was clear enough. n_jobs=-1 is a parameter for training sklearn models; it isn't applicable to pandas join methods.

If something is taking a really long time in your code, I would try to rethink how you're approaching the problem. It's likely you're not taking advantage of the vectorization built into NumPy or pandas. One tool Jeremy covers that's helpful for speeding up pure Python functions is Cython, which compiles Python-like code to C, so that's worth checking out.
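To make the vectorization point concrete, here is a minimal sketch on a hypothetical two-column frame: the first function loops over rows in pure Python, while the second does the same arithmetic on whole columns at once, which pandas hands off to compiled NumPy code. On large frames the vectorized version is usually far faster, though the exact speedup depends on the data and the machine.

```python
import pandas as pd

# Hypothetical toy frame; in practice this would be the course dataset.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})


def total_with_loop(frame):
    # Row-by-row pure-Python loop: every row passes through the interpreter.
    out = []
    for _, row in frame.iterrows():
        out.append(row["price"] * row["qty"])
    return pd.Series(out, index=frame.index)


def total_vectorized(frame):
    # Vectorized: one multiplication over whole columns, run in compiled code.
    return frame["price"] * frame["qty"]


print(total_vectorized(df).tolist())  # [10.0, 40.0, 90.0]
```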

All that being said, there's no need to optimize for the GPU or rewrite code in Cython for the ML1 course. Just watch the videos and follow along in detail. I assure you that computer performance won't be your limiting factor; it'll be your brain's capacity to absorb everything!