Just a little correction: your timeline (thanks for this!) seems to indicate I wrote the 1cycle paper, which isn't true. It was Leslie Smith who wrote it; I just wrote a blog post about it.
Hi Shubhaijt. Please don’t tag Jeremy and Rachel unless other people in the forums cannot help you, see Etiquette for Posting to Forums.
The ‘highmem’ in your instance name means you have a lot of RAM, but the error you are getting refers to GPU memory; the two are not related. Have you tried decreasing the batch size to 32?
Thanks for the information, @lesscomfortable .
Okay, I will try decreasing the batch size to 32.
But since this GCP instance is the recommended one, I didn't expect this problem to occur!
In the 1st video of an earlier version of the fast.ai course (I could be wrong, but I do remember this), Jeremy said that for images of real-world objects like those used in ImageNet, the model would do well on any set of images chosen by the participants, as long as they were of day-to-day real-world objects.
Also, in the Galaxy Zoo link you mentioned, the author specifically says: "Transfer learning by pre-training a deep neural network on another dataset (say, ImageNet), chopping off the top layer and then training a new classifier, a popular approach for the recently finished Dogs vs. Cats competition, is not really viable either".
It is good practice to adjust the batch size according to your GPU's capacity. Together with lr_finder, it is something we need to do every time we approach a new problem, and it is also an opportunity to get familiar with the ImageDataBunch class.
I guess it makes sense, but I’m just trying to confirm I haven’t done anything wrong and to get a better intuition of why the results are worse.
Is it because ImageNet doesn’t have the right nodes to reliably activate for features that make a difference in categorising images of things it hasn’t seen before (e.g. bands of stars) vs things it has (e.g. facial features, geometric shapes etc)? Would it be better to start with a different model? A larger sample/ data set?
I would suggest that you wait until Jeremy replies about the notebook you shared with him.
Is it because ImageNet doesn’t have the right nodes to reliably activate for features that make a difference in categorising images of things it hasn’t seen before (e.g. bands of stars) vs things it has (e.g. facial features, geometric shapes etc)?
Perhaps. Maybe in the initial, lower layers, some of the learned features could be common between galaxies and real-world objects.
But in the higher, more composed layers, the galaxy-specific features may not have been learned.
Would it be better to start with a different model?
Maybe, yes.
A larger sample/ data set?
Maybe not.
But I would still suggest waiting for Jeremy's reply.
Also, in the 2nd video of the previous course, at around 43:49, Jeremy stated the following:
* Images like satellite images, CT scans, etc. have totally different kinds of features altogether (compared to ImageNet images), so you want to re-train many layers.
* For dogs and cats, images are similar to what the model was pre-trained with, but we may still find it helpful to slightly tune some of the later layers.
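The distinction above can be sketched schematically. This is a plain-Python illustration of the freezing idea, not the fastai API; the layer names and the `trainable` flag are hypothetical stand-ins for what a framework would track:

```python
# Schematic of layer freezing for transfer learning.
# A model is represented as an ordered list of layer records;
# "trainable" marks whether that layer would be updated during fine-tuning.

def freeze_all_but_last(layers, n_unfrozen):
    """Mark only the last n_unfrozen layers as trainable."""
    for i, layer in enumerate(layers):
        layer["trainable"] = i >= len(layers) - n_unfrozen
    return layers

# Hypothetical pretrained network: early layers learn generic features
# (edges, textures), later layers learn task-specific compositions.
model = [{"name": f"layer{i}", "trainable": True} for i in range(5)]

# Dogs-vs-cats style problem: data resembles ImageNet,
# so tune only the final layer(s).
freeze_all_but_last(model, 1)
print([l["name"] for l in model if l["trainable"]])  # ['layer4']

# Satellite/galaxy images: features differ a lot, so unfreeze more layers.
freeze_all_but_last(model, 4)
print([l["name"] for l in model if l["trainable"]])  # ['layer1', 'layer2', 'layer3', 'layer4']
```

In fastai terms this corresponds to calling freeze/unfreeze before fitting, but the snippet is only meant to show why "how many layers to re-train" depends on how far your data is from ImageNet.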
Check this cell in the Training: resnet50 section:

```python
data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=299, bs=48)
```
You can reduce the batch size there, e.g. `bs=32`.
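If `bs=32` still runs out of GPU memory, a common pattern is to keep halving the batch size until a training step fits. A minimal sketch of that loop (plain Python; `try_batch` and `fake_step` are hypothetical stand-ins for a real training step that raises on out-of-memory):

```python
def find_workable_batch_size(try_batch, bs=48, min_bs=4):
    """Halve bs until try_batch(bs) succeeds or we drop below min_bs."""
    while bs >= min_bs:
        try:
            try_batch(bs)   # run one step at this batch size
            return bs
        except MemoryError:  # stand-in for a CUDA out-of-memory error
            bs //= 2
    raise RuntimeError("No batch size fits in GPU memory")

# Pretend the GPU can only hold batches of 24 or fewer.
def fake_step(bs):
    if bs > 24:
        raise MemoryError

print(find_workable_batch_size(fake_step))  # 24
```

In practice you would just edit `bs` in the `ImageDataBunch` call and restart the kernel, since GPU memory from the failed run isn't always released.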