Out of memory error in dog breed training (lesson 2)

Hi,
I am using Paperspace and I get an out of memory error when trying to replicate and run the Dog Breed Classification problem. See the sample code below:

def get_data(sz, bs):
    # Transforms and data object for the dog breeds CSV dataset (fastai 0.7 API)
    tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
    data = ImageClassifierData.from_csv(PATH, 'train', f'{PATH}labels.csv', test_name='test',
                                        val_idxs=val_idxs, suffix='.jpg', tfms=tfms, bs=bs)
    # For smaller image sizes, resize the images on disk first to speed up training
    return data if sz > 300 else data.resize(340, 'tmp')

NOTE: I am using resnext50 as my architecture and my batch size is 58.
learn = ConvLearner.pretrained(arch, data, precompute=True, ps=0.5)
learn.fit(1e-2, 2)  # I get an accuracy of 89% :frowning:
learn.precompute = False
learn.fit(1e-2, 5, cycle_len=1)  # I get an accuracy of just 90% :cry:

I then tried to unfreeze the layers and apply differential learning rates to improve my results:
learn.unfreeze()
lr=np.array([1e-4,1e-3,1e-2])
learn.fit(lr, 3, cycle_len=1)

When I run the above I get an out of memory error.

NOTE: Even if I replicate and run the exact code Jeremy provided, I get an out of memory error.

Regards
Murali

Did you try restarting the kernel?

You shouldn’t get memory errors. On my 1070, the dog breeds notebook never exceeds 2 GB of GPU memory.

This has happened to me. I used a GTX 1060 6 GB for the duration of the class, and it consistently hit the memory limit when training with differential learning rates. The way to fix this is to use a smaller batch size: I divide the batch size by two until it stops running out of memory. Dog breeds stabilized at about bs=32. A rough sketch of that loop is below.
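Purely illustrative, assuming the get_data, arch and val_idxs from the first post, and that the CUDA out of memory error surfaces as a RuntimeError (as it does in recent PyTorch releases):

bs = 58
while bs >= 8:
    try:
        data = get_data(224, bs)
        learn = ConvLearner.pretrained(arch, data, precompute=True, ps=0.5)
        learn.fit(1e-2, 2)
        break                          # this batch size fits in GPU memory
    except RuntimeError as e:
        if 'out of memory' not in str(e):
            raise
        bs //= 2                       # halve the batch size and retry
        print(f'OOM, retrying with bs={bs}')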

No luck guys, I tried reducing the batch size and I still get the out of memory error. :frowning:

Did you follow all the steps for the CUDA out of memory error?
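In case it helps, here are the usual suspects in code form. This is my own hedged summary, not an official checklist, and torch.cuda.empty_cache() only exists in newer PyTorch releases, hence the guard:

import gc
import torch

# 1. Drop the objects holding GPU tensors you no longer need
del learn
gc.collect()

# 2. Return cached GPU memory to the driver if the API is available;
#    otherwise the only reliable reset is restarting the kernel
if hasattr(torch.cuda, 'empty_cache'):
    torch.cuda.empty_cache()

# 3. Rebuild the data with a smaller batch size (and/or smaller image size)
data = get_data(sz=224, bs=32)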

How do you check memory consumption during training?

Use resnext101_64 instead to get higher accuracy. I have the same out of memory problem when training the earlier layers, though.
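If you want to try that, here is a minimal sketch. It assumes the same fastai 0.7 setup as the first post (resnext101_64 is exposed by from fastai.conv_learner import *), and the larger model usually needs a smaller batch size:

from fastai.conv_learner import *

arch = resnext101_64            # larger ResNeXt variant
data = get_data(sz=224, bs=32)  # smaller batch size to offset the bigger model
learn = ConvLearner.pretrained(arch, data, precompute=True, ps=0.5)
learn.fit(1e-2, 2)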

I did try all the options, no luck :(. I will see what else I can do to fix the out of memory issue.

I think the main cause of my out of memory error was unfreezing the layers and applying differential learning rates. I have since stopped unfreezing the layers and just train with 1e-2, and I no longer get the out of memory error. A minimal sketch of that frozen-only workflow is below.
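Same fastai 0.7 calls as in my first post; the memory saving comes from never calling learn.unfreeze():

learn = ConvLearner.pretrained(arch, data, precompute=True, ps=0.5)
learn.fit(1e-2, 2)               # train only the new head on precomputed activations

learn.precompute = False
learn.fit(1e-2, 5, cycle_len=1)  # still frozen, now with data augmentation

# Without learn.unfreeze() the pretrained layers never need gradient buffers,
# so GPU memory stays well below the limit.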


Sorry, I didn’t see your reply.

I’m on Windows, so I can monitor memory usage in real time with many tools (GPU-Z, HWMonitor, etc.).

On Linux you can check it with nvidia-smi.
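If you'd rather poll it from Python, here is a minimal sketch. It assumes a single GPU and that nvidia-smi is on the PATH (--query-gpu and --format are standard nvidia-smi flags):

import subprocess

def gpu_mem_mib():
    out = subprocess.check_output(
        ['nvidia-smi', '--query-gpu=memory.used,memory.total',
         '--format=csv,noheader,nounits'])
    used, total = out.decode().strip().splitlines()[0].split(', ')
    return int(used), int(total)

print('GPU memory used: %d / %d MiB' % gpu_mem_mib())

Or just run watch -n 1 nvidia-smi in a separate terminal while training.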