Running out of memory on big image sets

Hi,

This is the source code for the initial training of the FC layer before unfreezing:

from fastai.conv_learner import *   # fastai 0.7 imports (ConvLearner etc.)
from fastai.dataset import *        # ImageClassifierData used below

learn = ConvLearner.pretrained(f_model, data, ps=ps, xtra_fc=xtra_fc,
                               precompute=False, metrics=metrics)
learn.fit(lr, 3, cycle_len=1)

Key to this is the data loader:

return ImageClassifierData.from_csv(path, f'{train_class}/train', labels_file_multi_dev, bs, tfms,
                                        suffix='', val_idxs=val_idxs,
                                        test_name=f'{train_class}/test', num_workers=30)
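
One knob I can try, assuming each of the 30 worker processes holds its own augmented batches in flight, is simply lowering num_workers in the same call, e.g.:

return ImageClassifierData.from_csv(path, f'{train_class}/train', labels_file_multi_dev, bs, tfms,
                                        suffix='', val_idxs=val_idxs,
                                        test_name=f'{train_class}/test', num_workers=8)  # fewer workers, lower peak RAM

though that obviously trades off loading throughput.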

The initial augmentation starts, then data starts being fed to the GPU. However, since augmentation is always occurring, memory usage keeps jumping up by around 500MB every few seconds.
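
To put numbers on those jumps, here is a rough sketch using psutil that sums the resident memory of the training process and its loader workers (pass the training process's PID if running it from elsewhere):

import os, time
import psutil

def log_rss(pid=None, interval=5):
    # Sum resident memory of the main process plus all of its children
    # (the data loader workers show up as child processes).
    main = psutil.Process(pid or os.getpid())
    while True:
        procs = [main] + main.children(recursive=True)
        total_gb = sum(p.memory_info().rss for p in procs) / 1e9
        print(f'{len(procs)} processes, {total_gb:.1f} GB resident')
        time.sleep(interval)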

This time around, I can't reproduce it. Two differences from the earlier run:

  1. CPU utilisation never went above 800%; before, it was at 2000%
  2. Memory usage maxed out at 50GB

It makes sense that this process would use a lot of RAM; however, I would think it could release some of it, since these are random images being generated (unless they are being cached so they can be resent to the model?). So it would be great to figure this out and get the RAM usage down (for example, if you had 2 GPUs on your own box with 64GB of RAM, this would prevent the 2nd GPU from being used).
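
A rough way I can check whether the augmented images are really being retained (rather than just not yet collected) is to run one cycle at a time, force a garbage collection, and see whether resident memory drops, reusing learn and lr from above:

import gc, os
import psutil

def rss_gb():
    return psutil.Process(os.getpid()).memory_info().rss / 1e9

for cycle in range(3):
    learn.fit(lr, 1, cycle_len=1)   # same training as above, but one cycle per call
    gc.collect()                    # release anything that is merely unreferenced
    print(f'after cycle {cycle}: {rss_gb():.1f} GB resident in the main process')

The worker processes' memory still has to be summed separately (as in the psutil snippet above), since gc in the main process won't touch them.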