GPU memory problem with unet_learner

ChristophNeuner · March 15, 2019, 2:02pm

Hi,

I have a serious problem with GPU memory overflows when using unet_learner.
I use a Titan Xp (12 GB) or a GTX 1080 Ti (11 GB), and even with a batch size of one, an image size of 256 and a ResNet50 backbone, the GPU runs out of memory.
Is anyone else experiencing similar problems?
With Keras and TensorFlow I do not have this problem, even with image sizes of 512, a batch size of 8 and larger models.

Thanks in advance!

Christoph

The dataset is from the Data Science Bowl 2018 on Kaggle.
Here is my code:

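# Assumed imports (fastai v1)
import os
import torchvision
from fastai.vision import *

# Map an image path to the mask file with the same name
def get_y_fn(x): return PATH/MASKS/os.path.split(x)[-1]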

# Training transforms: crop, then a random dihedral flip/rotation (p=1.0).
# Validation transforms: crop only.
tfms = ([crop(size=sz),
         dihedral_affine()],
        # Other augmentations, currently disabled:
        # brightness(change=(0.475, 0.525), p=0.75),
        # contrast(scale=(0.95, 1.0526315789473684), p=0.75),
        # symmetric_warp(magnitude=(-0.2, 0.2), p=0.75),
        # rotate(degrees=(-10.0, 10.0), p=0.75),
        # brightness(change=(0.4, 0.6), p=0.75),
        # contrast(scale=(0.8, 1.25), p=0.75),
        [crop(size=sz)])

data = (SegmentationItemList.from_csv(path=PATH, csv_name=TRAIN_PATHS_CSV_NAME)
        .split_by_rand_pct(valid_pct=0.3, seed = seed)
        .label_from_func(get_y_fn, classes=array(['background', 'nucleus']))
        #.add_test_folder(TEST_ONE_FOLDER)
        .transform(tfms, tfm_y=True, size=sz)
        .databunch(bs=bs)
        .normalize())

arch = torchvision.models.resnet50
learner = unet_learner(data=data,
                       arch=arch,
                       metrics=[dice, dice_loss, iou],
                       wd=wd,
                       bottle=True)  # use a bottleneck for the final cross-connection with the input

ptrampert · March 15, 2019, 3:18pm

I have experienced the same problem. ResNet34 works fine with the U-Net, but ResNet50 runs out of memory on a 1080 Ti with 11 GB, even with a single image per batch. I tried it with the CamVid notebook.
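One difference between the two backbones that can be checked by hand: the bottleneck stages of ResNet50 output four times as many channels as the corresponding stages of ResNet34, so everything that consumes those outputs is wider. The sketch below is only a rough illustration, assuming float32 and the standard torchvision stage widths and strides; `encoder_output_bytes` is a hypothetical helper, and it counts only the five encoder stage outputs, not the decoder or the intermediate activations, so it does not by itself account for the overflow.

```python
# Output strides and channel widths of the ResNet stages (stem, layer1..layer4).
STRIDES = (2, 4, 8, 16, 32)
CHANNELS = {
    "resnet34": (64, 64, 128, 256, 512),
    "resnet50": (64, 256, 512, 1024, 2048),
}

def encoder_output_bytes(arch, size, bs=1, bytes_per_el=4):
    """Bytes held by the five stage outputs for a square input of side `size`."""
    total = 0
    for ch, stride in zip(CHANNELS[arch], STRIDES):
        side = size // stride
        total += bs * ch * side * side * bytes_per_el
    return total

for arch in CHANNELS:
    mib = encoder_output_bytes(arch, size=256) / 2**20
    print(f"{arch}: {mib:.2f} MiB of stage outputs at 256x256, bs=1")
```

The ResNet50 figure comes out at roughly twice the ResNet34 one for these five tensors alone; the decoder built on top of them is where the wider channels would matter most.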

On a GV100 with 32 GB I can run ResNet50 on 512x512 images with a batch size of at most 4, which seems rather low. Unfortunately, I could not figure out where this comes from. A saved ResNet50 U-Net takes 1.36 GB on my hard disk, so this seems a little strange.
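Part of the gap between file size and training memory may simply be that the saved file reflects the weights (and possibly optimizer state), while training also has to hold activations, which grow with batch size times channels times height times width. A back-of-the-envelope sketch, assuming float32 and using a single 64-channel feature map at half the input resolution (the shape of the ResNet stem output) purely as an illustration; `tensor_bytes` is a hypothetical helper, not a fastai or PyTorch function:

```python
from math import prod

def tensor_bytes(shape, bytes_per_el=4):
    """Size in bytes of one dense tensor; 4 bytes per element assumes float32."""
    return prod(shape) * bytes_per_el

# Illustrative shapes: (batch, channels, height, width)
small = tensor_bytes((1, 64, 128, 128))  # bs=1, 256x256 input
large = tensor_bytes((4, 64, 256, 256))  # bs=4, 512x512 input

print(small / 2**20, "MiB per feature map at bs=1, 256x256")
print(large / 2**20, "MiB per feature map at bs=4, 512x512")
print("scale factor:", large // small)
```

Going from a batch size of 1 at 256x256 to a batch size of 4 at 512x512 multiplies every activation by 16, independent of how large the weights are. This says nothing about the absolute total, which depends on every layer in the model.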

I am working on Ubuntu 18.04.

Any further experiences?