Out of Memory Crash Triggered by lr_find and fine_tune for unet_learner

Am I the only one finding it hard to use the lr_find method because it keeps crashing?

Some context:
fastai version: 2.5.2 and 2.5.3
I am using a unet_learner created this way: unet_learner(dls, resnet18, pretrained=False, n_out=1)
Running on Colab

Here is the notebook:

There is some data in my Drive that I am using, but you can get it here (167.7 MiB) and edit the lines where I extract it: https://drive.google.com/file/d/10jfRKSFUboTyb_zDF4TGr2PgybMRRzgQ/view?usp=sharing
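Something like this should fetch it on Colab (a sketch only; I'm assuming the archive is a zip, hc_data.zip is just a placeholder output name, and the file ID comes from the link above):

!gdown "https://drive.google.com/uc?id=10jfRKSFUboTyb_zDF4TGr2PgybMRRzgQ" -O hc_data.zip
!unzip -q hc_data.zip -d data   # adjust if the archive isn't actually a zip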

EDIT: Even learn.fine_tune(3) is failing, so there's definitely something I am doing wrong.

Error Message:

RuntimeError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 11.17 GiB total capacity; 7.97 GiB already allocated; 665.81 MiB free; 10.07 GiB reserved in total by PyTorch)
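
For reference, the allocator numbers in that message can be queried directly with standard PyTorch calls (just a quick sanity check, not part of my original notebook):

import torch

total = torch.cuda.get_device_properties(0).total_memory   # total GPU memory
allocated = torch.cuda.memory_allocated(0)                  # memory currently used by tensors
reserved = torch.cuda.memory_reserved(0)                    # memory held by PyTorch's caching allocator
print(f'total {total/2**30:.2f} GiB | allocated {allocated/2**30:.2f} GiB | reserved {reserved/2**30:.2f} GiB')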

Some other important context:

Output of !nvidia-smi before I run anything:

Wed Nov 24 12:17:39 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   74C    P0    86W / 149W |      3MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

How I am creating the dls:

# `path` points at the dataset and `files` lists everything in path/'train'
# (both defined earlier in the notebook); images and masks share that folder
get_mask = lambda o: path/'train'/f'{o.stem}_seg{o.suffix}'

# keep only the images (names ending in HC.png), not the *_seg masks
train_files = L()
for x in files:
  res = re.findall(r'.+HC\.png$', x.name)
  if res:
    train_files.append(path/'train'/res[0])

def custom_get_items(folder):
  # I already pass in the list of items, so just wrap it in an L
  return L(folder)

size = 256

hc = DataBlock(blocks=(ImageBlock, MaskBlock()),
               get_items=custom_get_items,
               splitter=RandomSplitter(seed=42),
               get_y=get_mask,
               item_tfms=[Resize(size, pad_mode=PadMode.Border)])

dls = hc.dataloaders(train_files)
learn = unet_learner(dls, resnet18, pretrained=False, n_out=1)
learn.fine_tune(3)

The error is raised right after that block of code.

It’s probably because you haven’t specified a batch size in your dataloaders.

The default batch size is 64, which might be too much for your GPU!
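
You can confirm this with a quick check (dls.bs and one_batch() are standard fastai attributes/methods; this isn't from your notebook):

print(dls.bs)             # 64 unless you pass bs= explicitly
xb, yb = dls.one_batch()  # careful: this already pushes one full batch to the GPU
print(xb.shape)           # should be [64, 3, 256, 256] given the Resize(256) above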

Try:
dls = hc.dataloaders(train_files, bs=8)
or whatever batch size works for your GPU.

If you want a larger batch size with less memory, maybe set your learner to mixed precision with .to_fp16().
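
Putting the two together, something like this (the same learner call as in your snippet, just with a smaller batch size and mixed precision) should fit on the K80:

dls = hc.dataloaders(train_files, bs=8)
learn = unet_learner(dls, resnet18, pretrained=False, n_out=1).to_fp16()
learn.lr_find()
learn.fine_tune(3)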

Rookie mistake. Thank you!