Am I the only one finding it hard to use the lr_find method because it keeps crashing?
Some context:
fastai version: 2.5.2 and 2.5.3
I am using a unet_learner created this way: unet_learner(dls, resnet18, pretrained=False, n_out=1)
Running on Colab
Here is the notebook:
The data I am using lives in my Drive, but you can download it here (167.7 MiB) and edit the lines where I extract it: https://drive.google.com/file/d/10jfRKSFUboTyb_zDF4TGr2PgybMRRzgQ/view?usp=sharing
EDIT: Even learn.fine_tune(3) is failing, so there's definitely something I am doing wrong.
Error Message:
RuntimeError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 11.17 GiB total capacity; 7.97 GiB already allocated; 665.81 MiB free; 10.07 GiB reserved in total by PyTorch)
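For scale, here is a quick back-of-the-envelope estimate I did of how big just the input batch is, assuming fastai's default batch size of 64 (I never pass bs anywhere, so that's my guess at what the dataloaders use). The U-Net's activations, not the inputs, are what actually eat the ~8 GiB, but this gives a feel for how fast things grow with batch size:

```python
# Rough estimate of GPU memory for the input batch alone.
# Assumption: fastai's default bs=64, float32 tensors, RGB input at 256x256.
bs = 64               # assumed default batch size (never overridden in my code)
channels = 3          # RGB input
size = 256            # item_tfms Resize target
bytes_per_float = 4   # float32

batch_bytes = bs * channels * size * size * bytes_per_float
print(f"input batch alone: {batch_bytes / 2**20:.0f} MiB")
# -> input batch alone: 48 MiB
```

If that assumption is right, one obvious thing to try is passing a smaller batch size, e.g. hc.dataloaders(train_files, bs=8).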
Some other important context:
Output of !nvidia-smi before I do anything:
Wed Nov 24 12:17:39 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |
| N/A 74C P0 86W / 149W | 3MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
How I am creating the dls:
get_mask = lambda o: path/'train'/f'{o.stem}_seg{o.suffix}'

# Images and masks live in the same folder, so keep only the image files
train_files = L()
for x in files:
    res = re.findall(r'.+HC\.png$', x.name)
    if res:
        train_files.append(path/'train'/f'{res[0]}')

def custom_get_items(folder):
    return L(folder)

size = 256
hc = DataBlock(blocks=(ImageBlock, MaskBlock()),
               get_items=custom_get_items,
               splitter=RandomSplitter(seed=42),
               get_y=get_mask,
               item_tfms=[Resize(size, pad_mode=PadMode.Border)])
dls = hc.dataloaders(train_files)
learn = unet_learner(dls, resnet18, pretrained=False, n_out=1)
learn.fine_tune(3)
Running that block of code is when I get the error.
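For what it's worth, the filename filtering and the get_mask logic can be checked in isolation with plain pathlib/re, no fastai or GPU involved. The filenames below are made up to mimic my data layout (images end in HC.png, masks in HC_seg.png):

```python
import re
from pathlib import Path

path = Path('data')  # hypothetical root, stands in for my Drive folder

# Hypothetical listing of one folder holding both images and masks
names = ['001_HC.png', '001_HC_seg.png', '002_HC.png', '002_HC_seg.png']

# Same filter as in the notebook: keep only files ending in 'HC.png'
train_files = [path/'train'/n for n in names
               if re.findall(r'.+HC\.png$', n)]

# Same mask-path logic as in the notebook
get_mask = lambda o: path/'train'/f'{o.stem}_seg{o.suffix}'

for f in train_files:
    print(f.name, '->', get_mask(f).name)
# -> 001_HC.png -> 001_HC_seg.png
# -> 002_HC.png -> 002_HC_seg.png
```

The masks are correctly excluded from train_files and each image maps to its _seg counterpart, so the error does not seem to come from this part.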