Struggles with completing lesson 1 on Google Colab


(Stas Bekman) #1

I got inspired by Manikanta’s "Fast.ai Lesson 1 on Google Colab (Free GPU)"
https://towardsdatascience.com/fast-ai-lesson-1-on-google-colab-free-gpu-d2af89f53604

and for a few days now I have been trying to get the first lesson’s notebook to run there, unsuccessfully so far.

Either things fail due to lack of memory, or other errors crop up. Even with sz=60 and bs=16 I am still unable to complete the run.

I tried a few forks of the code base and notebooks people posted here in the forums, to no avail. The hardware/software is identical for anyone running on a Google Colab GPU, so I’m unsure why some people reported success in several threads here.

Do any of you have a lesson 1 v2 notebook that you know completes on Google Colab and could share?

Thank you!


(Jacob Pettit) #2

Hey,

Did you see the thread Colaboratory and Fastai?

I posted on there extensively about solving issues with memory. Essentially, for some odd reason, some people on Google Colab hit a problem when the code compiles operations for the GPU: there is an almost ‘double compiling’ thing happening. Check it out on the thread above; I shared a notebook which I forked and edited. It seems to have been working for some people, while others hit a new issue which has yet to be sorted out. But still, take a look, maybe it’ll fix your problem! :)


(Stas Bekman) #3

Yes, Jacob, I read that thread and tried to run the notebook you kindly shared, and it fails early on, in the very first training call. I thought perhaps there had been some changes in the main version of fastai, so I forked the current version on GitHub, applied the changes you suggested to remove to_gpu, and it fails just the same.
That’s why I’m asking for help :(

running your branch:

!pip uninstall -y fastai
!pip install git+https://github.com/jfpettit/fastai

this section fails:

arch=resnet34
#data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz), bs=32)
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz), bs=16)
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 3)

Downloading: "https://download.pytorch.org/models/resnet34-333f7ec4.pth" to /content/.torch/models/resnet34-333f7ec4.pth

100%|██████████| 87306240/87306240 [00:03<00:00, 25361894.91it/s]

0%| | 0/1438 [00:00<?, ?it/s]


RuntimeError                              Traceback (most recent call last)
<ipython-input-...> in <module>()
      2 #data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz), bs=32)
      3 data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz), bs=16)
----> 4 learn = ConvLearner.pretrained(arch, data, precompute=True)
      5 learn.fit(0.01, 3)

/usr/local/lib/python3.6/dist-packages/fastai/conv_learner.py in pretrained(cls, f, data, ps, xtra_fc, xtra_cut, **kwargs)
     97     def pretrained(cls, f, data, ps=None, xtra_fc=None, xtra_cut=0, **kwargs):
     98         models = ConvnetBuilder(f, data.c, data.is_multi, data.is_reg, ps=ps, xtra_fc=xtra_fc, xtra_cut=xtra_cut)
---> 99         return cls(data, models, **kwargs)
    100 
    101     @property

/usr/local/lib/python3.6/dist-packages/fastai/conv_learner.py in __init__(self, data, models, precompute, **kwargs)
     90         elif self.metrics is None:
     91             self.metrics = [accuracy_thresh(0.5)] if self.data.is_multi else [accuracy]
---> 92         if precompute: self.save_fc1()
     93         self.freeze()
     94         self.precompute = precompute

/usr/local/lib/python3.6/dist-packages/fastai/conv_learner.py in save_fc1(self)
    142         m=self.models.top_model
    143         if len(self.activations[0])!=len(self.data.trn_ds):
--> 144             predict_to_bcolz(m, self.data.fix_dl, act)
    145         if len(self.activations[1])!=len(self.data.val_ds):
    146             predict_to_bcolz(m, self.data.val_dl, val_act)

/usr/local/lib/python3.6/dist-packages/fastai/model.py in predict_to_bcolz(m, gen, arr, workers)
     12     m.eval()
     13     for x,*_ in tqdm(gen):
---> 14         y = to_np(m(VV(x)).data)
     15         with lock:
     16             arr.append(y)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    323         for hook in self._forward_pre_hooks.values():
    324             hook(self, input)
--> 325         result = self.forward(*input, **kwargs)
    326         for hook in self._forward_hooks.values():
    327             hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py in forward(self, input)
     65     def forward(self, input):
     66         for module in self._modules.values():
---> 67             input = module(input)
     68         return input
     69 

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    323         for hook in self._forward_pre_hooks.values():
    324             hook(self, input)
--> 325         result = self.forward(*input, **kwargs)
    326         for hook in self._forward_hooks.values():
    327             hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py in forward(self, input)
    275     def forward(self, input):
    276         return F.conv2d(input, self.weight, self.bias, self.stride,
--> 277                         self.padding, self.dilation, self.groups)
    278 
    279 

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in conv2d(input, weight, bias, stride, padding, dilation, groups)
     88                 _pair(0), groups, torch.backends.cudnn.benchmark,
     89                 torch.backends.cudnn.deterministic, torch.backends.cudnn.enabled)
---> 90     return f(input, weight, bias)
     91 
     92 

RuntimeError: Input type (CUDAFloatTensor) and weight type (CPUFloatTensor) should be the same


(Aditya) #4

The error is very clear…

You have ended up with some tensors on the GPU and some on the CPU…

They can’t be on both… either use CUDA for both or the CPU for both…
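
For illustration, here is a minimal PyTorch sketch of the same mismatch and the fix (hypothetical tensors, not the notebook’s code):

import torch
import torch.nn as nn
from torch.autograd import Variable

conv = nn.Conv2d(3, 8, kernel_size=3)            # the module's weights start on the CPU
x = Variable(torch.randn(1, 3, 32, 32)).cuda()   # the input has been moved to the GPU

# conv(x)    # RuntimeError: input (CUDA) and weight (CPU) types don't match

conv.cuda()  # move the module's parameters to the GPU as well
y = conv(x)  # now both sides are CUDA tensors and the forward pass works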


(Stas Bekman) #5

OK, but I haven’t made any changes to the code in the lesson 1 notebook (only reduced sz and bs). This problem doesn’t happen when I run fast.ai’s code base; it does happen when I run jfpettit’s fork with the two to_gpu() calls removed. Perhaps those removed to_gpu() calls are actually the culprit, but according to jfpettit he had to remove them to make it run on Colab.
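
If I understand the helper correctly, to_gpu() is roughly just a conditional .cuda() wrapper, so dropping the calls would leave the model on the CPU whenever CUDA is available while the inputs still get pushed to the GPU. A rough sketch of the idea (my approximation, not the actual fastai source):

import torch

USE_GPU = torch.cuda.is_available()

def to_gpu(x, *args, **kwargs):
    # move x to the GPU only when CUDA is usable, otherwise leave it on the CPU
    return x.cuda(*args, **kwargs) if USE_GPU else x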

I used the lesson 1 notebook from https://towardsdatascience.com/fast-ai-lesson-1-on-google-colab-free-gpu-d2af89f53604, which has a few setup tweaks to make it run on Colab but is otherwise, I believe, identical to the unaltered lesson 1 notebook.

All I am trying to do is complete a run of the unmodified lesson 1 notebook.


(Stas Bekman) #6

I have been working on a memory-footprint debug function, so that it becomes easier to pick reasonable sample/batch sizes to match the given hardware limitations.

General RAM was easy, but GPU RAM seems to be an issue.

On Google Colab I found /opt/bin/nvidia-smi (needed by GPUtil), but the GPU always seems to start out with 95% of its memory used. Any idea why? Here is a quickly made memory debug function:

# memory footprint support libraries/code
!ln -sf /opt/bin/nvidia-smi /usr/bin/nvidia-smi
!pip install gputil
!pip install psutil
!pip install humanize

import psutil
import humanize
import os
import GPUtil as GPU
GPUs = GPU.getGPUs()
# XXX: Colab provides only one GPU, and even that isn't guaranteed
gpu = GPUs[0]

def printm():
  process = psutil.Process(os.getpid())
  print("Gen RAM Free: " + humanize.naturalsize( psutil.virtual_memory().available ), " I Proc size: " + humanize.naturalsize( process.memory_info().rss))
  print('GPU RAM Free: {0:.0f}MB | Used: {1:.0f}MB | Util {2:3.0f}% | Total {3:.0f}MB'.format(gpu.memoryFree, gpu.memoryUsed, gpu.memoryUtil*100, gpu.memoryTotal))

printm()

So when run before anything else in the notebook I get:

Gen RAM Free: 11.6 GB  | Proc size: 666.0 MB
GPU RAM Free: 566MB | Used: 10873MB | Util  95% | Total 11439MB

I think restarting the runtime gets you pretty much the same instance; only once in many re-connects did I see GPU utilization of 0% at the beginning of the run. Most of the time I get 95% used before anything has happened.

Either this utility doesn’t report the right numbers, or I got tied to a GPU instance that doesn’t clear its memory, or Google found a way to share one GPU between multiple users, giving only ~5% to each, so that only 566MB of GPU RAM is practically available for the runs?
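
To cross-check whether GPUtil reports sensible numbers, the same symlinked nvidia-smi binary can be queried directly from a notebook cell (these are standard nvidia-smi query options, no extra library needed):

# ask the driver directly for the memory counters
!nvidia-smi --query-gpu=memory.total,memory.used,memory.free --format=csv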

Just before the notebook crashes while running:

learn.fit(lr, 3, cycle_len=1, cycle_mult=2)

I get:

lr=np.array([1e-4,1e-3,1e-2])
printm()

Gen RAM Free: 7.3 GB  | Proc size: 5.2 GB
GPU RAM Free: 168MB | Used: 11271MB | Util:  99% | Total: 11439MB

So clearly there is almost no GPU memory left, and it crashes with a CUDA out-of-memory error.

At the very least now I can see numerically why the notebook can’t complete the run.
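
One thing I still want to try (an untested sketch; torch.cuda.empty_cache() only exists in newer PyTorch builds) is dropping the learner and forcing garbage collection between experiments, then calling printm() again to see whether any GPU memory actually gets reclaimed:

import gc
import torch

del learn        # drop the reference to the learner and its precomputed activations
gc.collect()     # let Python reclaim the host-side objects
if hasattr(torch.cuda, 'empty_cache'):
    torch.cuda.empty_cache()  # hand cached blocks back to the driver (newer PyTorch only)
printm()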

I also discovered that, if that output is correct, their GPU has only 11GB and not the 22GB it is supposed to have according to the spec.

Do you also see 95% GPU memory utilization when you run this little memory dump function on Google Colab? If so, we get about 13GB of general RAM and only 566MB of GPU RAM per instance.