Error when I try to train the model

alejo96 · February 18, 2018, 8:53pm

Hello!

I was trying to run the code in the Jupyter Notebook for Lesson 1, but when I try to train the model I get the following error:

RuntimeError Traceback (most recent call last)
in ()
1 arch=resnet34
2 data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
----> 3 learn = ConvLearner.pretrained(arch, data, precompute=True)
4 learn.fit(0.01, 2)

~\Desktop\ML Course\fastai\fastai\conv_learner.py in pretrained(cls, f, data, ps, xtra_fc, xtra_cut, **kwargs)
96 def pretrained(cls, f, data, ps=None, xtra_fc=None, xtra_cut=0, **kwargs):
97 models = ConvnetBuilder(f, data.c, data.is_multi, data.is_reg, ps=ps, xtra_fc=xtra_fc, xtra_cut=xtra_cut)
—> 98 return cls(data, models, **kwargs)
99
100 @property

~\Desktop\ML Course\fastai\fastai\conv_learner.py in __init__(self, data, models, precompute, **kwargs)
89 elif self.metrics is None:
90 self.metrics = [accuracy_thresh(0.5)] if self.data.is_multi else [accuracy]
—> 91 if precompute: self.save_fc1()
92 self.freeze()
93 self.precompute = precompute

~\Desktop\ML Course\fastai\fastai\conv_learner.py in save_fc1(self)
141 m=self.models.top_model
142 if len(self.activations[0])!=len(self.data.trn_ds):
–> 143 predict_to_bcolz(m, self.data.fix_dl, act)
144 if len(self.activations[1])!=len(self.data.val_ds):
145 predict_to_bcolz(m, self.data.val_dl, val_act)

~\Desktop\ML Course\fastai\fastai\model.py in predict_to_bcolz(m, gen, arr, workers)
12 m.eval()
13 for x,*_ in tqdm(gen):
—> 14 y = to_np(m(VV(x)).data)
15 with lock:
16 arr.append(y)

C:\Conda\envs\fastai\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
323 for hook in self._forward_pre_hooks.values():
324 hook(self, input)
–> 325 result = self.forward(*input, **kwargs)
326 for hook in self._forward_hooks.values():
327 hook_result = hook(self, input, result)

C:\Conda\envs\fastai\lib\site-packages\torch\nn\modules\container.py in forward(self, input)
65 def forward(self, input):
66 for module in self._modules.values():
—> 67 input = module(input)
68 return input
69

C:\Conda\envs\fastai\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
323 for hook in self._forward_pre_hooks.values():
324 hook(self, input)
–> 325 result = self.forward(*input, **kwargs)
326 for hook in self._forward_hooks.values():
327 hook_result = hook(self, input, result)

C:\Conda\envs\fastai\lib\site-packages\torch\nn\modules\batchnorm.py in forward(self, input)
35 return F.batch_norm(
36 input, self.running_mean, self.running_var, self.weight, self.bias,
—> 37 self.training, self.momentum, self.eps)
38
39 def __repr__(self):

C:\Conda\envs\fastai\lib\site-packages\torch\nn\functional.py in batch_norm(input, running_mean, running_var, weight, bias, training, momentum, eps)
1011 raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
1012 f = torch._C._functions.BatchNorm(running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled)
-> 1013 return f(input, weight, bias)
1014
1015

RuntimeError: cuda runtime error (2) : out of memory at d:\pytorch\pytorch\torch\lib\thc\generic/THCStorage.cu:58

I am in a Windows environment and I think I set it up correctly, but I don't know what is happening. Did I really run out of memory? I have a Surface Book with an Nvidia GeForce GTX 965M with 2 GB of memory.

Thank you

meanderingstrm · February 18, 2018, 11:08pm

With a 2GB graphics card you are going to need a smaller batch size than the default. With my 4GB card, I used bs=32. Dig into the API for ImageClassifierData.from_paths, which is where the batch size is set.
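
For the notebook above, that would probably mean passing bs to the existing call, something like ImageClassifierData.from_paths(PATH, bs=16, tfms=tfms_from_model(arch, sz)). The value 16 is only a guess for a 2GB card, not a tested number. If you don't know which size fits, one rough approach is to halve the batch size every time the out of memory error appears. The sketch below shows only that search loop. find_batch_size and fake_train are hypothetical names made up for illustration and are not part of fastai; in practice train_fn would rebuild the data object and learner with the given bs.

```python
def find_batch_size(train_fn, start_bs=64, min_bs=1):
    """Halve the batch size until train_fn(bs) stops raising an out-of-memory error.

    train_fn is any callable that takes a batch size and raises RuntimeError
    with "out of memory" in the message when the GPU cannot hold the batch.
    """
    bs = start_bs
    while bs >= min_bs:
        try:
            train_fn(bs)
            return bs
        except RuntimeError as err:
            # Only swallow out-of-memory errors; re-raise anything else.
            if "out of memory" not in str(err):
                raise
            bs //= 2
    raise RuntimeError("out of memory even at the minimum batch size")


def fake_train(bs, limit=16):
    # Hypothetical stand-in for a real training call: pretends the GPU
    # can hold at most `limit` images per batch.
    if bs > limit:
        raise RuntimeError("cuda runtime error (2) : out of memory")


if __name__ == "__main__":
    print(find_batch_size(fake_train))
```

On a real GPU the memory from a failed attempt may not be released right away, so restarting the notebook kernel between attempts is the safer option.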

alejo96 · February 19, 2018, 10:28am

Thank you. That solved my problem!