Oh, do you have to call it (RMSE() instead of RMSE)?
No, oddly enough, there are separate functions, RMSE and root_mean_squared_error. The latter can be used as the convnet loss function:
learn = create_cnn(data, arch=SimpleCNN, pretrained=False, loss_func=root_mean_squared_error)
But doing
learn = create_cnn(data, arch=SimpleCNN, pretrained=False, loss_func=RMSE)
will fail.
I’m pretty puzzled by this behavior, but perhaps somebody more experienced with the codebase can provide a good explanation of the two functions’ differences.
Be careful: RMSE is a class intended to compute RMSE as a metric (it accumulates all the predictions/targets before computing it, since we can’t compute RMSE batch by batch and then average), not a loss function. Using root_mean_squared_error should work, as you noticed.
How can I help with the documentation to make that clearer? Or is it already clear and the two of us missed it? I’m looking at https://docs.fast.ai/metrics.html and it seems that root_mean_squared_error and RMSE are both in the metrics section.
Any metric that is a class is in fact a Callback (as explained a little further on in the docs). A nice introduction in this section would probably be useful to explain the difference between the two.
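To make the distinction concrete, here is a minimal illustrative sketch (not fastai’s actual source) of the two shapes: a plain per-batch function that is differentiable and can therefore serve as a loss, versus an accumulating metric that must gather every prediction first, because the mean of per-batch RMSEs is not the RMSE of the whole dataset.

import torch
import torch.nn.functional as F

def rmse_loss(pred, targ):
    # Per-batch RMSE: a plain differentiable function, so it can be
    # used directly as a loss (the root_mean_squared_error shape).
    return torch.sqrt(F.mse_loss(pred, targ))

class AccumulatingRMSE:
    # The metric shape: collect every batch’s predictions/targets and
    # compute RMSE once over the whole set at the end of the epoch.
    def __init__(self): self.preds, self.targs = [], []
    def accumulate(self, pred, targ):
        self.preds.append(pred.detach()); self.targs.append(targ.detach())
    def value(self):
        p, t = torch.cat(self.preds), torch.cat(self.targs)
        return torch.sqrt(F.mse_loss(p, t))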
I’m not sure if it’s due to changes in the code over these past months, but I’m getting errors running the following:
data = ImageDataBunch.from_df("./", df=df2, valid_pct=0.2, size=224,
                              ds_tfms=get_transforms(), num_workers=0) \
       .normalize(imagenet_stats).label_from_df(cols=1, label_cls=FloatList)
class MSELossFlat(nn.MSELoss):
    "Same as `nn.MSELoss`, but flattens input and target."
    def forward(self, input:Tensor, target:Tensor) -> Rank0Tensor:
        return super().forward(input.view(-1), target.view(-1))
learn = create_cnn(data, models.resnet34)
The Error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-82-f600890a6412> in <module>
1
----> 2 learn = create_cnn(data, models.resnet34)
3 learn.loss = MSELossFlat
d:\Anaconda3\lib\site-packages\fastai\vision\learner.py in create_cnn(data, arch, cut, pretrained, lin_ftrs, ps, custom_head, split_on, bn_final, **learn_kwargs)
75 head = custom_head or create_head(nf, data.c, lin_ftrs, ps=ps, bn_final=bn_final)
76 model = nn.Sequential(body, head)
---> 77 learn = Learner(data, model, **learn_kwargs)
78 learn.split(ifnone(split_on,meta['split']))
79 if pretrained: learn.freeze()
<string> in __init__(self, data, model, opt_func, loss_func, metrics, true_wd, bn_wd, wd, train_bn, path, model_dir, callback_fns, callbacks, layer_groups)
d:\Anaconda3\lib\site-packages\fastai\basic_train.py in __post_init__(self)
151 self.path = Path(ifnone(self.path, self.data.path))
152 (self.path/self.model_dir).mkdir(parents=True, exist_ok=True)
--> 153 self.model = self.model.to(self.data.device)
154 self.loss_func = ifnone(self.loss_func, self.data.loss_func)
155 self.metrics=listify(self.metrics)
d:\Anaconda3\lib\site-packages\fastai\data_block.py in __getattr__(self, k)
586 res = getattr(y, k, None)
587 if res is not None: return res
--> 588 raise AttributeError(k)
589
590 def __setstate__(self,data:Any): self.__dict__.update(data)
AttributeError: device
I’ve tried adding the following line before create_cnn():
data.device = torch.device('cpu')
And ran the following:
data.device = torch.device('cpu')
learn = create_cnn(data, models.resnet34)
learn.loss = MSELossFlat
learn.fit_one_cycle(4)
But now it gives me a new error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-89-495233eaf2b4> in <module>
----> 1 learn.fit_one_cycle(4)
d:\Anaconda3\lib\site-packages\fastai\train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, wd, callbacks, tot_epochs, start_epoch)
20 callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor, pct_start=pct_start, tot_epochs=tot_epochs,
21 start_epoch=start_epoch))
---> 22 learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
23
24 def lr_find(learn:Learner, start_lr:Floats=1e-7, end_lr:Floats=10, num_it:int=100, stop_div:bool=True, wd:float=None):
d:\Anaconda3\lib\site-packages\fastai\basic_train.py in fit(self, epochs, lr, wd, callbacks)
174 if not getattr(self, 'opt', False): self.create_opt(lr, wd)
175 else: self.opt.lr,self.opt.wd = lr,wd
--> 176 callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
177 fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
178 callbacks=self.callbacks+callbacks)
d:\Anaconda3\lib\site-packages\fastai\basic_train.py in <listcomp>(.0)
174 if not getattr(self, 'opt', False): self.create_opt(lr, wd)
175 else: self.opt.lr,self.opt.wd = lr,wd
--> 176 callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
177 fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
178 callbacks=self.callbacks+callbacks)
d:\Anaconda3\lib\site-packages\fastai\basic_train.py in __init__(self, learn)
382 super().__init__(learn)
383 self.opt = self.learn.opt
--> 384 self.train_dl = self.learn.data.train_dl
385 self.no_val,self.silent = False,False
386
d:\Anaconda3\lib\site-packages\fastai\data_block.py in __getattr__(self, k)
586 res = getattr(y, k, None)
587 if res is not None: return res
--> 588 raise AttributeError(k)
589
590 def __setstate__(self,data:Any): self.__dict__.update(data)
AttributeError: train_dl
Any ideas? If necessary, I’ll provide a dataset for you to play with to hopefully reproduce this (but it’s getting late for me).
Also, here are my specs:
Windows 10
Python 3.6
PyTorch 1.0.0
fastai 1.0.45
You’re mixing a bit of the data block API (the final label_from_df) with a factory method. That can’t work; you have to pick which API you want to use.
Ok, am I still mixing them this time around?
data = (ImageDataBunch.from_df("./", df=df2, num_workers=0)
        .label_from_df(cols=1, label_cls=FloatList)
        .transform(tfms=get_transforms(), size=224))
data.normalize(imagenet_stats)
But I’m still getting an error:
d:\Anaconda3\lib\site-packages\fastai\basic_data.py:260: UserWarning: It's not possible to collate samples of your dataset together in a batch.
Shapes of the inputs/targets:
[[torch.Size([3, 58, 58]), torch.Size([3, 25, 25]), torch.Size([3, 92, 92]), torch.Size([3, 99, 99]), torch.Size([3, 102, 102]), torch.Size([3, 92, 92]), torch.Size([3, 103, 103]), torch.Size([3, 52, 52]), torch.Size([3, 142, 142]), torch.Size([3, 133, 133]), torch.Size([3, 91, 91]), torch.Size([3, 70, 70]), torch.Size([3, 73, 73]), torch.Size([3, 99, 99]), torch.Size([3, 102, 102]), torch.Size([3, 108, 108]), torch.Size([3, 44, 44]), torch.Size([3, 22, 22]), torch.Size([3, 76, 76]), torch.Size([3, 104, 104]), torch.Size([3, 84, 84]), torch.Size([3, 74, 74]), torch.Size([3, 141, 141]), torch.Size([3, 56, 56]), torch.Size([3, 105, 105]), torch.Size([3, 81, 81]), torch.Size([3, 60, 60]), torch.Size([3, 93, 93]), torch.Size([3, 62, 62]), torch.Size([3, 61, 61]), torch.Size([3, 97, 97]), torch.Size([3, 90, 90]), torch.Size([3, 79, 79]), torch.Size([3, 94, 94]), torch.Size([3, 81, 81]), torch.Size([3, 24, 24]), torch.Size([3, 104, 104]), torch.Size([3, 121, 121]), torch.Size([3, 39, 39]), torch.Size([3, 94, 94]), torch.Size([3, 117, 117]), torch.Size([3, 43, 43]), torch.Size([3, 25, 25]), torch.Size([3, 100, 100]), torch.Size([3, 91, 91]), torch.Size([3, 124, 124]), torch.Size([3, 88, 88]), torch.Size([3, 72, 72]), torch.Size([3, 27, 27]), torch.Size([3, 98, 98]), torch.Size([3, 80, 80]), torch.Size([3, 41, 41]), torch.Size([3, 61, 61]), torch.Size([3, 91, 91]), torch.Size([3, 83, 83]), torch.Size([3, 58, 58]), torch.Size([3, 91, 91]), torch.Size([3, 80, 80]), torch.Size([3, 76, 76]), torch.Size([3, 97, 97]), torch.Size([3, 102, 102]), torch.Size([3, 67, 67]), torch.Size([3, 92, 92]), torch.Size([3, 52, 52])], [(), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), ()]]
warn(message)
You can deactivate this warning by passing `no_check=True`.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
d:\Anaconda3\lib\site-packages\fastai\data_block.py in _check_kwargs(ds, tfms, **kwargs)
537 x = ds[0]
--> 538 try: x.apply_tfms(tfms, **kwargs)
539 except Exception as e:
d:\Anaconda3\lib\site-packages\fastai\vision\image.py in apply_tfms(self, tfms, do_resolve, xtra, size, resize_method, mult, padding_mode, mode, remove_out)
100 xtra = ifnone(xtra, {})
--> 101 tfms = sorted(listify(tfms), key=lambda o: o.tfm.order)
102 default_rsz = ResizeMethod.SQUISH if (size is not None and is_listy(size)) else ResizeMethod.CROP
d:\Anaconda3\lib\site-packages\fastai\vision\image.py in <lambda>(o)
100 xtra = ifnone(xtra, {})
--> 101 tfms = sorted(listify(tfms), key=lambda o: o.tfm.order)
102 default_rsz = ResizeMethod.SQUISH if (size is not None and is_listy(size)) else ResizeMethod.CROP
AttributeError: 'list' object has no attribute 'tfm'
During handling of the above exception, another exception occurred:
Exception Traceback (most recent call last)
<ipython-input-11-d0c054d109fc> in <module>
1 data = (ImageDataBunch.from_df("./", df=df2, num_workers=0,)\
2 .label_from_df(cols=1, label_cls=FloatList)
----> 3 .transform(tfms = get_transforms(), size=224))
4
5 data.normalize(imagenet_stats)
d:\Anaconda3\lib\site-packages\fastai\data_block.py in transform(self, tfms, tfm_y, **kwargs)
661 def transform(self, tfms:TfmList, tfm_y:bool=None, **kwargs):
662 "Set the `tfms` and `tfm_y` value to be applied to the inputs and targets."
--> 663 _check_kwargs(self.x, tfms, **kwargs)
664 if tfm_y is None: tfm_y = self.tfm_y
665 if tfm_y: _check_kwargs(self.y, tfms, **kwargs)
d:\Anaconda3\lib\site-packages\fastai\data_block.py in _check_kwargs(ds, tfms, **kwargs)
538 try: x.apply_tfms(tfms, **kwargs)
539 except Exception as e:
--> 540 raise Exception(f"It's not possible to apply those transforms to your dataset:\n {e}")
541
542 class LabelList(Dataset):
Exception: It's not possible to apply those transforms to your dataset:
'list' object has no attribute 'tfm'
ImageDataBunch.from_df is a factory method to construct a DataBunch. You wanted to write ImageList.from_df for the data block API, I think.
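For reference, here is a minimal sketch of what the pure data block route could look like for this regression setup. It assumes df2 has filenames in its first column and the float target in column 1, and note that split_by_rand_pct was called random_split_by_pct in some older fastai v1 releases:

data = (ImageList.from_df(df2, path="./")
        .split_by_rand_pct(0.2)                      # random 20% validation split
        .label_from_df(cols=1, label_cls=FloatList)  # float labels -> regression
        .transform(get_transforms(), size=224)       # same size so items collate
        .databunch(num_workers=0)
        .normalize(imagenet_stats))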
Using a custom head solves the problem. Try this:
learn = cnn_learner(data, models.resnet18, loss_func=nn.L1Loss(), metrics=error_rate,
                    custom_head=nn.Sequential(Flatten(), nn.Linear(1, 1)))
I’m trying to implement the ‘BBox only’ part of lesson 8 from the 2018 course using fastai2. I’m not sure if it’s relevant, but I’m posting it here because I’m getting the exact same error:
My implementation:
def get_bbox_dls(df, sz=128, bs=128):
    getters = [lambda o: path/'train'/o,
               lambda o: img2bbox[o],
               lambda o: ['']]
    dblock = DataBlock(
        blocks=(ImageBlock, BBoxBlock, BBoxLblBlock),
        get_items=get_train_imgs,
        getters=getters, n_inp=1,
        splitter=RandomSplitter(seed=47),
        item_tfms=Resize(sz, method='squish'),
        batch_tfms=[*aug_transforms(), Normalize.from_stats(*imagenet_stats)])
    return dblock.dataloaders(df, bs=bs)
img2bbox holds the mapping from image path to the bbox of the largest object. I’m using BBoxLblBlock to fit the underlying bb_pad implementation, but the model isn’t supposed to predict any label yet. I’ve modified L1Loss accordingly to work with the ys:
class CustomL1Loss(nn.L1Loss):
    def forward(self, input, bbox_targets, lbl_targets):
        return F.l1_loss(input, bbox_targets, reduction=self.reduction)
Some debugging I’ve done: the bbox targets coming out of the dataloaders have shape (bs, 1, 4):
_,x,_ = dls.one_batch(); x.shape
Working with random tensors causes no issues with CustomL1Loss:
cust_l1 = CustomL1Loss()
inp = torch.randn(8,1,4)
op = torch.randn(8,1,4)
op2 = torch.randn(8,1)
cust_l1(inp,op,op2)
EDIT: Here is the whole stack trace of the error:
RuntimeError Traceback (most recent call last)
<ipython-input-52-bb1de44e7349> in <module>()
----> 1 learn.fit_one_cycle(1,lr_max=1e-4)
7 frames
/usr/local/lib/python3.6/dist-packages/fastai2/callback/schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt)
110 scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
111 'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
--> 112 self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)
113
114 # Cell
/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
188 try:
189 self.epoch=epoch; self('begin_epoch')
--> 190 self._do_epoch_train()
191 self._do_epoch_validate()
192 except CancelEpochException: self('after_cancel_epoch')
/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in _do_epoch_train(self)
161 try:
162 self.dl = self.dls.train; self('begin_train')
--> 163 self.all_batches()
164 except CancelTrainException: self('after_cancel_train')
165 finally: self('after_train')
/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in all_batches(self)
139 def all_batches(self):
140 self.n_iter = len(self.dl)
--> 141 for o in enumerate(self.dl): self.one_batch(*o)
142
143 def one_batch(self, i, b):
/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in one_batch(self, i, b)
147 self.pred = self.model(*self.xb); self('after_pred')
148 if len(self.yb) == 0: return
--> 149 self.loss = self.loss_func(self.pred, *self.yb); self('after_loss')
150 if not self.training: return
151 self.loss.backward(); self('after_backward')
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py in __init__(self, size_average, reduce, reduction)
83
84 def __init__(self, size_average=None, reduce=None, reduction='mean'):
---> 85 super(L1Loss, self).__init__(size_average, reduce, reduction)
86
87 def forward(self, input, target):
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py in __init__(self, size_average, reduce, reduction)
10 super(_Loss, self).__init__()
11 if size_average is not None or reduce is not None:
---> 12 self.reduction = _Reduction.legacy_get_string(size_average, reduce)
13 else:
14 self.reduction = reduction
/usr/local/lib/python3.6/dist-packages/torch/nn/_reduction.py in legacy_get_string(size_average, reduce, emit_warning)
34 reduce = True
35
---> 36 if size_average and reduce:
37 ret = 'mean'
38 elif reduce:
RuntimeError: bool value of Tensor with more than one value is ambiguous
As we very often say, please do not post just the last part of the error message but the whole stack trace.
I’ve posted the whole stack trace for your reference.
It looks like your loss is not properly initialized: instead of being initialized with the () at the beginning, it takes the tensors at your call of cust_l1 and sends them into the init.
My bad, I passed the loss to the learner as
learn = cnn_learner(dls, resnet34, loss_func=CustomL1Loss)
instead of
learn = cnn_learner(dls, resnet34, loss_func=CustomL1Loss())
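The failure mode is easy to reproduce in isolation: when the class itself is passed, the learner’s call loss_func(pred, *yb) ends up invoking nn.L1Loss.__init__(size_average, reduce, reduction) with tensors, and legacy_get_string then evaluates `if size_average and reduce:` on a multi-element tensor, which is exactly the RuntimeError in the trace above. A minimal sketch:

import torch
import torch.nn as nn

pred, targ = torch.randn(8, 1, 4), torch.randn(8, 1, 4)

# Passing the class: the tensors land in __init__ as size_average/reduce and
# raise "RuntimeError: bool value of Tensor with more than one value is ambiguous".
# nn.L1Loss(pred, targ)

# Passing an instance: __init__ runs with defaults, and calling the instance
# dispatches to forward() as intended.
print(nn.L1Loss()(pred, targ))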
One more thing I observed: this kind of mistake can also cause a CUDA out-of-memory error, since in my case a variable that was supposed to hold a bool value ended up storing a 3D tensor, with a new instance created on every iteration. So I’d say “CUDA: out of memory” could be a hint that you’re doing something similar.