Error when using MSELossFlat() after upgrading library

I was previously using MSELossFlat() to create a learner for a scalar target (basically, doing regression from an image). I updated my fastai v1 library today and had to replace ImageDataset with ImageClassificationDataset, but now I am getting a dimension-mismatch error when I try to find my learning rate. Is there a different Image*Dataset I should be using with the latest version of the library?

LR Finder complete, type {learner_name}.recorder.plot() to see the graph.
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-24-e10d9f8adb8f> in <module>()
----> 1 learn2.lr_find(start_lr=1e-5, end_lr=100)
      2 learn2.recorder.plot()

/app/fastai/fastai/train.py in lr_find(learn, start_lr, end_lr, num_it, stop_div, **kwargs)
     28     cb = LRFinder(learn, start_lr, end_lr, num_it, stop_div)
     29     a = int(np.ceil(num_it/len(learn.data.train_dl)))
---> 30     learn.fit(a, start_lr, callbacks=[cb], **kwargs)
     31 
     32 def to_fp16(learn:Learner, loss_scale:float=512., flat_master:bool=False)->Learner:

/app/fastai/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    160         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
    161         fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 162             callbacks=self.callbacks+callbacks)
    163 
    164     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

/app/fastai/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     92     except Exception as e:
     93         exception = e
---> 94         raise e
     95     finally: cb_handler.on_train_end(exception)
     96 

/app/fastai/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     82             for xb,yb in progress_bar(data.train_dl, parent=pbar):
     83                 xb, yb = cb_handler.on_batch_begin(xb, yb)
---> 84                 loss = loss_batch(model, xb, yb, loss_func, opt, cb_handler)
     85                 if cb_handler.on_batch_end(loss): break
     86 

/app/fastai/fastai/basic_train.py in loss_batch(model, xb, yb, loss_func, opt, cb_handler)
     20 
     21     if not loss_func: return to_detach(out), yb[0].detach()
---> 22     loss = loss_func(out, *yb)
     23 
     24     if opt is not None:

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    475             result = self._slow_forward(*input, **kwargs)
    476         else:
--> 477             result = self.forward(*input, **kwargs)
    478         for hook in self._forward_hooks.values():
    479             hook_result = hook(self, input, result)

/app/fastai/fastai/layers.py in forward(self, input, target)
    101     "Same as `nn.MSELoss`, but flattens input and target."
    102     def forward(self, input:Tensor, target:Tensor) -> Rank0Tensor:
--> 103         return super().forward(input.view(-1), target.view(-1))
    104 
    105 def simple_cnn(actns:Collection[int], kernel_szs:Collection[int]=None,

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py in forward(self, input, target)
    422 
    423     def forward(self, input, target):
--> 424         return F.mse_loss(input, target, reduction=self.reduction)
    425 
    426 

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in mse_loss(input, target, size_average, reduce, reduction)
   1830     if size_average is not None or reduce is not None:
   1831         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 1832     return _pointwise_loss(lambda a, b: (a - b) ** 2, torch._C._nn.mse_loss, input, target, reduction)
   1833 
   1834 

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in _pointwise_loss(lambd, lambd_optimized, input, target, reduction)
   1786         return torch.mean(d) if reduction == 'mean' else torch.sum(d)
   1787     else:
-> 1788         expanded_input, expanded_target = torch.broadcast_tensors(input, target)
   1789         return lambd_optimized(expanded_input, expanded_target, _Reduction.get_enum(reduction))
   1790 

/usr/local/lib/python3.6/dist-packages/torch/functional.py in broadcast_tensors(*tensors)
     48                 [0, 1, 2]])
     49     """
---> 50     return torch._C._VariableFunctions.broadcast_tensors(tensors)
     51 
     52 

RuntimeError: The size of tensor a (58880) must match the size of tensor b (128) at non-singleton dimension 0

The learner:

from fastai import *
from fastai.vision import *
import torchvision.models as tvmodels

class ImageScalarDataset(ImageClassificationDataset):
    def __init__(self, df:DataFrame, path_column:str='file_path', dependent_variable:str=None):
        
        # The superclass does nice things for us like tensorizing the numpy
        # input
        super().__init__(df[path_column], np.array(df[dependent_variable], dtype=np.float32))

        # Old FastAI uses loss_fn, new FastAI uses loss_func
        self.loss_func = layers.MSELossFlat()
        self.loss_fn = self.loss_func

        # We have only one "class" (i.e., the single output scalar)
        self.classes = [0]

    def __len__(self)->int:
        return len(self.y)
    
    def __getitem__(self, i):
        # return x, y | where x is an image, and y is the scalar
        return open_image(self.x[i]), self.y[i]

data64 = ImageDataBunch.create(dat_train, dat_valid, dat_test,
    ds_tfms=get_transforms(),  # or get_transforms(do_flip=False)
    bs=128,
    size=64)

learn2 = create_cnn(data64, 
                     tvmodels.densenet121,
                     pretrained=True,
                     metrics=[exp_rmspe], 
                     ps=0.5,
                     callback_fns=ShowGraph)

One issue seems to be visible when I print the model:

  (1): Sequential(
    (0): AdaptiveConcatPool2d(
      (ap): AdaptiveAvgPool2d(output_size=1)
      (mp): AdaptiveMaxPool2d(output_size=1)
    )
    (1): Lambda()
    (2): BatchNorm1d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Dropout(p=0.25)
    (4): Linear(in_features=2048, out_features=512, bias=True)
    (5): ReLU(inplace)
    (6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): Dropout(p=0.5)
    (8): Linear(in_features=512, out_features=460, bias=True)
  )

It is producing 460 output features instead of the 1 output feature I would expect, and 128 * 460 = 58880, which explains the value in the original error message. Does MSELossFlat() need to be modified, or is this just an issue on my end because I haven’t followed v1 library development closely enough over the past couple of weeks?

Another piece of the puzzle is that 460 is the number of distinct scalar values in the training portion of my dataset. Nevertheless, the Dataset correctly shows that I only have one “class”:

len(learn2.data.classes)
1

So, I think I’m overriding the self.classes value correctly. Not sure why the head goes 512 -> 460 instead of 512 -> 1 when I’m using MSELossFlat(). I could pop off the last layer and directly do 512 -> 1 (sketched below); it just seems that this is not the expected behavior, so I’m trying to figure out whether this is an error on my end or an unexpected change in the library.
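
For reference, popping off the last layer might look like this. This is an untested sketch; it assumes the head is learn2.model[1] and that its final module is the offending Linear, as in the printout above:

import torch.nn as nn

# Swap the 512 -> 460 Linear at the end of the head for a single-output
# layer suitable for regression. nn.Sequential supports item assignment,
# so the last module can be replaced in place.
head = learn2.model[1]
head[-1] = nn.Linear(in_features=512, out_features=1, bias=True)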

I’ll look into this tomorrow, but it’s possible the refactor of the API broke regression. Since it seems you are using a custom class in any case, I’d advise making it a child of DatasetBase to avoid unnecessary conversion of your labels to classes.
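
Untested, but a minimal sketch of what that might look like, assuming DatasetBase only needs the standard torch Dataset interface (__len__ and __getitem__) plus whatever attributes you set yourself:

class ImageScalarDataset(DatasetBase):
    "Regression dataset: image in, float scalar out, no label-to-class conversion."
    def __init__(self, df, path_column='file_path', dependent_variable=None):
        self.x = df[path_column].values
        self.y = np.array(df[dependent_variable], dtype=np.float32)
        self.loss_func = MSELossFlat()
        self.c = 1  # a single continuous output

    def __len__(self): return len(self.y)

    def __getitem__(self, i):
        # return x, y | where x is an image and y is the scalar
        return open_image(self.x[i]), self.y[i]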


@sgugger I’m seeing this error using ImageDataBunch, too, unfortunately:

data = ImageDataBunch.from_folder(path, test='test', valid_pct=0.2, ds_tfms=get_transforms(), size=size, bs=bs)
data.normalize(imagenet_stats)
learn = create_cnn(data, models.resnet34, metrics=fbeta)
learn.fit_one_cycle(4)

RuntimeError                              Traceback (most recent call last)
<ipython-input-17-495233eaf2b4> in <module>
----> 1 learn.fit_one_cycle(4)

/opt/anaconda3/lib/python3.6/site-packages/fastai/train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, wd, callbacks, **kwargs)
     20     callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor,
     21                                         pct_start=pct_start, **kwargs))
---> 22     learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
     23 
     24 def lr_find(learn:Learner, start_lr:Floats=1e-7, end_lr:Floats=10, num_it:int=100, stop_div:bool=True, **kwargs:Any):

/opt/anaconda3/lib/python3.6/site-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    160         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
    161         fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 162             callbacks=self.callbacks+callbacks)
    163 
    164     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

/opt/anaconda3/lib/python3.6/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     92     except Exception as e:
     93         exception = e
---> 94         raise e
     95     finally: cb_handler.on_train_end(exception)
     96 

/opt/anaconda3/lib/python3.6/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     87             if hasattr(data,'valid_dl') and data.valid_dl is not None:
     88                 val_loss = validate(model, data.valid_dl, loss_func=loss_func,
---> 89                                        cb_handler=cb_handler, pbar=pbar)
     90             else: val_loss=None
     91             if cb_handler.on_epoch_end(val_loss): break

/opt/anaconda3/lib/python3.6/site-packages/fastai/basic_train.py in validate(model, dl, loss_func, cb_handler, pbar, average, n_batch)
     52             if not is_listy(yb): yb = [yb]
     53             nums.append(yb[0].shape[0])
---> 54             if cb_handler and cb_handler.on_batch_end(val_losses[-1]): break
     55             if n_batch and (len(nums)>=n_batch): break
     56         nums = np.array(nums, dtype=np.float32)

/opt/anaconda3/lib/python3.6/site-packages/fastai/callback.py in on_batch_end(self, loss)
    236         "Handle end of processing one batch with `loss`."
    237         self.state_dict['last_loss'] = loss
--> 238         stop = np.any(self('batch_end', not self.state_dict['train']))
    239         if self.state_dict['train']:
    240             self.state_dict['iteration'] += 1

/opt/anaconda3/lib/python3.6/site-packages/fastai/callback.py in __call__(self, cb_name, call_mets, **kwargs)
    184     def __call__(self, cb_name, call_mets=True, **kwargs)->None:
    185         "Call through to all of the `CallbakHandler` functions."
--> 186         if call_mets: [getattr(met, f'on_{cb_name}')(**self.state_dict, **kwargs) for met in self.metrics]
    187         return [getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs) for cb in self.callbacks]
    188 

/opt/anaconda3/lib/python3.6/site-packages/fastai/callback.py in <listcomp>(.0)
    184     def __call__(self, cb_name, call_mets=True, **kwargs)->None:
    185         "Call through to all of the `CallbakHandler` functions."
--> 186         if call_mets: [getattr(met, f'on_{cb_name}')(**self.state_dict, **kwargs) for met in self.metrics]
    187         return [getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs) for cb in self.callbacks]
    188 

/opt/anaconda3/lib/python3.6/site-packages/fastai/callback.py in on_batch_end(self, last_output, last_target, train, **kwargs)
    269         if not is_listy(last_target): last_target=[last_target]
    270         self.count += last_target[0].size(0)
--> 271         self.val += last_target[0].size(0) * self.func(last_output, *last_target).detach().cpu()
    272 
    273     def on_epoch_end(self, **kwargs):

/opt/anaconda3/lib/python3.6/site-packages/fastai/metrics.py in fbeta(y_pred, y_true, thresh, beta, eps, sigmoid)
     11     y_pred = (y_pred>thresh).float()
     12     y_true = y_true.float()
---> 13     TP = (y_pred*y_true).sum(dim=1)
     14     prec = TP/(y_pred.sum(dim=1)+eps)
     15     rec = TP/(y_true.sum(dim=1)+eps)

RuntimeError: The size of tensor a (2) must match the size of tensor b (128) at non-singleton dimension 1

I’m running fastai v1.0.20-py_1 on Linux fastai-instance 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u6 (2018-10-08) x86_64 (Google Cloud)

What dataset are you training on? The fastai implementation of fbeta is intended for multi-label classification, so that may be the reason.
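
To illustrate the mismatch outside of fastai (a standalone example, not taken from the traceback above): fbeta computes (y_pred*y_true).sum(dim=1), which assumes the targets are a multi-hot matrix with the same [batch, n_classes] shape as the predictions. Single-label targets come through as a flat [batch] vector, so broadcasting fails with exactly the error shown:

import torch

bs, n_classes = 128, 2
y_pred = torch.rand(bs, n_classes)                       # [128, 2]
y_multi = torch.randint(0, 2, (bs, n_classes)).float()   # [128, 2] multi-hot
y_single = torch.randint(0, n_classes, (bs,)).float()    # [128] class indices

(y_pred * y_multi).sum(dim=1)    # OK: elementwise product, result is [128]
# (y_pred * y_single).sum(dim=1) # raises: The size of tensor a (2) must match
#                                # the size of tensor b (128) at non-singleton
#                                # dimension 1

For single-label classification, a metric such as accuracy is the usual choice instead of fbeta.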

@sgugger I am having the same issue. There does not seem to be a straightforward way to use the image classification approach for scalar datasets/labels. In my case I have images labeled with numbers and I want to do regression on them rather than classify them.


Is there something we need to do (pass in an argument for a different dataset?) that will fix this?

OK! I think I figured it out!

data = (ImageItemList.from_csv(path=Path('aws-bin'), csv_name='labels.csv', folder='images', suffix='.jpg')
        # Where to find the data? -> in the 'images' folder, as listed in labels.csv
        .random_split_by_pct()
        # How to split in train/valid? -> randomly with the default 20% in valid
        .label_from_df(cols='label', label_cls=FloatList)
        # How to label? -> from the 'label' column of the csv, as floats
        .transform(get_transforms(), size=224)
        # Data augmentation? -> use tfms with a size of 224
        .databunch())
        # Finally -> use the defaults for conversion to databunch
data.normalize(imagenet_stats)

I added the FloatList label class here: label_cls=FloatList, and I was able to get the model to train. Now I just need to understand how to interpret the result LOL.

I think explicitly setting learn.loss_func to MSELossFlat() is redundant here, since the FloatList class already does that, but I kept it for now.

I added exp_rmspe as the metric, but I remember Jeremy interpreting RMSE directly from the validation loss in the lecture? Any tips on how to interpret these results?
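
One way to read them: since MSELossFlat() reports mean squared error, RMSE is just the square root of the validation loss, e.g.:

import math

valid_loss = 0.25             # hypothetical validation loss from the fit output
rmse = math.sqrt(valid_loss)  # 0.5, in the same units as the target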


May I see your shared notebook?
