Error while fitting the model

I was trying to fit my model with learn.fit_one_cycle(5, slice(lr)), and when training reached the validation phase it was interrupted with the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-22-be1ab4476b35> in <module>()
----> 1 learn.fit_one_cycle(5, slice(lr))

~/.anaconda3/lib/python3.7/site-packages/fastai/ in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, wd, callbacks, **kwargs)
     18     callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor,
     19                                         pct_start=pct_start, **kwargs))
---> 20, max_lr, wd=wd, callbacks=callbacks)
     22 def lr_find(learn:Learner, start_lr:Floats=1e-7, end_lr:Floats=10, num_it:int=100, stop_div:bool=True, **kwargs:Any):

~/.anaconda3/lib/python3.7/site-packages/fastai/ in fit(self, epochs, lr, wd, callbacks)
    160         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
    161         fit(epochs, self.model, self.loss_func, opt=self.opt,, metrics=self.metrics,
--> 162             callbacks=self.callbacks+callbacks)
    164     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

~/.anaconda3/lib/python3.7/site-packages/fastai/ in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     92     except Exception as e:
     93         exception = e
---> 94         raise e
     95     finally: cb_handler.on_train_end(exception)

~/.anaconda3/lib/python3.7/site-packages/fastai/ in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     87             if hasattr(data,'valid_dl') and data.valid_dl is not None and data.valid_ds is not None:
     88                 val_loss = validate(model, data.valid_dl, loss_func=loss_func,
---> 89                                        cb_handler=cb_handler, pbar=pbar)
     90             else: val_loss=None
     91             if cb_handler.on_epoch_end(val_loss): break

~/.anaconda3/lib/python3.7/site-packages/fastai/ in validate(model, dl, loss_func, cb_handler, pbar, average, n_batch)
     47     with torch.no_grad():
     48         val_losses,nums = [],[]
---> 49         for xb,yb in progress_bar(dl, parent=pbar, leave=(pbar is not None)):
     50             if cb_handler: xb, yb = cb_handler.on_batch_begin(xb, yb, train=False)
     51             val_losses.append(loss_batch(model, xb, yb, loss_func, cb_handler=cb_handler))

~/.anaconda3/lib/python3.7/site-packages/fastprogress/ in __iter__(self)
     63         self.update(0)
     64         try:
---> 65             for i,o in enumerate(self._gen):
     66                 yield o
     67                 if self.auto_update: self.update(i+1)

~/.anaconda3/lib/python3.7/site-packages/fastai/ in __iter__(self)
     67         "Process and returns items from `DataLoader`."
     68         assert not self.skip_size1 or self.batch_size > 1, "Batch size cannot be one if skip_size1 is set to True"
---> 69         for b in self.dl:
     70             y = b[1][0] if is_listy(b[1]) else b[1]
     71             if not self.skip_size1 or y.size(0) != 1: yield self.proc_batch(b)

~/.anaconda3/lib/python3.7/site-packages/torch/utils/data/ in __next__(self)
    635                 self.reorder_dict[idx] = batch
    636                 continue
--> 637             return self._process_next_batch(batch)
    639     next = __next__  # Python 2 compatibility

~/.anaconda3/lib/python3.7/site-packages/torch/utils/data/ in _process_next_batch(self, batch)
    656         self._put_indices()
    657         if isinstance(batch, ExceptionWrapper):
--> 658             raise batch.exc_type(batch.exc_msg)
    659         return batch

TypeError: Traceback (most recent call last):
  File "/home/nbuser/.anaconda3/lib/python3.7/site-packages/torch/utils/data/", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/nbuser/.anaconda3/lib/python3.7/site-packages/fastai/", line 97, in data_collate
  File "/home/nbuser/.anaconda3/lib/python3.7/site-packages/torch/utils/data/", line 232, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/home/nbuser/.anaconda3/lib/python3.7/site-packages/torch/utils/data/", line 232, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/home/nbuser/.anaconda3/lib/python3.7/site-packages/torch/utils/data/", line 223, in default_collate
    return torch.LongTensor(batch)
TypeError: an integer is required (got type NoneType)
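The final TypeError is raised while PyTorch collates a validation batch: default_collate tries to build an integer tensor from the batch's labels, so every label must be an actual integer. A simplified sketch (a toy stand-in, not the real torch code) of why a None label blows up there:

```python
def naive_label_collate(labels):
    # Toy stand-in for what torch's default_collate does with integer
    # class labels: it builds an integer tensor, so every label must
    # convert cleanly to int.
    return [int(label) for label in labels]

print(naive_label_collate([3, 7, 1]))  # works fine: [3, 7, 1]

try:
    naive_label_collate([3, None, 1])  # a sample with a missing label
except TypeError as e:
    # Mirrors the "an integer is required (got type NoneType)" failure above
    print('collate failed:', e)
```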

Can someone tell me what is wrong here?

here’s the code for learn:

arch = models.resnet50
learn = create_cnn(data, arch)

and here’s the code for data:

src = ImageItemList.from_csv(PATH, 'train.csv', folder='train', suffix=None)
data = src.transform(tfms, size=128)

Have you checked the dataframe for any unlabelled examples?

No, how do I do that?

df = pd.read_csv('csv_file')
df.isnull()            # Flags every null value in the dataframe
df.isnull().sum()      # Counts the null values per column
df[col].isnull()       # Checks a specific column

To remove these values:

df1 = df.dropna()
df1.isnull().sum() # To verify
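A self-contained illustration of both steps (the column names here are placeholders; use whatever your CSV actually contains):

```python
import pandas as pd

# A tiny dataframe standing in for train.csv; one row has no label.
df = pd.DataFrame({'fname': ['a.jpg', 'b.jpg', 'c.jpg'],
                   'label': ['cat', None, 'dog']})

print(df.isnull().sum())     # the label column reports 1 missing value

clean = df.dropna()          # drop the unlabelled row
print(clean.isnull().sum())  # all zeros after dropping it
```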

There you go!


Thanks, man!

So I checked whether the dataframe has any unlabelled examples, but it didn't have any. It's still showing the same error.

Are you doing the whale competition? :slight_smile:

Anyway, I had the same error. The cause: my validation set contained classes that never appear in the training set.

Yeah :slight_smile: How did you get rid of the error?

Just make sure the classes in the validation set exist in the training set.
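One way to check for this with a plain set difference (the label lists below are placeholders; in fastai v1 you could pull the real ones from your train/valid dataframes or datasets):

```python
# Labels of each split, as plain lists for illustration.
train_labels = ['whale_a', 'whale_b', 'whale_c']
valid_labels = ['whale_a', 'whale_d']

# Any class that appears only in validation will break training.
unseen = set(valid_labels) - set(train_labels)
if unseen:
    print('classes only in the validation set:', unseen)
```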

As a quick diagnostic, try rerunning the model fitting without a validation set (or with only one element).

If that works, then think about your validation strategy: what do you do when there are many classes and only a few observations per class?
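One simple strategy when many classes have only one or two images: always keep the first image of every class in the training set, and only spill the extras into validation. A sketch with placeholder data (not fastai's own splitting API):

```python
from collections import defaultdict

# (filename, label) pairs standing in for the competition data.
samples = [('img1.jpg', 'w_a'), ('img2.jpg', 'w_a'),
           ('img3.jpg', 'w_b'), ('img4.jpg', 'w_c'), ('img5.jpg', 'w_c')]

by_class = defaultdict(list)
for fname, label in samples:
    by_class[label].append(fname)

train, valid = [], []
for label, fnames in by_class.items():
    # Keep at least one example of each class in training,
    # so validation never contains a class training hasn't seen.
    train.append((fnames[0], label))
    for fname in fnames[1:]:
        valid.append((fname, label))

print(len(train), len(valid))  # 3 2
```

Classes with a single image never reach the validation set at all, which is the trade-off this approach makes.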