.no_split() not working


I am trying to set up a pipeline with the data block API where the validation set is empty, because I want to train the model on the whole dataset.
I am therefore using the .no_split() method, but unfortunately an error is raised when I call .databunch().
A workaround for now would be to make the validation set really small, but I'd like this to work with .no_split().

Thanks a lot in advance!

Here is my code and the stacktrace:

data = (ImageItemList.from_csv(path=PATH, csv_name=CSV_NAME, folder=TRAIN_NAME, suffix='.tif')
        .add_test_folder(test_folder=TEST_NAME, label=None)
        .transform(tfms)
        .databunch(bs=bs, num_workers=nw)
        .normalize())

IndexError                                Traceback (most recent call last)
      4 .add_test_folder(test_folder=TEST_NAME, label=None)
      5 .transform(tfms)
----> 6 .databunch(bs=bs, num_workers=nw)
      7 #.normalize(imagenet_stats))
      8 .normalize())

~/work/network/fastai/fastai/vision/data.py in normalize(self, stats, do_x, do_y)
    181         "Add normalize transform using stats (defaults to DataBunch.batch_stats)"
    182         if getattr(self,'norm',False): raise Exception('Can not call normalize twice')
--> 183         if stats is None: self.stats = self.batch_stats()
    184         else: self.stats = stats
    185         self.norm,self.denorm = normalize_funcs(*self.stats, do_x=do_x, do_y=do_y)

~/work/network/fastai/fastai/vision/data.py in batch_stats(self, funcs)
    175         "Grab a batch of data and call reduction function func per channel"
    176         funcs = ifnone(funcs, [torch.mean,torch.std])
--> 177         x = self.one_batch(ds_type=DatasetType.Valid, denorm=False)[0].cpu()
    178         return [func(channel_view(x), 1) for func in funcs]

~/work/network/fastai/fastai/basic_data.py in one_batch(self, ds_type, detach, denorm, cpu)
    140         w = self.num_workers
    141         self.num_workers = 0
--> 142         try:     x,y = next(iter(dl))
    143         finally: self.num_workers = w
    144         if detach: x,y = to_detach(x,cpu=cpu),to_detach(y,cpu=cpu)

~/work/network/fastai/fastai/basic_data.py in __iter__(self)
     69     def __iter__(self):
     70         "Process and returns items from DataLoader."
---> 71         for b in self.dl: yield self.proc_batch(b)
     73     @classmethod

/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
    635                 self.reorder_dict[idx] = batch
    636                 continue
--> 637             return self._process_next_batch(batch)
    639     next = __next__  # Python 2 compatibility

/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _process_next_batch(self, batch)
    656         self._put_indices()
    657         if isinstance(batch, ExceptionWrapper):
--> 658             raise batch.exc_type(batch.exc_msg)
    659         return batch

IndexError: Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "…/fastai/fastai/data_block.py", line 587, in __getitem__
    if self.item is None: x,y = self.x[idxs],self.y[idxs]
  File "…/fastai/fastai/data_block.py", line 102, in __getitem__
    if isinstance(idxs, numbers.Integral): return self.get(idxs)
  File "…/fastai/fastai/vision/data.py", line 276, in get
    fn = super().get(i)
  File "…/fastai/fastai/data_block.py", line 62, in get
    return self.items[i]
IndexError: index 0 is out of bounds for axis 0 with size 0

It doesn't come from no_split but from normalize. Since you're not passing any stats, it tries to compute them on a batch of the validation set, which fails for obvious reasons :wink:
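To make that concrete, here is a minimal plain-Python sketch (not actual fastai code; the names are illustrative) of what batch_stats effectively does: it indexes into the first batch of the validation dataset, and after no_split() that dataset has zero items.

```python
# Sketch of why batch_stats raises IndexError with an empty validation set.
# (Plain Python stand-in, not fastai's real implementation.)

def one_batch(dataset, batch_size=2):
    # Mimics grabbing the first batch by indexing the underlying items;
    # dataset[0] raises IndexError when the dataset is empty.
    return [dataset[i] for i in range(batch_size)]

train_items = [[0.1, 0.2], [0.3, 0.4]]  # non-empty training set
valid_items = []                         # empty validation set after no_split()

try:
    one_batch(valid_items)
    error = None
except IndexError as e:
    error = str(e)
```

With a non-empty training set the same call succeeds, which is why computing the stats from the train set (or passing them explicitly) avoids the error.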

Ah, thanks a lot, that makes sense!
Is there a way to tell fastai that it should compute the stats from the train set?
Or would this be nonsense?

Note that you can do that yourself by adjusting the source code here.
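As a sketch of the do-it-yourself route (plain Python with the stdlib statistics module, not the fastai API; the channel_stats helper and the list-based image layout are illustrative assumptions): compute per-channel mean and std over the training images and pass them to normalize explicitly, so it never needs to sample the empty validation set.

```python
# Sketch: compute per-channel mean/std from the *training* data yourself.
# Plain-Python stand-in; real images would be (C, H, W) tensors.
from statistics import mean, pstdev

def channel_stats(images):
    """images: list of (C, H, W)-style nested lists -> (means, stds) per channel."""
    n_channels = len(images[0])
    means, stds = [], []
    for c in range(n_channels):
        # Flatten every pixel of channel c across all training images.
        pixels = [px for img in images for row in img[c] for px in row]
        means.append(mean(pixels))
        stds.append(pstdev(pixels))
    return means, stds

# Two fake 2-channel, 2x2 "images".
train_images = [
    [[[0.0, 1.0], [0.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]],
    [[[1.0, 0.0], [1.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]],
]
means, stds = channel_stats(train_images)
```

You would then pass the resulting stats explicitly, e.g. something like .normalize((tensor(means), tensor(stds))); treat that exact call as an assumption and check the normalize signature in your fastai version.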

Thanks a lot!

Instead of no_split(), try using split_none().

Here is my code:

data = (TabularList.from_df(df_train, path='.', cat_names=cat_names, cont_names=cont_names, procs=procs)
        .label_from_df(cols=dep_var)
        .add_test(test, label=0)