Custom validation sets with the data bunch API

larcat · December 12, 2018, 2:56am

Could anyone post an example or three of doing non-standard validation sets with the data loader API? I’ve read the docs, and other than the standard percentage-based splits I’m at a bit of a loss.
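Whatever the library, a non-standard validation set comes down to a rule that puts each item in either the training or the validation set. Here is a hedged, framework-free sketch in plain Python. It is not fastai's API: `split_by_flags`, `split_by_indices` and `split_by_predicate` are hypothetical helpers. It shows three such rules: a per-item boolean flag, an explicit set of validation indices, and a function applied to each item. The flag version is the same idea as the `split_from_df(col = 2)` call below, where the column holds True for validation rows.

```python
# Hypothetical, framework-free helpers (NOT fastai APIs) showing three ways
# to build a custom train/validation split from a list of items.
from typing import Callable, List, Sequence, Set, Tuple, TypeVar

T = TypeVar("T")


def split_by_flags(items: Sequence[T], is_valid: Sequence[bool]) -> Tuple[List[T], List[T]]:
    """Split with one boolean per item (True = validation), like a CSV column."""
    if len(items) != len(is_valid):
        raise ValueError("items and is_valid must have the same length")
    train = [x for x, v in zip(items, is_valid) if not v]
    valid = [x for x, v in zip(items, is_valid) if v]
    return train, valid


def split_by_indices(items: Sequence[T], valid_idx: Set[int]) -> Tuple[List[T], List[T]]:
    """Split with an explicit set of positions that go to the validation set."""
    return split_by_flags(items, [i in valid_idx for i in range(len(items))])


def split_by_predicate(items: Sequence[T], is_valid: Callable[[T], bool]) -> Tuple[List[T], List[T]]:
    """Split by applying a rule to each item, e.g. to its filename."""
    return split_by_flags(items, [is_valid(x) for x in items])


files = ["cat_01.jpg", "cat_02.jpg", "dog_01.jpg", "dog_02.jpg"]
print(split_by_flags(files, [False, True, False, True]))
# (['cat_01.jpg', 'dog_01.jpg'], ['cat_02.jpg', 'dog_02.jpg'])
print(split_by_indices(files, {0, 2}))
# (['cat_02.jpg', 'dog_02.jpg'], ['cat_01.jpg', 'dog_01.jpg'])
print(split_by_predicate(files, lambda f: f.startswith("dog")))
# (['cat_01.jpg', 'cat_02.jpg'], ['dog_01.jpg', 'dog_02.jpg'])
```

The flag-based helper is the most general. The other two just compute the flags and delegate to it, so any custom rule can be reduced to building one boolean per item.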

larcat · December 12, 2018, 2:33pm

Specifically:

I’m feeling pretty dumb here. I’m trying to put together my databunch as follows:

data = (ImageItemList.from_csv(path, csv_name = 'train_munged_strat_sample.csv', col = 'Image', folder = './train/', 
                           suffix = '')            
    .label_from_df(col = 1) #Labels are here
    .split_from_df(col = 2) #True where it should be in validation set, false otherwise
    .transform(tfms, size=224)             
    .databunch()
    .normalize(imagenet_stats)
   )

Does not work.

---------------------------------------------------------------------------

AttributeError Traceback (most recent call last)
in
6 .label_from_df(col = 1)
7 .split_from_df(col = 2)
----> 8 .transform(tfms, size=224)
9 .databunch()
10 .normalize(imagenet_stats)

~/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py in transform(self, tfms, **kwargs)
389 "Set tfms to be applied to the xs of the train and validation set."
390 if not tfms: return self
--> 391 self.train.transform(tfms[0], **kwargs)
392 self.valid.transform(tfms[1], **kwargs)
393 if self.test: self.test.transform(tfms[1], **kwargs)

AttributeError: 'ImageItemList' object has no attribute 'transform'

Any suggestions?

jeremy · December 12, 2018, 2:36pm

Many apologies. We are working on improving the exception reporting to make this clearer. The issue is that you need to split first, and then label. At some point, hopefully, we’ll figure out how to make this more flexible.
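A toy model of a staged, chainable builder shows why the order matters. In such a builder, each step returns an object that only offers the next legal step. This is a hedged sketch in plain Python. `Items`, `SplitItems`, `LabeledSplit`, `split_by_flags`, `label_with` and `label_from_name` are hypothetical names, not fastai classes, and fastai's real internals differ in detail.

```python
# Toy model (NOT fastai's real classes) of a staged, chainable pipeline:
# each step returns a new type that only offers the legal next steps.
from dataclasses import dataclass
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (item, label)


@dataclass
class Items:
    """Unsplit, unlabeled items: the only next step offered is a split."""
    xs: List[str]

    def split_by_flags(self, is_valid: List[bool]) -> "SplitItems":
        train = [x for x, v in zip(self.xs, is_valid) if not v]
        valid = [x for x, v in zip(self.xs, is_valid) if v]
        return SplitItems(Items(train), Items(valid))


@dataclass
class SplitItems:
    """Split but unlabeled: the next step is labeling both halves."""
    train: Items
    valid: Items

    def label_with(self, label_fn: Callable[[str], str]) -> "LabeledSplit":
        def label(items: Items) -> List[Pair]:
            return [(x, label_fn(x)) for x in items.xs]
        return LabeledSplit(label(self.train), label(self.valid))


@dataclass
class LabeledSplit:
    """Split and labeled: only now is a transform step available."""
    train: List[Pair]
    valid: List[Pair]

    def transform(self, fn: Callable[[str], str]) -> "LabeledSplit":
        return LabeledSplit([(fn(x), y) for x, y in self.train],
                            [(fn(x), y) for x, y in self.valid])


def label_from_name(fname: str) -> str:
    return fname.split("_")[0]  # "cat_02.jpg" -> "cat"


files = ["cat_01.jpg", "cat_02.jpg", "dog_01.jpg"]
data = (Items(files)
        .split_by_flags([False, True, False])  # split first...
        .label_with(label_from_name)           # ...then label...
        .transform(str.upper))                 # ...then transform
print(data.valid)  # [('CAT_02.JPG', 'cat')]

try:
    Items(files).label_with(label_from_name)  # wrong order: label before split
except AttributeError as err:
    print("AttributeError:", err)
```

Because `Items` has no labeling method, the wrong-order call fails with an `AttributeError`, the same kind of error as the traceback above. Encoding the legal next steps in the type of each intermediate object is what makes the chain order-sensitive.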

larcat · December 12, 2018, 2:48pm

Ah! Thank you! I doubt it would have occurred to me to tweak the order of those.

Is this something that someone who doesn’t understand the guts of fastai could help with, vis-à-vis PRs?

**I mean the warnings/errors, not hacking on the underlying code.