How to use a sample subset of a dataset with ImageClassifierData?

(Arka Nayan) #1

The code here in lesson2-image_models.ipynb seems to use the whole dataset by default.

def get_data(sz):
    tfms = tfms_from_model(f_model, sz, aug_tfms=transforms_top_down, max_zoom=1.05)
    return ImageClassifierData.from_csv('', trainc_dir, label_csv, tfms=tfms, val_idxs=val_idxs, test_name=testc_dir)

data = get_data(sz)

learn = ConvLearner.pretrained(f_model, data, metrics=metrics)

How can I use only a subset of the data? I can't find anything about that.

(Geoffrey) #2

Here is my example:

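from fastai.vision import *

np.random.seed(42)  # fix the seed so the same random subset is drawn each run
# use_partial_data is an ItemList method, so start from ImageList rather than
# ImageDataBunch.from_folder (which already returns a finished DataBunch).
data = (ImageList.from_folder(path)
        .use_partial_data(0.1)        # keep a random 10% of the items
        .split_by_rand_pct(0.2)       # hold out 20% of that subset for validation
        .label_from_folder()
        .transform(tfms, size=128)
        .databunch(bs=64)
        .normalize(imagenet_stats))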

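Calling .use_partial_data(0.1) keeps only a random 10% of the dataset. Note that this is the fastai v1 data block API, not the older ImageClassifierData API from the question.
See also: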
https://docs.fast.ai/data_block.html#ItemList.use_partial_data
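
For the ImageClassifierData.from_csv call in the question, one possible workaround is to write a smaller label CSV and point from_csv at it. The following is only a sketch: the helper name write_label_subset and the output file name are made up for illustration, and it assumes the label CSV has a header row followed by one row per image.

```python
import csv
import random


def write_label_subset(src_csv, dst_csv, frac=0.1, seed=42):
    """Copy a random fraction of the label rows from src_csv to dst_csv.

    Hypothetical helper, not part of fastai. Assumes the first row is a
    header and every later row describes one image.
    Returns the number of label rows written (header not counted).
    """
    with open(src_csv, newline="") as f:
        rows = list(csv.reader(f))
    header, body = rows[0], rows[1:]

    # Keep at least one row when possible, and never more than exist.
    n_keep = min(len(body), max(1, int(len(body) * frac)))
    # A seeded generator returns the same subset on every run.
    subset = random.Random(seed).sample(body, n_keep)

    with open(dst_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(subset)
    return n_keep
```

You would then pass the new file in place of label_csv in get_data. The val_idxs computed for the full CSV no longer line up with the smaller file, so they need to be recomputed for the returned row count (in the old fastai library, probably with get_cv_idxs(n_keep), but check that against your installed version).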
