Hi, I’m trying to do basic image classification on the Fruits360 dataset from Kaggle. Since it has an ImageNet-style folder structure, I am using this:
from fastai.vision import *

path = 'fruits-360_dataset/fruits-360/'
path_train = path + 'Training'
path_test = path + 'Test'
# size and batch_size are defined elsewhere (values not shown here)
data = ImageDataBunch.from_folder(path_train,
                                  ds_tfms=get_transforms(do_flip=True, flip_vert=True),
                                  valid_pct=0.2,
                                  size=size,
                                  bs=batch_size)
The issue is that it does not seem to split the data properly, as you can see from the output of printing data:
ImageDataBunch;
Train: LabelList (48399 items)
x: ImageList
Image (3, 100, 100),Image (3, 100, 100),Image (3, 100, 100),Image (3, 100, 100),Image (3, 100, 100)
y: CategoryList
Orange,Orange,Orange,Orange,Orange
Path: fruits-360_dataset/fruits-360/Training;
Valid: LabelList (12099 items)
x: ImageList
Image (3, 100, 100),Image (3, 100, 100),Image (3, 100, 100),Image (3, 100, 100),Image (3, 100, 100)
y: CategoryList
Strawberry Wedge,Apple Golden 3,Salak,Tangelo,Pear Red
Path: fruits-360_dataset/fruits-360/Training;
This improper splitting leads to a validation accuracy of almost 100%, which I believe is not correct. Why is it not splitting the data properly? Is there something I’m doing wrong? Any help will be appreciated!
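For anyone who wants to reproduce the check, here is a rough, library-free sketch of what a random 20% hold-out might look like. It assumes that valid_pct shuffles the item indices, holds out the first fraction for validation, and leaves the remaining training items in folder order; I have not confirmed that this is what the library does internally. The random_split helper and the three-class toy label list are made up for illustration.

```python
import random
from collections import Counter


def random_split(labels, valid_pct=0.2, seed=0):
    """Hypothetical helper: hold out a random valid_pct fraction of the items."""
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)           # shuffled copy of the indices
    cut = int(valid_pct * len(labels))         # number of validation items
    valid_idx = idx[:cut]
    held_out = set(valid_idx)
    # Training items keep their original (folder) order.
    train = [labels[i] for i in range(len(labels)) if i not in held_out]
    # Validation items come out in shuffled order.
    valid = [labels[i] for i in valid_idx]
    return train, valid


# Toy stand-in for the Training folder: items listed class by class.
labels = ["Orange"] * 100 + ["Salak"] * 100 + ["Tangelo"] * 100
train, valid = random_split(labels)

print(len(train), len(valid))   # sizes of the two subsets
print(train[:5])                # first few training labels
print(Counter(valid))           # per-class counts in the validation subset

# Sizes reported in the printout above.
total = 48399 + 12099
expected_valid = int(0.2 * total)
print(total, expected_valid)
```

Under these assumptions the first few training labels all come from the first folder while the validation labels look mixed, and the reported sizes (48399 and 12099) add up to a 20% hold-out of 60498 items. Comparing per-class counts in both subsets, as Counter does here, seems like a more direct check than looking at the first five items.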
Thanks in advance
Lakshya