I prepared the same dataset using the following two methods, but I am getting different results:
Method 1:
data1 = (
    vision.ImageList.from_folder(path / "Training")
    .split_by_rand_pct(seed=1995)
    .label_from_folder()
    .transform(size=size)
    .databunch(bs=bs)
)
Method 2:
data2 = vision.ImageDataBunch.from_folder(
    path,
    train="Training",
    valid_pct=0.2,
    size=size,
    bs=bs,
)
Method 1 gives me the following data:
ImageDataBunch;
Train: LabelList (672 items)
x: ImageList
Image (3, 227, 227),Image (3, 227, 227),Image (3, 227, 227),Image (3, 227, 227),Image (3, 227, 227)
y: CategoryList
kinky,kinky,kinky,kinky,kinky
Path: ../data/hair/Training;
Valid: LabelList (168 items)
x: ImageList
Image (3, 227, 227),Image (3, 227, 227),Image (3, 227, 227),Image (3, 227, 227),Image (3, 227, 227)
y: CategoryList
wavy,curly,curly,braids,kinky
Path: ../data/hair/Training;
Test: None
whereas method 2 gives the following:
ImageDataBunch;
Train: LabelList (840 items)
x: ImageList
Image (3, 227, 227),Image (3, 227, 227),Image (3, 227, 227),Image (3, 227, 227),Image (3, 227, 227)
y: CategoryList
Testing,Testing,Testing,Testing,Testing
Path: ../data/hair;
Valid: LabelList (210 items)
x: ImageList
Image (3, 227, 227),Image (3, 227, 227),Image (3, 227, 227),Image (3, 227, 227),Image (3, 227, 227)
y: CategoryList
short-men,Testing,short-men,kinky,braids
Path: ../data/hair;
Test: None
There are 840 training items in total. Method 1 gives me an 80:20 split of them (672 training, 168 validation). Method 2 seems to keep the training data as is (840 items) and add a validation set of 210 items on top. Why is this inconsistency present? Is this the desired behaviour? To me method 1 seems correct, but if there is a reason behind the difference, I'd like to know.
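As a quick sanity check, here is the arithmetic on the item counts printed above, in plain Python (no fastai needed). The dictionary and function names are just placeholders for this sketch. The numbers suggest that method 2 is splitting a pool of 1050 items rather than 840, which would be consistent with the Testing labels in its output, but that is only my reading of the printed summaries, not a confirmed explanation.

```python
# Item counts copied from the two printed ImageDataBunch summaries.
method1 = {"train": 672, "valid": 168}
method2 = {"train": 840, "valid": 210}


def summarize(counts):
    """Return (total items, fraction of items in the validation set)."""
    total = counts["train"] + counts["valid"]
    valid_fraction = counts["valid"] / total
    return total, valid_fraction


total1, frac1 = summarize(method1)
total2, frac2 = summarize(method2)

print("method 1:", total1, "items, validation fraction", frac1)
print("method 2:", total2, "items, validation fraction", frac2)
```

Both methods end up with a 20% validation fraction; what differs is the total number of items being split (840 versus 1050).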