I am having a weird issue. When I load the training and validation images with get_batches, the output is what I expect (shown below). As expected, there are 22980 images (11490 per class) in the train directory and 2000 images (1000 per class) in the valid directory.
Found 22980 images belonging to 2 classes.
Found 2000 images belonging to 2 classes.
However, after calling get_data on train_batches and val_batches, the shapes of train_data and val_data are not what I expect (shown below). There are only 22976 and 1984 images in train_data and val_data respectively. Any idea what’s going wrong? I am using get_data so that I can use processed arrays (via bcolz) instead of processing the images from disk every time.
(22976, 3, 224, 224)
(1984, 3, 224, 224)
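One thing worth checking with plain arithmetic: both unexpected counts are exact multiples of the batch size of 64 used further down, which would be consistent with only full batches being collected. The snippet below is just that arithmetic; the helper name `rows_in_full_batches` is made up for illustration and is not part of my code or of any library.

```python
def rows_in_full_batches(n_images, batch_size):
    """Number of images covered if only complete batches are kept."""
    full_batches, remainder = divmod(n_images, batch_size)
    return full_batches * batch_size, remainder

batch_size = 64

# Counts reported by flow_from_directory for each directory
for name, n_found in [("train", 22980), ("valid", 2000)]:
    kept, left_over = rows_in_full_batches(n_found, batch_size)
    print(name, kept, left_over)
# train 22976 4
# valid 1984 16
```

The "kept" values match the shapes above, and the leftovers (4 and 16) are exactly the missing images.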
I have included the relevant code below for more details.
def get_batches(dirname, gen=image.ImageDataGenerator(), shuffle=True,
                batch_size=4, class_mode='categorical'):
    return gen.flow_from_directory(dirname, target_size=(224,224),
        class_mode=class_mode, shuffle=shuffle, batch_size=batch_size)
DATA_DIR = "data/dogs-vs-cats-redux-kernels-edition/"
batch_size = 64
train_batches = get_batches(DATA_DIR + 'train', batch_size=batch_size)
val_batches = get_batches(DATA_DIR + 'valid', batch_size=batch_size)
val_data = get_data(val_batches)
train_data = get_data(train_batches)
print(train_data.shape) ## Prints (22976, 3, 224, 224)
print(val_data.shape) ## Prints (1984, 3, 224, 224)
val_classes = val_batches.classes
train_classes = train_batches.classes
val_labels = onehot(val_classes)
train_labels = onehot(train_classes)
print(train_labels.shape) ## Prints (22980, 2)
print(val_labels.shape) ## Prints (2000, 2)
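Since get_data itself is not shown above, the following is only a guess at the mechanism: a minimal sketch, assuming the helper pulls a fixed number of batches computed with floor division, so the short final batch is never read. `FakeBatches`, `collect_floor` and `collect_ceil` are hypothetical names, and `FakeBatches` is a plain NumPy stand-in for the Keras iterator, not the real thing.

```python
import math
import numpy as np

class FakeBatches:
    """Hypothetical stand-in for a directory iterator: yields up to
    batch_size rows per call, with a shorter final batch, then wraps."""
    def __init__(self, n, batch_size):
        self.n = n
        self.batch_size = batch_size
        self._pos = 0

    def __next__(self):
        end = min(self._pos + self.batch_size, self.n)
        batch = np.arange(self._pos, end).reshape(-1, 1)
        self._pos = 0 if end == self.n else end
        return batch

def collect_floor(batches):
    # Assumed buggy variant: floor division skips the partial last batch
    steps = batches.n // batches.batch_size
    return np.concatenate([next(batches) for _ in range(steps)])

def collect_ceil(batches):
    # Rounding up also reads the partial last batch
    steps = math.ceil(batches.n / batches.batch_size)
    return np.concatenate([next(batches) for _ in range(steps)])

floor_rows = collect_floor(FakeBatches(2000, 64)).shape[0]
ceil_rows = collect_ceil(FakeBatches(2000, 64)).shape[0]
print(floor_rows, ceil_rows)  # 1984 2000
```

If the real get_data does something similar, rounding the number of steps up rather than down should bring the data shapes in line with the label shapes.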