For the Keras fit method, does "shuffle=True" shuffle BOTH the training and validation samples, or just the training dataset?

(WG) #1

Consider this piece of code:

lm.fit(train_data, train_labels, epochs=2, validation_data=(val_data, val_labels), shuffle=True)

When using fit_generator with batches, the training and validation batches are created separately, so each can be set to shuffle=True or False independently. But when using fit(), you don’t get the option to shuffle or not shuffle the validation set independently of the training set.

So my question is, when setting shuffle=True above, is only the training data getting shuffled OR is the validation data set getting shuffled as well?

(Jeremy Howard (Admin)) #2

Just the training data.
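
A minimal stdlib sketch of what that answer means for sample order, assuming a simplified fit()-style loop: the training indices are re-permuted every epoch when shuffle=True, while validation_data is always evaluated in its original order. `simulate_fit_order` is a hypothetical helper for illustration only, not Keras code.

```python
import random

def simulate_fit_order(n_train, n_val, epochs, shuffle=True, seed=0):
    """Hypothetical sketch of the sample order a fit()-style loop visits.

    Assumption: training indices are re-permuted each epoch when
    shuffle=True; validation indices always stay in their original order.
    This is not Keras source code.
    """
    rng = random.Random(seed)
    history = []
    for _ in range(epochs):
        train_order = list(range(n_train))
        if shuffle:
            rng.shuffle(train_order)  # only the training set is permuted
        val_order = list(range(n_val))  # validation order is never touched
        history.append((train_order, val_order))
    return history

for epoch, (train_order, val_order) in enumerate(simulate_fit_order(6, 3, epochs=2)):
    print(f"epoch {epoch}: train={train_order} val={val_order}")
```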

(WG) #3

Cool. Thanks for the reply!

#4

In the Keras documentation, I didn’t see a shuffle option for fit_generator. If it’s possible, can someone show an example of how to do it? Thanks.

(WG) #5

The shuffle argument is available when creating the batches using an image.ImageDataGenerator object.

See the documentation for flow() and flow_from_directory() here: https://keras.io/preprocessing/image/
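
With Keras 2’s ImageDataGenerator the two sets of batches are built separately, e.g. gen.flow(train_x, train_y, batch_size=32, shuffle=True) for training and gen.flow(val_x, val_y, batch_size=32, shuffle=False) for validation (flow_from_directory() accepts the same shuffle argument), and both are then passed to fit_generator. So that the sketch runs without Keras installed, it uses a hypothetical stdlib generator, `batch_generator`, to show what the per-generator shuffle flag controls; it is a simplified stand-in, not the Keras iterator.

```python
import random

def batch_generator(xs, ys, batch_size, shuffle, seed=None):
    """Hypothetical stand-in for a Keras batch iterator: yields (x, y) batches forever.

    With shuffle=True the sample order is re-drawn at the start of every pass
    over the data; with shuffle=False batches always come out in the original order.
    """
    rng = random.Random(seed)
    n = len(xs)
    while True:
        order = list(range(n))
        if shuffle:
            rng.shuffle(order)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield [xs[i] for i in idx], [ys[i] for i in idx]

train_x, train_y = list(range(10)), [i % 2 for i in range(10)]
val_x, val_y = list(range(100, 104)), [0, 1, 0, 1]

# Each generator gets its own shuffle flag, independent of the other.
train_batches = batch_generator(train_x, train_y, batch_size=4, shuffle=True, seed=42)
val_batches = batch_generator(val_x, val_y, batch_size=2, shuffle=False)

print(next(train_batches))  # order depends on the seed
print(next(val_batches))    # always ([100, 101], [0, 1])
```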

(Eric Mulvihill) #6

Just so I can test my understanding: isn’t it irrelevant whether the validation data is shuffled, since a) the validation data isn’t used to adjust any weights stochastically, and b) the accuracy should be the same regardless of the order in which the validation set is evaluated? Is this correct?

(WG) #7

Yes.

The validation set is only used to measure how well the trained model works on examples it hasn’t seen during training, so whether it is shuffled is irrelevant.
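
A small stdlib check of point b), assuming a fixed set of predictions from an already-trained model (the `preds` and `labels` values are made up for illustration): shuffling predictions and labels together does not change the accuracy, because it only counts matches.

```python
import random

def accuracy(predictions, labels):
    """Fraction of positions where the prediction matches the label."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)

# Hypothetical fixed predictions from an already-trained model on a validation set.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
labels = [1, 0, 0, 1, 0, 1, 1, 0]

pairs = list(zip(preds, labels))
random.Random(0).shuffle(pairs)  # shuffle predictions and labels together
shuffled_preds, shuffled_labels = zip(*pairs)

print(accuracy(preds, labels))                    # 0.75
print(accuracy(shuffled_preds, shuffled_labels))  # 0.75
```

For a loss averaged in floating point, reordering can change the last few digits because float addition is not associative, but not by any meaningful amount.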

#8

The validation data is used for tuning the settings that control training (hyperparameters), though shuffling it is still irrelevant here.
