For the Keras fit method, does "shuffle=True" shuffle BOTH the training and validation samples or just the training dataset?

wgpubs · May 11, 2017, 3:01am

Consider this piece of code:

lm.fit(train_data, train_labels, epochs=2, validation_data=(val_data, val_labels), shuffle=True)

When using fit_generator with batches, each set of batches can be created with shuffle=True or shuffle=False separately. But when using fit(), you don’t get the option to shuffle or not shuffle the validation set independently of the training set.

So my question is, when setting shuffle=True above, is only the training data getting shuffled OR is the validation data set getting shuffled as well?

jeremy · May 11, 2017, 3:40am

Just the training data.

wgpubs · May 11, 2017, 3:50am

Cool. Thanks for the reply!

libphy · May 17, 2017, 3:43am

In the Keras documentation, I didn’t see that fit_generator has an option to shuffle. If it’s possible, can someone show an example of how to do it? Thanks.

wgpubs · May 17, 2017, 4:36pm

The shuffle argument is available when creating the batches using an image.ImageDataGenerator object.

See the documentation for flow() and flow_from_directory() here: https://keras.io/preprocessing/image/
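To illustrate the idea of a per-generator shuffle flag without depending on a particular Keras version, here is a minimal pure-Python sketch. Note that batch_iterator is a hypothetical helper written for this post, not the Keras implementation; it only mimics the behaviour of passing shuffle=True or shuffle=False when the batches are created.

```python
import random

def batch_iterator(samples, batch_size, shuffle, seed=None):
    """Hypothetical helper: yield batches forever.

    If shuffle is True, the sample order is reshuffled at the start of
    every epoch; otherwise the original order is kept for every epoch.
    """
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    while True:
        if shuffle:
            rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            yield [samples[i] for i in indices[start:start + batch_size]]

# Training batches are shuffled, validation batches are not.
train_batches = batch_iterator(list(range(10)), batch_size=5, shuffle=True, seed=0)
val_batches = batch_iterator(list(range(100, 106)), batch_size=3, shuffle=False)

# One training epoch = 2 batches of 5; every sample appears exactly once.
train_epoch = next(train_batches) + next(train_batches)

# Two validation epochs = 2 batches of 3 each; the order never changes.
first_val_epoch = [next(val_batches), next(val_batches)]
second_val_epoch = [next(val_batches), next(val_batches)]
```

With the real API, the equivalent choice would be made through the shuffle argument of flow() or flow_from_directory() for each generator separately.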

ericm · May 18, 2017, 3:48pm

Just so I can test my understanding, isn’t it irrelevant if the validation data is being shuffled, since a) it’s not adjusting any weights stochastically using the validation data, and b) the accuracy number should be the same regardless of the order the validation set is tested in? Is this correct?

wgpubs · May 18, 2017, 7:41pm

Yes.

The validation set is just being used to measure how well the trained model works on examples it hasn’t seen during training, and so whether it is shuffled is irrelevant.
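A quick way to convince yourself of point (b) is the small sketch below. The labels and predictions are made-up values for illustration, and it assumes the model's weights (and therefore its per-example predictions) do not change while the validation set is being evaluated.

```python
import random

def accuracy(predictions, labels):
    """Fraction of predictions that match their labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Made-up validation labels and model predictions (illustrative only).
labels = [0, 1, 1, 0, 1, 0, 0, 1]
predictions = [0, 1, 0, 0, 1, 1, 0, 1]

# Shuffle predictions and labels together so each pair stays aligned.
pairs = list(zip(predictions, labels))
random.Random(42).shuffle(pairs)
shuffled_predictions, shuffled_labels = zip(*pairs)

original_acc = accuracy(predictions, labels)
shuffled_acc = accuracy(shuffled_predictions, shuffled_labels)
# Both values are 0.75: accuracy is a count over the whole set,
# so the order in which examples are visited does not matter.
```

The same reasoning applies to any metric that is a sum or mean over the whole validation set.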

tejasvi88 · June 23, 2019, 5:24am

The validation data is used for tuning the parameters that control training (the hyperparameters), though shuffling it is irrelevant here.

eilalan · October 21, 2020, 5:56am

Hi, thanks for the post.
Suppose I split my data into train, validation and test sets. Train and validation are used during training, where validation is a specific dataset (not cross-validated).
Test is used for model performance evaluation.

Do you mean that the shuffle should be done on both training and validation set?

Thanks,
eilalan

joshiharshit5077 · May 25, 2022, 1:11pm

How can we use shuffle=True with fastai's learn.fit and learner.fit_one_cycle?
All suggestions are welcome.
Thanking you,
Harshit