Where is the actual validation batch?

I am trying to run a CNN model on the MNIST data, and I came across this portion of code:

model.fit_generator(batches, batches.N, nb_epoch=14,
validation_data=test_batches, nb_val_samples=test_batches.N)

My question is: aren't we supposed to pass a validation set to this parameter?

And also here in this code below

from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

There is no x_val and y_val. @jeremy said in one of the videos that our model should never see the test data during training. But the fit_generator call above is seeing the test set, i.e. x_test and y_test.
Can anyone please help me understand this?

People sometimes use "validation" and "test" interchangeably, which is confusing!

In this case:

  • Training is carried out on a training set. It does not look at the validation data.
  • Validation data is used to give you a score at the end of each epoch. It is not used in fitting to update weights.

The problem comes when you adjust hyperparameters such as learning rate based on the validation scores. By doing this you have incorporated validation data into your model and the scores are no longer independent. If you want an unbiased estimate of the final score you would need a separate test set.
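To make the point about biased validation scores concrete, here is a small simulation sketch. It is not tied to MNIST or Keras: every "model" here is a hypothetical candidate that guesses at random (true accuracy 0.5), and the function name and defaults are made up for illustration. Picking the candidate with the best validation score and then reporting that same score is optimistic, while the score of the chosen candidate on an untouched test set is not.

```python
import random


def simulate_selection_bias(n_candidates=20, n_val=100, n_test=100,
                            n_trials=200, seed=0):
    """Average gap between validation and test accuracy of the selected candidate.

    Each candidate guesses at random, so its true accuracy is 0.5 on any data.
    In every trial we keep the candidate with the highest validation accuracy,
    which mimics tuning hyperparameters against the validation score.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        best_val, best_test = -1.0, 0.0
        for _ in range(n_candidates):
            # Fraction of correct random guesses on each held-out set.
            val_acc = sum(rng.random() < 0.5 for _ in range(n_val)) / n_val
            test_acc = sum(rng.random() < 0.5 for _ in range(n_test)) / n_test
            if val_acc > best_val:
                best_val, best_test = val_acc, test_acc
        gaps.append(best_val - best_test)
    return sum(gaps) / len(gaps)


mean_gap = simulate_selection_bias()
print(f"Selected model: validation beats test by {mean_gap:.3f} on average")
```

The gap comes purely from selecting on the validation score, since no candidate is actually better than chance.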


simoneva wrote:
"If you want an unbiased estimate of the final score you would need a separate test set."

That means I should not use the test set (x_test, y_test) in validation_data, because I want an unbiased estimate of the final score. So I have to split the MNIST data into:

  1. Train --> x_train, y_train
  2. Test --> x_test, y_test
  3. Validation --> x_val, y_val

Just making sure I am on the right path.
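A three-way split like the one listed above could be sketched as follows. This is only one possible approach: `split_indices`, the 20% validation fraction, and the seed are illustrative choices, not anything from the course notebook. The idea is to carve the validation set out of the 60,000 MNIST training samples and leave x_test, y_test untouched until the very end.

```python
import random


def split_indices(n_samples, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx): disjoint, shuffled index lists."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_val = int(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]


# The MNIST training set returned by mnist.load_data() has 60,000 samples.
train_idx, val_idx = split_indices(60000, val_fraction=0.2)

# With the NumPy arrays from mnist.load_data(), the split would then be:
#   x_val, y_val = X_train[val_idx], y_train[val_idx]
#   x_tr, y_tr = X_train[train_idx], y_train[train_idx]
# X_test, y_test stay out of training and tuning entirely.
```

The validation arrays would then go into validation_data, and the test set would be scored once after all tuning is done.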

Hi, I am confused by model.fit_generator as well, because I see other people just write code like:

model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=iterations,
                    epochs=epochs,
                    callbacks=cbks,
                    validation_data=(x_test, y_test))

(x_train, y_train), (x_test, y_test1) = cifar10.load_data()

Have you found out how to use validation_data in model.fit_generator?

validation_data=(x_test, y_test))
Right there, by passing x_test and y_test, you are using the test data as validation data.