In the notebook “Iterate like a grandmaster”, Jeremy suggested for further improvement: “Before submitting a model, retrain it on the full dataset, rather than just the 75% training subset we’ve used here.” I wonder, how would I actually do that?
So far, my understanding is that we train on only a subset (the training data) so that we can see how well we are doing on the validation set and adjust our hyperparameters accordingly. Maybe to ask the question a little differently: when training on 100% of the data, what would be the validation set?
Thanks for your thoughts on this topic
There would no longer be a validation set - you would train the model on the entirety of the dataset and submit it directly to Kaggle. If you are satisfied with the model’s performance when trained on a smaller portion of the data and validated on the rest, chances are, holding all else equal, it’d do even better once exposed to the full dataset. Alternatively, you may regard Kaggle’s public test set as your validation set, but beware that a high score on the public leaderboard does not necessarily translate into a high score on the private leaderboard.
I hope I’m not going off topic, but I would like to know how to tell the `DataLoader` not to use any validation set.
Are you using convenience methods such as `ImageDataLoaders.from_folder` to construct your `DataLoaders`? If so, you could set `valid_pct` to 0.
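For intuition, here is a rough sketch of what `valid_pct` does. `random_split` is a simplified stand-in for fastai’s `RandomSplitter`, not the library code itself: the first `valid_pct` fraction of the shuffled indices becomes the validation split, so `valid_pct=0` leaves it empty:

```python
import random

def random_split(n_items, valid_pct, seed=42):
    # Simplified stand-in for fastai's RandomSplitter
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)
    cut = int(valid_pct * n_items)
    return idxs[cut:], idxs[:cut]  # (train indices, valid indices)

train, valid = random_split(100, valid_pct=0)
assert len(train) == 100 and len(valid) == 0  # everything is training data
```

Note that fastai will still construct a (now empty) validation `DataLoader`, so validation metrics printed during training will have nothing to evaluate.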
I am using the `DataBlock` API:

```python
dset = DataBlock(blocks=(ImageBlock(), CategoryBlock),
                 ...)
dls = dset.dataloaders(df, bs=BS)
```
I have also tried setting the entire dataframe’s `is_valid` column to `True`, but that is not permitted.
You can simply omit the `splitter` parameter in the `DataBlock`.
I tried removing the `splitter`, but it still takes a portion of the dataset for validation.
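That is likely because, when `splitter` is omitted, `DataBlock` falls back to a default random split (20% validation in fastai, if I recall correctly). One workaround is to pass your own splitter that sends every item to the training set. `no_split` below is a hypothetical helper, not a fastai function:

```python
def no_split(items):
    # A fastai splitter returns (train indices, valid indices);
    # here every index goes to training and the validation split is empty.
    return list(range(len(items))), []

# Sanity check, independent of fastai:
train_idx, valid_idx = no_split(['a', 'b', 'c', 'd'])
assert train_idx == [0, 1, 2, 3] and valid_idx == []
```

You could then pass it as `DataBlock(..., splitter=no_split)`; as with `valid_pct=0`, anything that evaluates on the (empty) validation set may need care.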
@BobMcDear Thanks for clarifying, and thanks also to @Redevil for starting the discussion on how to actually use 100% of the training data in code.
I am currently working on lesson 4 and therefore use the Hugging Face `Trainer` (the counterpart of the fastai `Learner`). I could successfully show the model the full training set after a few epochs of “normal” training rounds by doing the following (and it actually improved the result significantly):
```python
# get the model which was trained before on 75% of the training data
model = trainer.model

# create a new test set, since one will be required by the Trainer
dds = tok_ds.train_test_split(0.25, seed=seed)

# one more round with all the data
args = TrainingArguments('outputs', learning_rate=lr, warmup_ratio=0.1,
    lr_scheduler_type='cosine', fp16=True, evaluation_strategy="epoch",
    per_device_train_batch_size=bs, per_device_eval_batch_size=bs*2,
    num_train_epochs=epochs, weight_decay=0.01, report_to='none',
    gradient_accumulation_steps=1)

# pass the full dataset as training data and the test split as validation data
trainer = Trainer(model, args, train_dataset=tok_ds, eval_dataset=dds['test'])
trainer.train()
```
Question: Is this the intended way of training on the full training set, i.e. to first do the training “as usual” and then run another epoch with the full training data? Or would you suggest a different approach?
When you train the model on the full dataset, you have to “reset” the weights from the previous training run.
The strategy should be this:
- Tune the hyperparameters by exploiting the validation set, i.e. training on the “partial” dataset.
- Use the hyperparameters found in step 1 to train the “raw” model on the full dataset.

For the second step, you could also try decreasing the learning rate a bit.
How would I actually do this in code (i.e. resetting the weights)? Would I go back to the pretrained model I started with? Wouldn’t I lose the learnings from the first round of training? Which information gets carried over into the training session with the full dataset?
Sorry for asking so many questions…
You can create a new notebook, using the same network architecture and hyperparameters.
Yes, but you are going to retrain the network with the same hyperparameters, so the network learns the same “information” from the former training set, plus the “information” coming from the former validation set.
No information is carried over; that happens in transfer learning and fine-tuning instead.
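To see concretely what carries over and what doesn’t, here is a toy sketch in plain Python; the names `pretrained` and `train` and all the numbers are made up, standing in for a Hugging Face checkpoint and `Trainer`:

```python
# Toy illustration: "resetting" means starting again from the same
# pretrained checkpoint, not from the weights learned on the 75% split.
pretrained = {"w": 0.5}            # stand-in for a pretrained checkpoint

def train(weights, data, lr):
    w = dict(weights)              # copy, so the checkpoint is never mutated
    for x in data:                 # stand-in for gradient steps
        w["w"] += lr * x
    return w

# Step 1: train on the 75% split, using the validation set to pick hyperparams
model_75 = train(pretrained, data=[1, 2, 3], lr=0.01)
best_lr = 0.01                     # chosen thanks to the validation set

# Step 2: a fresh model from the *pretrained* weights, now on the full data
model_full = train(pretrained, data=[1, 2, 3, 4], lr=best_lr)

# Only the hyperparameters (best_lr) carry over, not model_75's weights
assert model_full["w"] != model_75["w"]
```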
** You can use fastai’s `set_seed()` function to make your results reproducible: the same network with the same hyperparameters will always give you the same results.
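As a minimal illustration of the idea, here is a stand-in `set_seed` that only seeds Python’s `random` module (fastai’s real `set_seed` additionally seeds NumPy and PyTorch):

```python
import random

def set_seed(seed):
    # Stand-in for fastai's set_seed(), which also seeds numpy and torch
    random.seed(seed)

set_seed(42)
run1 = [random.random() for _ in range(3)]
set_seed(42)
run2 = [random.random() for _ in range(3)]
assert run1 == run2  # identical seeds give identical "training" randomness
```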