I’m finalizing my submission for my first real ML competition,
and had a question - apologies if this is too elementary.
The competition’s task is regression on images (specifically
2D X-rays). The training set is pretty small (about 700
images with relatively few positive examples),
and there’s no separate validation set given,
so to tune the hyperparameters I’ve been using 5-fold
cross-validation on the training data.
My thinking has been that after finding the best
hyperparameter settings I would then train the final
models for the submission on all the training data
(since the training set is so limited).
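Roughly, the workflow I have in mind looks like this (a sketch using scikit-learn, with a `Ridge` regressor and synthetic features standing in for my actual image model and data):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in for my ~700 training examples
rng = np.random.default_rng(0)
X = rng.normal(size=(700, 32))
y = X @ rng.normal(size=32) + rng.normal(scale=0.1, size=700)

# 5-fold CV over the training data to choose hyperparameters
cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    Ridge(),
    {"alpha": [0.1, 1.0, 10.0]},
    cv=cv,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

# Then refit the best configuration on ALL the training data
final_model = Ridge(alpha=search.best_params_["alpha"]).fit(X, y)
```

(As I understand it, `GridSearchCV` with the default `refit=True` already does this final refit on the full data, exposed as `search.best_estimator_`.)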
Is this a good idea, or should I instead submit a model
trained with a validation set split off, so I can see
how the final model actually performs on held-out data?