I am wondering whether it makes sense to use a different kind of ensemble. So far we have built ensembles from different runs of one model on a single training/validation split.
What about building the ensemble from different training/validation splits instead? With a single split, we lose the information in the validation set for training. My thought is to mitigate this by using a different split for each run and averaging over the resulting models.
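To make the idea concrete, here is a rough sketch of what I have in mind, using only the standard library. The helper names (`k_fold_indices`, `fit_line`, `train_fold_ensemble`, `ensemble_predict`) are made up for illustration, and the 1-D least-squares fit is just a toy stand-in for training a real network. Each model holds out a different fold, so every sample is used for training by all but one member of the ensemble.

```python
import random
from statistics import mean

def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle the indices once and deal them into n_folds disjoint validation folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

def fit_line(xs, ys):
    """Toy stand-in for 'train a model': 1-D least squares, returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def train_fold_ensemble(xs, ys, n_folds=3):
    """Train one model per fold, each on the data outside its validation fold."""
    folds = k_fold_indices(len(xs), n_folds)
    models = []
    for val_idx in folds:
        held_out = set(val_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        models.append(fit_line([xs[i] for i in train_idx],
                               [ys[i] for i in train_idx]))
    return models, folds

def ensemble_predict(models, x):
    """Average the predictions of all fold models."""
    return mean(slope * x + intercept for slope, intercept in models)

# Synthetic data (assumption for the sketch): y = 2x + 1 plus small Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(60)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.1) for x in xs]

models, folds = train_fold_ensemble(xs, ys, n_folds=3)
prediction = ensemble_predict(models, 3.0)
print(prediction)
```

With a real network you would swap `fit_line` for your training loop and average the predicted probabilities instead of a scalar output.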
It is common to use cross-validation in sklearn, where you end up with a score for each fold and then average them. I am not sure why it is not used more in deep learning, but it is probably because of the training time: 3 folds take roughly 3 times as long to train as a single train/valid split.
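For reference, this is roughly what the sklearn pattern looks like. It is only a sketch: the synthetic dataset and the choice of `LogisticRegression` are placeholders I picked for illustration, not anything from our setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder data and model (assumptions for the sketch).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# cv=3 fits the model three times, each time holding out a different fold,
# and returns one score per fold.
scores = cross_val_score(model, X, y, cv=3)
mean_score = scores.mean()
print(scores, mean_score)
```

Note that `cross_val_score` only returns the per-fold scores and discards the fitted models, so it estimates performance but does not by itself give you an ensemble.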