So far, when tuning my hyperparameters (learning rate, batch size, etc.), I have been using fit_one_cycle, training for 3 epochs, and observing my metrics to determine which hyperparameters might be best for my models.
Is 3 epochs enough? Would using 1 epoch be sufficient to find the best hyperparameters?
When evaluating, I’ve been using a baseline as suggested and comparing other experiments against that.
I was hoping, though, that 1 epoch or a low number of epochs would be enough to predict the likely model performance when running for more epochs with the chosen hyperparameters. But it sounds like I will need to watch for a statistically significant change to be certain.
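A rough sketch of what that check could look like, assuming each configuration is trained several times with different random seeds and the final metric of each run is recorded. The accuracy values below are made up for illustration, and `welch_t` is a hypothetical helper written here, not part of any library. It computes Welch's t statistic, which is one common choice for comparing two sets of runs; other tests may suit the problem better.

```python
import math
import statistics

def welch_t(a, b):
    """Return Welch's t statistic and approximate degrees of freedom for samples a and b."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    # Squared standard error of each mean (sample variance divided by sample size)
    se_a = statistics.variance(a) / len(a)
    se_b = statistics.variance(b) / len(b)
    t = (mean_b - mean_a) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1))
    return t, df

# Hypothetical accuracies after 3 epochs, one value per random seed
baseline = [0.712, 0.705, 0.719, 0.708, 0.715]
candidate = [0.731, 0.724, 0.735, 0.722, 0.729]

t, df = welch_t(baseline, candidate)
print(f"t = {t:.2f}, df = {df:.1f}")
```

As a rough guide, with around 8 degrees of freedom an absolute t value above about 2.3 corresponds to p < 0.05 (two-sided). With fewer runs per configuration the threshold is higher, so a single run per setting cannot tell a real improvement from seed-to-seed noise.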
No problem. It also depends on the problem. E.g. for image classification, 5 epochs could be enough, depending on the case (look at the ImageWoof experiments), but if it’s tabular it could need fewer (2-3).