I’m trying to train a tabular learner in Databricks, and I am running into issues using the fit_one_cycle method for training.
If I specify only one training epoch, the training will usually complete. If I specify more than one epoch, training hangs with no error given. Sometimes it even hangs on the first epoch.
I’m running my code on a Standard NC12 worker with the 5.4 ML runtime.
Any help would be much appreciated!
It’s hard to help without seeing any code.
@sgugger, @Andrew_Fowler
I have experienced the same problem with a text classifier learner after unfreezing some layers, as in the lines of code below. The issue mostly arises when using OverSamplingCallback(learn).
learn.freeze_to(-2)
learn.fit_one_cycle(8, lr, callbacks=[SaveModelCallback(learn),
                                      OverSamplingCallback(learn),
                                      ReduceLROnPlateauCallback(learn, factor=0.8)])
PS: Training was done on a V100.