Model training in Databricks hangs

(Andrew Fowler) #1

I’m trying to train a tabular learner in Databricks, and I am running into issues using the fit_one_cycle method for training.

If I specify only one training epoch, training usually completes. If I specify more than one epoch, training hangs without raising any error, and sometimes it hangs during the first epoch.

I’m running my code on a Standard NC12 worker with the 5.4 ML runtime.

Any help would be much appreciated!



It’s hard to help without seeing any code :wink: