I haven’t tested it with anything else, but this callback isn’t working when trying to fit a collab learner. The callback is initialized on training start, but none of its methods are called, and the model isn’t saved, nor is the best one loaded at training end. Any ideas why?
How and when are you setting up the callback? On the Learner or on fit?
On fit:
```python
def collab_train_eval(dls, use_nn, y_range, layers, epochs, max_lr, wd):
    learn = collab_learner(dls, use_nn=use_nn, y_range=y_range, layers=layers,
                           metrics=[rmse, mae])
    learn.fit_one_cycle(epochs, max_lr, wd=wd, callbacks=[
        SaveModelCallback(monitor='valid_loss', fname='optim/temp_best_collab')
    ])
    learn.fit_one_cycle(epochs, max_lr, wd)
    return learn.validate()
```
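For what it’s worth, one thing to double-check is which keyword your fastai version expects: fastai v1’s `fit_one_cycle` takes `callbacks=`, while fastai2’s takes `cbs=`, so a callback passed under the wrong name may never get attached. Separately, here is the save-best/restore-best behaviour `SaveModelCallback` is meant to provide, sketched in plain Python with no fastai dependency (`TrackBest` and its method names are illustrative, not fastai’s):

```python
class TrackBest:
    """Track a monitored value; remember the 'model' at its best epoch."""

    def __init__(self, monitor="valid_loss"):
        self.monitor = monitor
        self.best = float("inf")   # lower is better for a loss
        self.best_state = None

    def after_epoch(self, state, metrics):
        # Called once per epoch with the model state and that epoch's metrics.
        value = metrics[self.monitor]
        if value < self.best:
            self.best = value
            self.best_state = dict(state)  # "save" a copy of the weights

    def after_fit(self):
        # At training end, restore the best checkpoint (what SaveModelCallback
        # does when it loads the best model back).
        return self.best_state


cb = TrackBest()
history = [({"w": 1}, {"valid_loss": 0.9}),
           ({"w": 2}, {"valid_loss": 0.5}),
           ({"w": 3}, {"valid_loss": 0.7})]
for state, metrics in history:
    cb.after_epoch(state, metrics)

assert cb.after_fit() == {"w": 2}  # the epoch with the lowest valid_loss wins
```

If the callback were attached but never invoked, none of the `after_epoch` calls would happen, which would match the symptom you describe.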
Also, slightly unrelated, but I know you’ve been working a lot with tabular data: dropout doesn’t work on the tabular_learner in fastai2. I tried submitting a PR, but it was blocked and I haven’t gotten around to going through the process again, so I figured I’d let you know in case you’ve seen any weird behaviour.
Hmm… odd. I’ll take a look at both this weekend
Hi everyone, hope all is well and you’re having a jolly weekend!
I am doing lesson 8 on Google Colab and the notebook is failing at this point, when fine-tuning the model.
It’s not completing the 10 epochs.
```python
learn.unfreeze()
learn.fit_one_cycle(10, 2e-3)
```
```
80.00% [8/10 8:02:46<2:00:41]
epoch  train_loss  valid_loss  accuracy  perplexity  time
0      3.893891    3.804772    0.313039  44.915024   1:00:20
1      3.864590    3.758198    0.318095  42.871124   1:00:13
...
7      3.567819    3.620880    0.334902  37.370434   1:00:20
37.50% [1972/5258 21:36<36:00 3.5383]
```
This has failed about six times now!
Things tried so far:
- Reduced the batch size to 64: instead of failing after 2-3 epochs, it now fails after 6 epochs, but each epoch takes longer.
- Created a keep-alive, since it fails if my screen saver comes on for longer than 15-30 minutes.
- I will try reducing the batch size to 32.
- I have tried creating a simple callback with the following:
```python
learn.unfreeze()
learn.fit_one_cycle(10, 2e-3, callbacks=[SaveModelCallback(learn, every='epoch')])
```
This gives the following error:
```
TypeError: argument of type 'LMLearner' is not iterable
```
```python
learn.unfreeze()
learn.fit_one_cycle(10, 2e-3, callbacks=[SaveModelCallback(learn, every='epoch', monitor='accuracy')])
```
This also gives an error:
```
__init__() got multiple values for argument 'monitor'
```
Q. What is the correct way to configure the simplest SaveModelCallback for this NLP model?
Any information greatly appreciated.
Cheers mrfabulous1
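Both tracebacks above look consistent with the fastai2 signature of `SaveModelCallback`, whose first positional parameter is `monitor` (a string), not the learner as it was in fastai v1. The sketch below reproduces both errors with simplified stand-in classes (`FakeLearner` and `FakeSaveModel` are illustrative names, not fastai’s):

```python
class FakeLearner:
    """Stand-in for an LMLearner; carries no callback-relevant state."""
    pass


class FakeSaveModel:
    # Simplified stand-in for fastai2's SaveModelCallback signature:
    # the first positional parameter is `monitor`, not the learner.
    def __init__(self, monitor="valid_loss", fname="model", every_epoch=False):
        # fastai2 picks the comparison direction by inspecting the monitor
        # name (e.g. `'loss' in monitor`), which is where a non-string
        # monitor blows up.
        self.lower_is_better = "loss" in monitor or "error" in monitor
        self.monitor, self.fname, self.every_epoch = monitor, fname, every_epoch


learn = FakeLearner()

# 1) Passing the learner positionally binds it to `monitor`; the string
#    membership test then fails, giving "is not iterable".
try:
    FakeSaveModel(learn, every_epoch=True)
except TypeError as e:
    first_error = str(e)

# 2) Passing the learner positionally *and* monitor as a keyword fills the
#    `monitor` slot twice, giving "got multiple values for argument 'monitor'".
try:
    FakeSaveModel(learn, monitor="accuracy")
except TypeError as e:
    second_error = str(e)

assert "is not iterable" in first_error
assert "multiple values" in second_error
```

If that diagnosis is right, the fastai2 fix is to drop the learner argument and pass the callback through `cbs=`, e.g. `learn.fit_one_cycle(10, 2e-3, cbs=SaveModelCallback(every_epoch=True))`; the v1 `every='epoch'` keyword became `every_epoch=True` in fastai2.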
Did you ever get a chance to look into this?
I’m afraid I did not. I’d open an issue with a reproducer on the GitHub repo.
Okay no worries – will do