I trained my model as follows:
```python
import pandas as pd
from fastai.tabular import *                     # fastai v1 tabular API
from fastai.callbacks import SaveModelCallback

# cat_names, cont_names and dep_var are defined earlier in my notebook
df = pd.read_csv('/content/gdrive/My Drive/atm.csv')
data = (TabularList.from_df(df, path='.', cat_names=cat_names, cont_names=cont_names,
                            procs=[Categorify, Normalize])
        .split_by_rand_pct(valid_pct=0.1, seed=88)
        .label_from_df(cols=[dep_var])
        .databunch())
learn = tabular_learner(data, layers=[2000, 2000, 500, 200, 50], metrics=exp_rmspe)
learn.lr_find()
learn.recorder.plot()
# save the weights whenever valid_loss reaches a new minimum
learn.fit_one_cycle(50, max_lr=1e-1,
                    callbacks=[SaveModelCallback(learn, monitor='valid_loss', mode='min',
                                                 name='/content/gdrive/My Drive/atm88')])
```
The results were fairly acceptable, with train_loss = 0.0245 and valid_loss = 0.0199 at epoch 47:
```
epoch  train_loss  valid_loss            exp_rmspe  time
41     0.027972    0.020318              0.161073   00:01
42     0.027237    0.022235              0.154306   00:01
43     0.026908    31822.730469          inf        00:01
44     0.024176    47407947776.000000    inf        00:01
45     0.024114    49439031296.000000    0.157786   00:01
46     0.023868    10416965.000000       inf        00:01
47     0.024527    0.019987              0.148315   00:01
48     0.023762    4771567.000000        0.148559   00:01
49     0.021999    121208012800.000000   0.144663   00:01

Better model found at epoch 0 with valid_loss value: 1.5367414425564742e+19.
Better model found at epoch 1 with valid_loss value: 0.09597836434841156.
Better model found at epoch 3 with valid_loss value: 0.026210423558950424.
Better model found at epoch 4 with valid_loss value: 0.021548109129071236.
Better model found at epoch 41 with valid_loss value: 0.0203182864934206.
Better model found at epoch 47 with valid_loss value: 0.0199868306517601.
```
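To double-check which weights ended up in the checkpoint, here is a plain-Python sketch of the comparison a `mode='min'` tracker makes, fed with the valid_loss values from the log for epochs 41 to 49. It assumes a strict "lower than the best so far" test and is not fastai's actual code; `checkpoint_epochs` is a hypothetical helper. It reproduces the "Better model found" lines for epochs 41 and 47, so the file loaded later should hold the epoch-47 weights.

```python
def checkpoint_epochs(valid_losses, best=float("inf")):
    """Return the epochs at which a 'min'-mode tracker would save a new best model.

    valid_losses: dict mapping epoch -> valid_loss
    best: best value seen before the first epoch in valid_losses
    """
    saved = []
    for epoch in sorted(valid_losses):
        loss = valid_losses[epoch]
        # assumed: only a strictly lower valid_loss counts as an improvement
        if loss < best:
            best = loss
            saved.append(epoch)
    return saved, best


# valid_loss values copied from the training log (epochs 41-49)
log = {
    41: 0.020318, 42: 0.022235, 43: 31822.730469, 44: 47407947776.0,
    45: 49439031296.0, 46: 10416965.0, 47: 0.019987, 48: 4771567.0,
    49: 121208012800.0,
}
# best value before epoch 41, from "Better model found at epoch 4"
epochs, best = checkpoint_epochs(log, best=0.021548109129071236)
print(epochs, best)  # [41, 47] 0.019987
```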
Then I tried to validate:
```python
# load the checkpoint written by SaveModelCallback (lowest valid_loss)
learn.load('/content/gdrive/My Drive/atm88')
print("Validation:", learn.validate(learn.data.train_dl), learn.validate(learn.data.valid_dl))
```
I got the same valid_loss, but not the same train_loss:
```
Validation: [146669540000000.0, tensor(inf)] [0.01998683, tensor(0.1483)]
```
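As far as I can tell (an assumption on my part), `learn.validate` returns a single number per dataset: the per-batch losses averaged with each batch weighted by its size. If that is right, a few rows with extreme predictions are enough to produce a number like 1.4e14 even when most batches are fine, similar to the valid_loss spikes at epochs 43 to 49. The sketch below only illustrates that arithmetic; `weighted_mean_loss` and the batch values are hypothetical, not taken from my run.

```python
from statistics import median


def weighted_mean_loss(batch_losses, batch_sizes):
    """Mean of per-batch losses, weighted by the number of rows in each batch."""
    total = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


# Hypothetical numbers: nine ordinary batches and one batch in which
# a few rows get extreme predictions.
batch_losses = [0.02] * 9 + [1.5e15]
batch_sizes = [64] * 10

overall = weighted_mean_loss(batch_losses, batch_sizes)
typical = median(batch_losses)
print(overall, typical)  # overall is on the order of 1e14, typical is 0.02
```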
Why is the loss on train_dl so high, and how can I reproduce the train_loss of 0.0245 reported at epoch 47?
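One thing I suspect, though I have not confirmed it in the fastai source, is that the train_loss column is not a full pass over train_dl. fastai v1 appears to print a smoothed running loss: a debiased exponential moving average of the per-batch losses (beta = 0.98), collected while the model is in training mode, whereas `learn.validate` runs the model in eval mode, where layers such as BatchNorm behave differently. This is a plain-Python sketch of that kind of smoothing; `smoothed_losses` is a hypothetical helper, not a fastai function.

```python
def smoothed_losses(batch_losses, beta=0.98):
    """Debiased exponential moving average of per-batch training losses."""
    mov_avg, out = 0.0, []
    for n, loss in enumerate(batch_losses, start=1):
        mov_avg = beta * mov_avg + (1 - beta) * loss
        # divide by (1 - beta**n) to remove the bias toward the initial 0.0
        out.append(mov_avg / (1 - beta ** n))
    return out


# Hypothetical per-batch losses from one epoch; the value printed at the
# end of an epoch would correspond to the last smoothed entry.
epoch_losses = [0.031, 0.027, 0.025, 0.024]
print(smoothed_losses(epoch_losses)[-1])
```

If this is how the column is computed, matching 0.0245 exactly would require replaying the same shuffled training batches in training mode, which `learn.validate(learn.data.train_dl)` does not do.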