Training Loss Starts Lower Than Validation Loss


(Matthew Krehbiel) #1

Not sure what is going on. The first time I run .fit() it immediately displays the training loss as lower than the validation loss, no matter what numbers I put for the parameters. Any idea why this could be? I understand this means I’m overfitting, but how is it happening the first time I train, and on the first iteration?

I’ve tried drastically increasing the dropout rates to no avail.

A few lines of code that may be useful:

# fastai structured-data learner: (embedding sizes, # continuous cols,
# embedding dropout, output size, hidden layer sizes, dropout per hidden layer)
m = md.get_learner(emb_szs, len(df.columns) - len(cat_vars), 0.04, 1,
                   [500, 250], [0.04, 0.4], y_range=y_range, use_bn=True)
lr = 1e-3; wd = 1e-7
# SGDR: 2 cycles, cycle_len=1 epoch, doubled each cycle -> 3 epochs total
m.fit(lr, 2, wd, cycle_len=1, cycle_mult=2)

Any information would be very helpful!


(Matthew Krehbiel) #2

@jeremy


(Matthijs) #3

I’m not sure why you think that means you’re overfitting. Unless the difference is huge, there’s no problem if the training loss is smaller than the validation loss. There is only a problem if your validation loss starts to go up over time rather than down.
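
For example, you could express that check in plain Python (val_losses here is a hypothetical list of per-epoch validation losses that you record yourself; the name and the patience value are just illustrative):

def is_overfitting(val_losses, patience=2):
    # True once validation loss has stopped improving for `patience` epochs
    if len(val_losses) <= patience:
        return False
    best = min(val_losses[:-patience])
    return all(v > best for v in val_losses[-patience:])

print(is_overfitting([0.9, 0.7, 0.72, 0.75]))  # True: rising since epoch 2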


(Matthew Krehbiel) #4

The validation loss starts going up after the second epoch. All signs point to overfitting right away. I never saw an example like this anywhere in the course.


(Matthijs) #5

How much data do you have?


(Matthew Krehbiel) #6

About 40,000 rows. I was thinking that could be the issue, but IIRC there was a portion of the course where we trained a model on less data than that.


(Matthijs) #7

How many columns are in those rows? I don’t know enough about the model you’re using, but perhaps it has way too many parameters.


(Matthew Krehbiel) #8

Around 30 total features. I didn’t realize more features would make overfitting easier; I’d have thought it would be the other way around. Thanks for the help btw!
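
Doing a rough count on the fully connected part alone (a sketch that treats the post-embedding input width as ~30, which understates it since the embeddings widen the input), the parameter count is large relative to my 40,000 rows:

n_in = 30            # approximate input width (assumption)
layers = [500, 250]  # hidden layer sizes from my get_learner call
out_sz = 1

params, prev = 0, n_in
for h in layers + [out_sz]:
    params += prev * h + h   # weights + biases per layer
    prev = h

print(params)  # 141,001 parameters for ~40,000 training rows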


(Matthijs) #9

I’d try lowering the sizes of the hidden layers (it looks like 500 and 250 neurons right now?) as well as the learning rate, just to see what happens. Something like the sketch below.
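
This reuses the exact call from your first post; the smaller layer sizes, heavier dropout, and lower learning rate are just illustrative guesses to experiment with, not tuned recommendations:

# Same API as before, but a smaller network with more regularization
m = md.get_learner(emb_szs, len(df.columns) - len(cat_vars), 0.04, 1,
                   [100, 50], [0.2, 0.4], y_range=y_range, use_bn=True)
lr = 1e-4; wd = 1e-5   # lower learning rate, a bit more weight decay
m.fit(lr, 2, wd, cycle_len=1, cycle_mult=2)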


#10

Did you manage to reduce overfitting?