Huge valid_loss but OK prediction

I use the following code to train on my tabular data:

data = load_data('/content/gdrive/My Drive', 'dbs')
learn = tabular_learner(data, layers=[2000,2000,500,50], metrics=rmse)
learn.lr_find()
learn.fit_one_cycle(15, max_lr=1e-01,
                    callbacks=[SaveModelCallback(learn,
                                                 monitor='valid_loss',
                                                 mode='min',
                                                 name='/content/gdrive/My Drive/mark2000,2000,500,50')])
learn.save('/content/gdrive/My Drive/mark2000')

The valid_loss in the training results doesn't look very good:

epoch 	train_loss 	valid_loss 	root_mean_squared_error 	time
0 	0.000343 	2.165349 	0.200998 	12:10
1 	0.000603 	3869867238521569280.000000 	263581904.000000 	12:57
2 	0.000515 	1389359.125000 	213.215988 	22:19
3 	0.007825 	138927829654962176.000000 	90932920.000000 	16:04
4 	0.000724 	7215175153221632.000000 	12987230.000000 	14:43
5 	0.027466 	32911909760056376164352.000000 	15733768192.000000 	14:37
6 	0.000694 	146761263874048.000000 	2563498.250000 	12:29
7 	0.000482 	648323334144.000000 	191592.218750 	13:01
8 	0.000693 	745348555828363264.000000 	278343136.000000 	14:23
9 	0.000420 	41489119838208.000000 	1567249.250000 	19:25
10 	0.000327 	1750879657000960.000000 	11598753.000000 	23:08
11 	0.000243 	60105641984.000000 	44858.433594 	28:48
12 	0.000206 	339650707456.000000 	102419.742188 	35:01
13 	0.000169 	1999799680.000000 	11372.961914 	46:04
14 	0.000144 	990617536.000000 	4973.895020 	47:58

Better model found at epoch 0 with valid_loss value: 2.1653494834899902.

But the numbers from show_results are not so bad. This is epoch 0, with a valid_loss of 2.165:

learn.load('/content/gdrive/My Drive/mark2000,2000,500,50.0')
learn.show_results(rows=10)
|target|prediction|
|0.05685259|[0.051896]|
|0.08022369|[0.070863]|
|0.053750124|[0.041035]|
|0.06384782|[0.054906]|
|0.048151325|[0.036433]|
|0.07299445|[0.0811]|
|0.0665585|[0.054433]|
|0.051318087|[0.050075]|
|0.08468162|[0.068608]|
|0.090876736|[0.11432]|

And this is the last one with valid_loss of 990M:

learn.load('/content/gdrive/My Drive/mark2000')
learn.show_results(rows=10)
|target|prediction|
|0.05685259|[0.054024]|
|0.08022369|[0.073945]|
|0.053750124|[0.052672]|
|0.06384782|[0.060849]|
|0.048151325|[0.049834]|
|0.07299445|[0.087745]|
|0.0665585|[0.065725]|
|0.051318087|[0.057663]|
|0.08468162|[0.079118]|
|0.090876736|[0.100845]|

The first 10 rows show that the latter actually looks even better. How can I see where the huge valid_loss comes from? And any advice on how to improve further?

What is the dataset that you have used?
Maybe the model is over-fitting; also, try it on test data.
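For example, something like this (minimal sketch, fastai v1; test_df and the 'target' column name are placeholders for rows the model never saw, in the same format as the training DataFrame):

# Compare actual vs predicted on a few held-out rows; learn.predict takes a single
# pandas Series and returns (prediction item, tensor, raw output).
for _, row in test_df.iloc[:10].iterrows():
    print(row['target'], learn.predict(row)[0])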

There could be a few issues causing this, though I do not believe you have the correct values for “the last one with a validation loss of 990M.”
https://docs.fast.ai/callbacks.tracker.html#SaveModelCallback
" Loads the best model at the end of training is every='improvement' ." - so it looks like you are loading your first weights at the end of training, then saving them again.

To get a validation error that high, though, I would think that you have an issue with how your data is set up, and something odd is happening when it goes to compute the validation loss.

Just looking at your data, it seems to range from 0 to 1, so I am not exactly sure how your loss could be so high.

For your data in load_data('/content/gdrive/My Drive', 'dbs'), could you give us a small sample of it?

Also, for this type of problem it might be better to use 1-2 epochs with regular fit().

Also, what steps did you take to set up the data? This may be a normalization issue.
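For reference, a typical fastai v1 tabular setup looks roughly like this (df, cat_names, cont_names and the 'target' column are placeholders, since we haven't seen how your 'dbs' DataBunch was built); the Normalize proc is the part that scales the continuous inputs:

from fastai.tabular import *

procs = [FillMissing, Categorify, Normalize]   # Normalize standardizes the continuous columns

# df, cat_names, cont_names and 'target' are placeholders for your own data
data = (TabularList.from_df(df, path='/content/gdrive/My Drive',
                            cat_names=cat_names, cont_names=cont_names, procs=procs)
        .split_by_rand_pct(valid_pct=0.2, seed=42)
        .label_from_df(cols='target', label_cls=FloatList)
        .databunch())
data.save('dbs')   # this is what load_data('/content/gdrive/My Drive', 'dbs') reads back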

Thanks for taking a look.
I can upload the data somewhere if you like.
I checked the validation results. Although most of them are close, some can be very far off:

|pred|act|diff|
|0.084767386|0.088255011|-0.003487624|
|0.068978094|0.075387105|-0.006409012|
|502.8942566|0.303940237|502.5903164|
|0.064133182|0.076282002|-0.01214882|
|0.042843096|0.040072933|0.002770163|

I split the dataset with split_by_rand_pct using different seeds and trained a model on each split. When one model gives me an extreme outlier, the other usually gives a reasonable prediction, so I use such a pair to get a fairly acceptable result for now. Is this practice legitimate?
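For illustration, the pairing looks roughly like this (learn_a and learn_b are just placeholder names for the two models, and row is a single pandas Series in the training format); averaging the two outputs, or dropping the one that is an obvious outlier, is plain model ensembling:

# Get both models' predictions for the same row; predict returns (item, tensor, raw output)
y_a, _, raw_a = learn_a.predict(row)
y_b, _, raw_b = learn_b.predict(row)
combined = (raw_a + raw_b) / 2   # simple average of the two raw outputs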

I think the data would be helpful at this point, as well as the code used to process it. I am particularly struck by the fact that the row with the highest actual value is the one with the weirdest prediction.

I'm trying to get the validation set to check which inputs caused those outliers, but how do I do that? I can use learn.data.valid_ds[5] to print the data. Is there any way to export it to a CSV file? It would be even better to see the original data before processing (especially normalization).
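One possible approach (untested sketch, fastai v1): get_preds returns the predictions and targets for the validation set in order, so the indices of the largest errors can be used to index back into valid_ds. A TabularList also keeps its backing DataFrame in inner_df, which can be dumped to CSV, but the procs (FillMissing/Categorify/Normalize) may already have been applied to it, so keep a copy of the raw DataFrame and the split seed if you need the truly original values.

# Find the validation rows with the largest errors (DatasetType is exported by
# from fastai.tabular import *).
preds, targets = learn.get_preds(ds_type=DatasetType.Valid)
diff = (preds.squeeze() - targets.squeeze()).abs()
worst = diff.argsort(descending=True)[:10]   # indices of the ten largest errors
for i in worst:
    print(learn.data.valid_ds[int(i)], float(diff[i]))

# Export the validation rows (possibly already processed) to CSV.
learn.data.valid_ds.x.inner_df.to_csv('/content/gdrive/My Drive/valid_rows.csv', index=False)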