Overfitting?

Am I overfitting? Here are my training results; the accuracy seems to decrease and then increase again somewhat spontaneously.

epoch train_loss valid_loss accuracy time
0 0.541770 0.614922 0.622222 00:00
1 0.489105 0.469179 0.822222 00:00
2 0.455800 0.383595 0.888889 00:00
3 0.444867 0.368834 0.888889 00:00
4 0.441680 0.359972 0.844444 00:00
5 0.435412 0.341931 0.877778 00:00
6 0.426038 0.346072 0.877778 00:00
7 0.416072 0.337947 0.866667 00:00
8 0.407506 0.344396 0.866667 00:00
9 0.403504 0.368717 0.877778 00:00
10 0.397069 0.343857 0.877778 00:00
11 0.393713 0.353145 0.855556 00:00
12 0.388475 0.353242 0.855556 00:00
13 0.386334 0.401245 0.800000 00:00
14 0.383441 0.359856 0.855556 00:00
15 0.383383 0.359822 0.844444 00:00
16 0.379185 0.367299 0.822222 00:00
17 0.374292 0.387453 0.844444 00:00
18 0.370405 0.334737 0.855556 00:00
19 0.365592 0.350353 0.855556 00:00
20 0.364632 0.352333 0.877778 00:00
21 0.359787 0.407807 0.811111 00:00
22 0.357776 0.374265 0.866667 00:00
23 0.360798 0.335189 0.866667 00:00
24 0.355599 0.361366 0.822222 00:00
25 0.355366 0.374440 0.855556 00:00
26 0.352372 0.379995 0.866667 00:00

I’m not quite sure you can call this overfitting, since the validation loss isn’t consistently increasing with each epoch. I suppose training for 2x the number of epochs should give you a better picture of what’s happening?

If you’re doing image classification, try passing the argument

callback_fns=ShowGraph

to cnn_learner to plot training and validation losses against the number of iterations. Interpreting a graph is much simpler than reading a table.
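
For example, with the fastai v1 API it might look like this (a sketch: data is assumed to be an ImageDataBunch you already built, and resnet34 is just a placeholder architecture):

from fastai.vision import *  # fastai v1; provides cnn_learner, ShowGraph, models

learn = cnn_learner(data, models.resnet34, metrics=accuracy, callback_fns=ShowGraph)
learn.fit_one_cycle(10)  # the loss graph is drawn and updated as training runs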

Doubling the number of epochs does increase the validation loss overall. I’m using a tabular_learner with layers=[200, 100]. Can a tabular learner be analyzed graphically as well?

epoch train_loss valid_loss accuracy time
0 0.594496 0.560617 0.744444 00:00
1 0.510322 0.380074 0.888889 00:00
2 0.483441 0.373966 0.844444 00:00
3 0.463078 0.436973 0.788889 00:00
4 0.444301 0.376622 0.844444 00:00
5 0.437093 0.374871 0.844444 00:00
6 0.427499 0.335551 0.900000 00:00
7 0.421538 0.358031 0.844444 00:00
8 0.409195 0.325067 0.888889 00:00
9 0.404497 0.356501 0.844444 00:00
10 0.398436 0.348111 0.855556 00:00
11 0.393160 0.367109 0.844444 00:00
12 0.391756 0.422738 0.811111 00:00
13 0.389445 0.366013 0.855556 00:00
14 0.386014 0.363904 0.833333 00:00
15 0.379466 0.342542 0.866667 00:00
16 0.373142 0.342396 0.877778 00:00
17 0.369117 0.350198 0.833333 00:00
18 0.363563 0.364488 0.844444 00:00
19 0.361719 0.398087 0.844444 00:00
20 0.359476 0.410388 0.822222 00:00
21 0.358391 0.350455 0.855556 00:00
22 0.354601 0.388731 0.822222 00:00
23 0.352590 0.371988 0.822222 00:00
24 0.349094 0.443939 0.811111 00:00
25 0.350024 0.380391 0.844444 00:00
26 0.346449 0.409832 0.833333 00:00
27 0.341009 0.422728 0.822222 00:00
28 0.342597 0.340252 0.822222 00:00
29 0.339622 0.459351 0.833333 00:00
30 0.339359 0.421456 0.822222 00:00
31 0.339204 0.416893 0.877778 00:00
32 0.335463 0.410357 0.788889 00:00
33 0.329928 0.477867 0.800000 00:00
34 0.330816 0.410211 0.833333 00:00
35 0.330234 0.433103 0.788889 00:00
36 0.328858 0.430457 0.777778 00:00
37 0.325450 0.413197 0.800000 00:00
38 0.325488 0.463287 0.777778 00:00
39 0.323106 0.447742 0.822222 00:00
40 0.316804 0.532856 0.788889 00:00
41 0.313581 0.450557 0.777778 00:00
42 0.314789 0.488426 0.777778 00:00
43 0.318153 0.455692 0.811111 00:00
44 0.314271 0.473768 0.811111 00:00
45 0.315818 0.528367 0.800000 00:00
46 0.316584 0.450639 0.833333 00:00
47 0.322711 0.414061 0.822222 00:00
48 0.319972 0.401177 0.822222 00:00
49 0.317761 0.545660 0.766667 00:00
50 0.319913 0.464935 0.800000 00:00
51 0.316517 0.436917 0.788889 00:00
52 0.316660 0.450867 0.800000 00:00
53 0.317192 0.464036 0.833333 00:00

Yes, you can plot the results yourself, e.g. with matplotlib.
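
fastai v1 also records the history on the learner itself, so something like this sketch should work for a tabular_learner too (assuming a trained learner named learn):

learn.recorder.plot_losses()   # training/validation loss vs. batches processed
learn.recorder.plot_metrics()  # metrics (here accuracy) per epoch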

Let me know if this helps.

Hi, are you trying out the Titanic dataset from Kaggle?
I couldn’t manage more than 88% accuracy; it seems you’ll need to do some feature engineering.
You can also try changing the [200, 100] layer values.

If I understand correctly, you are not overfitting. The loss surface around the point you’re trying to converge to is rugged, so it throws the optimizer around, and even a lower learning rate gives high loss in this particular problem.

In my experiments, that’s exactly how these metrics (losses, accuracy, etc.) behave in tabular models.
To my mind, the model has already fit by around epoch 5-7. The rest looks like fluctuation (in terms of validation error/accuracy) and overfitting.
Unfortunately, I haven’t found a general answer for what to do next in these situations.
You can try adding more dropout (see the sketch below); it may help. Feature engineering could too.
But judging by the accuracy values, I think your dataset is not very big, and it may simply not be big enough to get further with this data.
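
In fastai v1, adding dropout to a tabular_learner might look like this sketch (the ps and emb_drop values are untuned starting points, and data is assumed to be a TabularDataBunch you already built):

from fastai.tabular import *  # fastai v1

learn = tabular_learner(data, layers=[200, 100],
                        ps=[0.3, 0.3],   # dropout after each linear layer
                        emb_drop=0.1,    # dropout on the categorical embeddings
                        metrics=accuracy)
learn.fit_one_cycle(7)  # stop around where the model seemed to converge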

I am! That makes sense. I actually already did some feature engineering (e.g. extracting titles from names, and encoding the cabin column as 0 for no cabin and 1 for a cabin). I will try changing the layers values, though there doesn’t seem to be much documentation on that.
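
For reference, the feature engineering described above might look roughly like this in pandas (column names follow the Kaggle Titanic CSV; the title regex is an assumption about the name format):

import pandas as pd

df = pd.read_csv('train.csv')  # the Kaggle Titanic training CSV

# pull the title ("Mr", "Mrs", "Master", ...) out of names like "Braund, Mr. Owen Harris"
df['Title'] = df['Name'].str.extract(r',\s*([^.]+)\.', expand=False)

# 1 if a cabin is recorded, 0 otherwise
df['HasCabin'] = df['Cabin'].notna().astype(int)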

The dataset is indeed very small. I will try adjusting dropout and see if the model works better on the test set with fewer epochs.

@jimsu2012 If I remember correctly, there is a correlation between cabin location (C, E, and so forth) and survival. It just shows how far people went with feature engineering :smile:
I think there are some kernels on the dataset explaining feature importance and the important pre-processing to do before chucking it into random forests. These might be helpful.
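
A rough sketch of the feature-importance step those kernels do (assuming X is a numerically encoded feature DataFrame and y the Survived labels):

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X, y)

# rank features by how much the forest relies on them
for name, imp in sorted(zip(X.columns, rf.feature_importances_), key=lambda t: -t[1]):
    print(f'{name}: {imp:.3f}')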

The dataset is indeed small; oversampling might be worth looking into.
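
A minimal oversampling sketch with scikit-learn’s resample (assuming the training frame df with a Survived label column; imbalanced-learn’s RandomOverSampler would be an alternative):

from sklearn.utils import resample
import pandas as pd

majority = df[df['Survived'] == 0]
minority = df[df['Survived'] == 1]

# sample the minority class with replacement up to the majority size
minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=42)
df_balanced = pd.concat([majority, minority_up]).sample(frac=1, random_state=42)  # shuffle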