Crazy times. Picking up the discussion from the Imagenette/Imagewoof Leaderboards thread.
I’d like to point out one problem that seems to come up a lot, if not always: the training loss can stay well above the validation loss. When I’ve asked about this, people say my training isn’t being done right, but I don’t know what to do about it.
Here’s one run, with size=128, epochs=80, lr=4e-3, mixup=0.5 (a rough sketch of the setup follows the log below):
data path /root/.fastai/data/imagewoof2
Learn path /root/.fastai/data/imagewoof2
epoch train_loss valid_loss accuracy top_k_accuracy time
0 2.161465 2.075368 0.333673 0.801985 01:49
1 1.962958 1.690776 0.466531 0.893357 01:49
2 1.873881 1.507369 0.564266 0.928735 01:48
3 1.776708 1.422561 0.602698 0.937134 01:49
4 1.692935 1.365374 0.629677 0.946806 01:48
5 1.627976 1.234447 0.697378 0.958514 01:50
6 1.584895 1.212052 0.705523 0.969967 01:50
7 1.551299 1.135982 0.740646 0.966149 01:49
8 1.511394 1.122625 0.745737 0.965386 01:50
9 1.484772 1.075874 0.766353 0.969203 01:50
10 1.474622 1.066350 0.772970 0.972767 01:50
11 1.456863 1.008994 0.796131 0.976839 01:51
12 1.420721 0.993462 0.804276 0.976584 01:51
13 1.395644 0.980982 0.804276 0.975821 01:50
14 1.376894 0.972562 0.804276 0.979384 01:48
15 1.348262 0.946901 0.825655 0.975057 01:49
16 1.354564 0.976460 0.808348 0.976839 01:48
17 1.348575 0.980201 0.803512 0.979639 01:47
18 1.340504 0.970298 0.809621 0.975057 01:48
19 1.311690 0.929736 0.831255 0.976330 01:48
20 1.291686 0.924482 0.836854 0.978112 01:48
21 1.306390 0.945454 0.828710 0.972003 01:48
22 1.275748 0.917969 0.833800 0.981166 01:49
23 1.278674 0.901997 0.841435 0.979893 01:49
24 1.276080 0.921529 0.826928 0.980148 01:49
25 1.268407 0.921693 0.832273 0.981420 01:49
26 1.238546 0.900284 0.838381 0.982438 01:49
27 1.241901 0.884923 0.851616 0.979893 01:50
28 1.232780 0.900848 0.838890 0.981929 01:50
29 1.222811 0.894986 0.841181 0.978875 01:49
30 1.211469 0.905154 0.839145 0.980911 01:48
31 1.221968 0.933286 0.834818 0.977348 01:49
32 1.232239 0.889966 0.851616 0.979639 01:50
33 1.218874 0.894085 0.852634 0.978112 01:49
34 1.196597 0.892226 0.847544 0.980911 01:50
35 1.188675 0.891138 0.849071 0.978366 01:49
36 1.183693 0.879083 0.849071 0.979384 01:49
37 1.173107 0.890478 0.847544 0.980402 01:48
38 1.171805 0.887006 0.850598 0.977602 01:48
39 1.171755 0.880086 0.859252 0.976839 01:47
40 1.182200 0.891929 0.835582 0.980911 01:47
41 1.172583 0.871285 0.862051 0.977857 01:46
42 1.167472 0.890622 0.846780 0.980402 01:47
43 1.152007 0.890563 0.843217 0.983202 01:47
44 1.161367 0.880082 0.855688 0.979893 01:48
45 1.136539 0.867136 0.855434 0.980148 01:49
46 1.162885 0.863992 0.856961 0.982438 01:49
47 1.150314 0.893648 0.849835 0.979639 01:47
48 1.150542 0.906185 0.840672 0.977602 01:48
49 1.132196 0.868540 0.858997 0.978366 01:49
50 1.138945 0.873602 0.859506 0.975566 01:49
51 1.141462 0.882345 0.849835 0.975821 01:49
52 1.123753 0.884568 0.846526 0.977348 01:49
53 1.131320 0.878340 0.850853 0.978875 01:50
54 1.102213 0.879064 0.854161 0.978112 01:51
55 1.121324 0.876642 0.855434 0.979893 01:50
56 1.129894 0.868407 0.846526 0.980657 01:49
57 1.103063 0.864340 0.858997 0.979130 01:49
58 1.137609 0.870569 0.849835 0.979639 01:49
59 1.094992 0.869274 0.860779 0.978621 01:49
60 1.099305 0.858406 0.862306 0.977093 01:49
61 1.123867 0.855571 0.864597 0.978621 01:48
62 1.081016 0.867963 0.860524 0.976075 01:48
63 1.079784 0.842200 0.862560 0.982184 01:49
64 1.093885 0.847684 0.860779 0.980148 01:50
65 1.064888 0.839558 0.868669 0.980911 01:50
66 1.076255 0.846076 0.864851 0.974294 01:50
67 1.061109 0.827110 0.874523 0.978621 01:50
68 1.054539 0.831806 0.870196 0.975821 01:50
69 1.040568 0.823592 0.871723 0.981166 01:49
70 1.049583 0.828491 0.870705 0.980148 01:49
71 1.030693 0.818572 0.874268 0.980911 01:49
72 1.041651 0.820911 0.876050 0.979130 01:49
73 1.015236 0.818820 0.876559 0.980911 01:49
74 1.022821 0.813191 0.877068 0.981675 01:49
75 1.029228 0.804716 0.880886 0.983202 01:50
76 1.019998 0.803970 0.879613 0.981675 01:52
77 1.021127 0.803941 0.883431 0.982184 01:52
78 1.025089 0.801169 0.881395 0.982184 01:51
79 1.011409 0.803271 0.881904 0.981675 01:50
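For reference, this is roughly how a run like this could be set up with the fastai v2-style API. It’s a minimal sketch, not my exact script: the architecture, loss, batch size, transforms, and fit schedule are assumptions; only size=128, epochs=80, lr=4e-3, and mixup=0.5 come from the settings above.

```python
from fastai.vision.all import *

# Rough reproduction sketch -- xresnet50, label smoothing, bs, transforms and
# the one-cycle schedule are assumptions; only size=128, 80 epochs, lr=4e-3
# and mixup alpha=0.5 come from the run above.
path = untar_data(URLs.IMAGEWOOF)      # /root/.fastai/data/imagewoof2
dls = ImageDataLoaders.from_folder(
    path, train='train', valid='val',
    item_tfms=Resize(128),
    batch_tfms=aug_transforms(size=128),
    bs=64)
learn = Learner(dls, xresnet50(n_out=dls.c),
                loss_func=LabelSmoothingCrossEntropy(),
                metrics=[accuracy, top_k_accuracy],
                cbs=MixUp(0.5))
learn.fit_one_cycle(80, lr_max=4e-3)
```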
I guess my question, for those who have done a lot of training from scratch, is: how often do you see train_loss > valid_loss? Is that a bad sign, and how do you “fix” it?
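The only check I can think of is to measure the trained model on the training set the same way valid_loss is measured, i.e. in eval mode, against the real labels, with no mixup, since the logged train_loss is computed against mixed targets while the model is still changing. A plain-PyTorch sketch (nothing from the run above; `clean_train_loss` is just a name I made up):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def clean_train_loss(model, train_loader, device="cuda"):
    """Average loss of the trained model on the training set, computed like
    valid_loss: eval mode, real labels, no mixup. Plain cross-entropy here;
    swap in whatever loss function the run actually used."""
    model.eval()
    total_loss, n_samples = 0.0, 0
    for xb, yb in train_loader:
        xb, yb = xb.to(device), yb.to(device)
        total_loss += F.cross_entropy(model(xb), yb, reduction="sum").item()
        n_samples += yb.size(0)
    return total_loss / n_samples
```

If that number comes out below valid_loss, I suppose the gap in the log is mostly a mixup artifact rather than real underfitting, but I’d still like to hear how others read it.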