Introducing a Convolutional Layer with a Twist

Crazy times! Picking up the discussion from the Imagenette/Imagewoof Leaderboards thread.

I’d like to point out one problem that seems to come up often, if not always: the training loss can stay well above the validation loss. When I’ve asked about this, people say my training isn’t being done right, but I don’t know what to do about it.

Here’s one run with size=128, epochs=80, lr=4e-3, mixup=0.5.
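For context, here is a minimal sketch of how a comparable run could be set up, assuming the fastai v2 API and an xresnet50 trained from scratch. The model choice and data setup are my assumptions; the actual leaderboard scripts may differ in architecture, optimizer, or loss.

```python
from fastai.vision.all import *

# Imagewoof: 10 hard-to-distinguish dog breeds from ImageNet
path = untar_data(URLs.IMAGEWOOF)
dls = ImageDataLoaders.from_folder(
    path, valid='val',                # the dataset ships with train/val folders
    item_tfms=Resize(128),            # size=128
    batch_tfms=aug_transforms())

# xresnet50 from scratch; the loss defaults to CrossEntropyLossFlat
learn = Learner(dls, xresnet50(n_out=dls.c),
                metrics=[accuracy, top_k_accuracy],
                cbs=MixUp(0.5))       # mixup=0.5

learn.fit_one_cycle(80, lr_max=4e-3)  # epochs=80, lr=4e-3
```

The full log from my run follows: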

data path   /root/.fastai/data/imagewoof2
Learn path /root/.fastai/data/imagewoof2
epoch	train_loss	valid_loss	accuracy	top_k_accuracy	time
0	2.161465	2.075368	0.333673	0.801985	01:49
1	1.962958	1.690776	0.466531	0.893357	01:49
2	1.873881	1.507369	0.564266	0.928735	01:48
3	1.776708	1.422561	0.602698	0.937134	01:49
4	1.692935	1.365374	0.629677	0.946806	01:48
5	1.627976	1.234447	0.697378	0.958514	01:50
6	1.584895	1.212052	0.705523	0.969967	01:50
7	1.551299	1.135982	0.740646	0.966149	01:49
8	1.511394	1.122625	0.745737	0.965386	01:50
9	1.484772	1.075874	0.766353	0.969203	01:50
10	1.474622	1.066350	0.772970	0.972767	01:50
11	1.456863	1.008994	0.796131	0.976839	01:51
12	1.420721	0.993462	0.804276	0.976584	01:51
13	1.395644	0.980982	0.804276	0.975821	01:50
14	1.376894	0.972562	0.804276	0.979384	01:48
15	1.348262	0.946901	0.825655	0.975057	01:49
16	1.354564	0.976460	0.808348	0.976839	01:48
17	1.348575	0.980201	0.803512	0.979639	01:47
18	1.340504	0.970298	0.809621	0.975057	01:48
19	1.311690	0.929736	0.831255	0.976330	01:48
20	1.291686	0.924482	0.836854	0.978112	01:48
21	1.306390	0.945454	0.828710	0.972003	01:48
22	1.275748	0.917969	0.833800	0.981166	01:49
23	1.278674	0.901997	0.841435	0.979893	01:49
24	1.276080	0.921529	0.826928	0.980148	01:49
25	1.268407	0.921693	0.832273	0.981420	01:49
26	1.238546	0.900284	0.838381	0.982438	01:49
27	1.241901	0.884923	0.851616	0.979893	01:50
28	1.232780	0.900848	0.838890	0.981929	01:50
29	1.222811	0.894986	0.841181	0.978875	01:49
30	1.211469	0.905154	0.839145	0.980911	01:48
31	1.221968	0.933286	0.834818	0.977348	01:49
32	1.232239	0.889966	0.851616	0.979639	01:50
33	1.218874	0.894085	0.852634	0.978112	01:49
34	1.196597	0.892226	0.847544	0.980911	01:50
35	1.188675	0.891138	0.849071	0.978366	01:49
36	1.183693	0.879083	0.849071	0.979384	01:49
37	1.173107	0.890478	0.847544	0.980402	01:48
38	1.171805	0.887006	0.850598	0.977602	01:48
39	1.171755	0.880086	0.859252	0.976839	01:47
40	1.182200	0.891929	0.835582	0.980911	01:47
41	1.172583	0.871285	0.862051	0.977857	01:46
42	1.167472	0.890622	0.846780	0.980402	01:47
43	1.152007	0.890563	0.843217	0.983202	01:47
44	1.161367	0.880082	0.855688	0.979893	01:48
45	1.136539	0.867136	0.855434	0.980148	01:49
46	1.162885	0.863992	0.856961	0.982438	01:49
47	1.150314	0.893648	0.849835	0.979639	01:47
48	1.150542	0.906185	0.840672	0.977602	01:48
49	1.132196	0.868540	0.858997	0.978366	01:49
50	1.138945	0.873602	0.859506	0.975566	01:49
51	1.141462	0.882345	0.849835	0.975821	01:49
52	1.123753	0.884568	0.846526	0.977348	01:49
53	1.131320	0.878340	0.850853	0.978875	01:50
54	1.102213	0.879064	0.854161	0.978112	01:51
55	1.121324	0.876642	0.855434	0.979893	01:50
56	1.129894	0.868407	0.846526	0.980657	01:49
57	1.103063	0.864340	0.858997	0.979130	01:49
58	1.137609	0.870569	0.849835	0.979639	01:49
59	1.094992	0.869274	0.860779	0.978621	01:49
60	1.099305	0.858406	0.862306	0.977093	01:49
61	1.123867	0.855571	0.864597	0.978621	01:48
62	1.081016	0.867963	0.860524	0.976075	01:48
63	1.079784	0.842200	0.862560	0.982184	01:49
64	1.093885	0.847684	0.860779	0.980148	01:50
65	1.064888	0.839558	0.868669	0.980911	01:50
66	1.076255	0.846076	0.864851	0.974294	01:50
67	1.061109	0.827110	0.874523	0.978621	01:50
68	1.054539	0.831806	0.870196	0.975821	01:50
69	1.040568	0.823592	0.871723	0.981166	01:49
70	1.049583	0.828491	0.870705	0.980148	01:49
71	1.030693	0.818572	0.874268	0.980911	01:49
72	1.041651	0.820911	0.876050	0.979130	01:49
73	1.015236	0.818820	0.876559	0.980911	01:49
74	1.022821	0.813191	0.877068	0.981675	01:49
75	1.029228	0.804716	0.880886	0.983202	01:50
76	1.019998	0.803970	0.879613	0.981675	01:52
77	1.021127	0.803941	0.883431	0.982184	01:52
78	1.025089	0.801169	0.881395	0.982184	01:51
79	1.011409	0.803271	0.881904	0.981675	01:50

I guess my question, for those who have done a lot of training from scratch, would be: how often do you see train_loss > valid_loss? Is that a bad sign, and how do you “fix” it?
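One confounder worth flagging: with mixup, the training loss is computed against mixed targets, so the reported train_loss isn’t directly comparable to the valid_loss measured on clean labels. As a quick sanity check, here is a sketch (assuming the fastai v2 API and the learn object from above) that re-evaluates the training set in validation mode, where the MixUp callback doesn’t run (it has run_valid=False):

```python
# Evaluate the training set (ds_idx=0) the same way the validation set is
# evaluated: model in eval mode, MixUp skipped, loss on clean labels.
# Note: train-time augmentations still apply, so some gap may remain.
train_loss, *train_metrics = learn.validate(ds_idx=0)
valid_loss, *valid_metrics = learn.validate(ds_idx=1)
print(f'train loss (no mixup): {train_loss:.4f}, valid loss: {valid_loss:.4f}')
```

If the gap largely disappears under this evaluation, the high train_loss is an artifact of mixup’s mixed targets rather than a sign of under-fitting.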