Hello, I am currently working through this series of lectures along with the textbook. I was working on Q2 of the “Further Research” section in chapter 4 of the textbook (training a neural network on the full MNIST dataset). I am in a situation where the loss on both the training set and the validation set decreases each epoch, but the accuracy stays roughly flat at a very low value, around 0.1.
I have consulted other solutions posted by members of this forum, but I am not sure where my solution goes wrong. I have posted my code, with comments, here. I would appreciate it if anyone could point out what the problem is and why it occurs. Thanks!
It looks like the version of the code I posted does not include `train_loss` and `valid_loss` (I was previously using the built-in `Learner` class, which showed those two values). I will paste the result I get when I run my code with the built-in `Learner`:
epoch train_loss valid_loss batch_accuracy time
0 0.515939 0.512917 0.108500 00:00
1 0.507944 0.503949 0.103000 00:00
2 0.498174 0.493175 0.103200 00:00
3 0.486648 0.480698 0.103900 00:00
4 0.473775 0.466921 0.105300 00:00
5 0.459645 0.451923 0.107300 00:00
6 0.444157 0.435620 0.115000 00:00
7 0.427215 0.417961 0.122200 00:00
8 0.408822 0.399008 0.132600 00:00
9 0.389109 0.378947 0.138200 00:00
10 0.368345 0.358084 0.139700 00:00
11 0.346908 0.336823 0.137100 00:00
12 0.325258 0.315624 0.131600 00:00
13 0.303888 0.294958 0.127600 00:00
14 0.283283 0.275261 0.125100 00:00
15 0.263865 0.256887 0.124300 00:00
16 0.245953 0.240079 0.122100 00:00
17 0.229740 0.224962 0.121200 00:00
18 0.215295 0.211551 0.121400 00:00
19 0.202581 0.199772 0.121700 00:00
20 0.191487 0.189498 0.121800 00:00
21 0.181859 0.180570 0.123500 00:00
22 0.173525 0.172825 0.124800 00:00
23 0.166315 0.166103 0.126400 00:00
24 0.160072 0.160257 0.126600 00:00
25 0.154652 0.155161 0.127100 00:00
26 0.149933 0.150701 0.128400 00:00
27 0.145808 0.146785 0.130100 00:00
28 0.142190 0.143331 0.130800 00:00
29 0.139001 0.140273 0.130600 00:00
The issue was my choice of loss function (L1 norm with sigmoid). Changing the loss function to `nn.CrossEntropyLoss()` solved it.
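For anyone hitting the same thing, this is a minimal sketch of how `nn.CrossEntropyLoss` is wired up (not the original code; the tensor values here are made up for illustration). The key point is that it takes raw logits and integer class labels, and applies log-softmax internally, so the model should not end in a sigmoid:

```python
import torch
import torch.nn as nn

# nn.CrossEntropyLoss expects raw logits of shape (batch, num_classes)
# and integer class targets of shape (batch,). No sigmoid or softmax
# layer is needed at the end of the model.
loss_func = nn.CrossEntropyLoss()

logits = torch.randn(4, 10)            # batch of 4 samples, 10 MNIST classes
targets = torch.tensor([3, 0, 7, 9])   # true digit labels as integers

loss = loss_func(logits, targets)
print(loss.item())
```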
Based on my intuition, I expected the model to train with L1 norm + sigmoid, albeit less efficiently than with cross-entropy loss. However, I noticed the accuracy does not increase at all… if someone knows the intuition behind this, I would be happy to learn about it.
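One way this can happen (a hypothetical illustration, not the original poster's code): with a per-class sigmoid and L1 loss against one-hot targets, nine of the ten targets are 0, so the optimizer can keep lowering the loss simply by pushing every output toward 0. That uniform shift never changes which class has the largest output, so the loss falls while argmax accuracy stays frozen:

```python
import torch

# Made-up logits for one sample, 10 classes; the true class is 3,
# but the argmax is (wrongly) class 7.
logits = torch.tensor([[0.1, 0.2, 0.0, 0.5, 0.1, 0.0, 0.2, 0.9, 0.1, 0.0]])
target_onehot = torch.zeros(1, 10)
target_onehot[0, 3] = 1.0

def l1_sigmoid_loss(lg):
    # L1 distance between sigmoid outputs and the one-hot target.
    return (torch.sigmoid(lg) - target_onehot).abs().mean()

# Shifting every logit down by the same constant changes no argmax
# decision, yet it drives the nine outputs that should be 0 toward 0,
# so the L1 loss falls while the prediction stays the same.
print(l1_sigmoid_loss(logits))        # original loss
print(l1_sigmoid_loss(logits - 3.0))  # lower loss, same wrong prediction
print(torch.argmax(logits - 3.0))     # still class 7
```

Cross-entropy with softmax does not have this escape hatch: the softmax outputs must sum to 1, so the only way to reduce the loss is to raise the true class's score *relative to* the others, which is exactly what improves argmax accuracy.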