Chapter 4 Further Research: MNIST Full


I am working on the further research part of chapter 4 to build the full MNIST classifier without using the ready-to-use fastai functions. However, even after training the model for 60 epochs, my accuracy doesn’t rise above 10% (0.10).
Here is my code. Looking for some help/perspective on what I am missing here. Thanks in advance!

# function to calculate loss
def mnist_loss(pred, actual):
    l = nn.CrossEntropyLoss()
    return l(pred, actual.squeeze())

# function to calculate gradient
def calc_grad(xb, yb, model):
    pred = model(xb)
    loss = mnist_loss(pred, yb)
    loss.backward()  # populate p.grad for every parameter
    return loss

# function to define accuracy
def batch_accuracy(pred, actual):
    digit_pred = pred.max(dim=1)[1]
    return (digit_pred==actual).float().mean()

# function to train 1 epoch and print average batch loss
def train_epoch(model):
    batch_loss = []
    for xb,yb in train_dl:
        batch_loss.append(calc_grad(xb, yb, model).item())
        opt.step()       # update the parameters from their gradients
        opt.zero_grad()  # reset gradients before the next batch
    print('Average batch loss: ', tensor(batch_loss).mean())

class BasicOptim:
    def __init__(self,params,lr): self.params,self.lr = list(params),lr

    def step(self, *args, **kwargs):
        for p in self.params: p.data -= p.grad.data * self.lr

    def zero_grad(self, *args, **kwargs):
        for p in self.params: p.grad = None

simple_net = nn.Sequential(

opt = BasicOptim(simple_net.parameters(), lr=0.04)

def train_model(model,epochs):
    for i in range(epochs):
        train_epoch(model)
        print('epoch', i, ': ', batch_accuracy(model(valid_x),valid_y))

train_model(simple_net, 60)

Note: valid_x, valid_y are the validation tensors. train_dl is the training DataLoader with batch_size 64.


Hello Priya and welcome to the forums.

I do not see anything wrong with your code at first glance. But if you post a complete working example that shows the issue, I will run it and see what might be happening. In particular, how do you construct train_dl, valid_x, and valid_y?


Hi Malcolm, thanks for your response!

I have uploaded my notebook here:

It contains the entire code, including how I create the training and validation sets. It also shows the result of training for 60 epochs, where accuracy doesn’t improve much.

I have experimented with increasing the learning rate a bit (up to 0.3), thinking that a small learning rate might be slowing the improvement in accuracy, but that doesn’t help much and the problem remains.

Your insights would be helpful.

It seems the problem is with the batch_accuracy metric used.
Here I compare the result of your batch_accuracy with accuracy (the built-in fastai metric) after 50 epochs with lr=1e-3. Your model is doing well!

# batch_accuracy
# tensor(0.1003)

# accuracy
# TensorBase(0.9582)

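After going through the shape of your actual tensor, I see that you just need to add the line actual = actual.squeeze() in your batch_accuracy function before the comparison to fix this bug:

def batch_accuracy(pred, actual):
    digit_pred = pred.max(dim=1)[1]
    actual = actual.squeeze() # squeeze the tensor
    return (digit_pred==actual).float().mean()

Without the squeeze, digit_pred has shape (N,) while actual has shape (N, 1), so == broadcasts the two into an (N, N) matrix that compares every prediction with every label. With ten roughly balanced classes, the mean of that matrix sits near 0.10 no matter how good the model is.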
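To see the broadcasting effect in isolation, here is a minimal sketch in plain PyTorch. It assumes random labels in place of MNIST and a "perfect" model whose predictions equal the labels; the names n, buggy and fixed are made up for this illustration and do not come from the notebook.

```python
import torch

torch.manual_seed(0)
n = 1000

# Targets stored as a column, shape (n, 1), as in the thread (random labels are an assumption)
actual = torch.randint(0, 10, (n, 1))
# Pretend the model is perfect: predictions equal the labels, shape (n,)
digit_pred = actual.squeeze(1).clone()

# (n,) == (n, 1) broadcasts to (n, n): every prediction is compared with every label
buggy = digit_pred == actual
# (n,) == (n,) compares element-wise, which is what an accuracy metric needs
fixed = digit_pred == actual.squeeze(1)

print(buggy.shape, buggy.float().mean())
print(fixed.shape, fixed.float().mean())
```

Even with every prediction correct, the mean of the broadcast comparison stays close to the chance level for ten classes, while the squeezed comparison reports the true accuracy.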

All the best!


Thanks! Yes, changing the shape of ‘actual’ tensor resolves the bug.

For anyone looking for basic working code for a full MNIST digit classifier without using the fastai Learner class, I have uploaded my notebook here:
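For readers without the notebook at hand, the pieces discussed in this thread fit together roughly as follows. This is a minimal sketch in plain PyTorch, not the notebook's code: the synthetic 20-feature data, the make_split helper, the layer sizes and the 5-epoch run are all assumptions chosen so the example runs on its own without downloading MNIST. Targets are kept in shape (n, 1) to mirror the situation above.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for MNIST (assumption): 10 classes, 20 features per sample
n_classes, n_features = 10, 20
centers = torch.randn(n_classes, n_features) * 2

def make_split(n):
    # Hypothetical helper: labels shaped (n, 1), inputs clustered around class centers
    y = torch.randint(0, n_classes, (n, 1))
    x = centers[y.squeeze(1)] + torch.randn(n, n_features)
    return x, y

train_x, train_y = make_split(2000)
valid_x, valid_y = make_split(500)
train_dl = DataLoader(TensorDataset(train_x, train_y), batch_size=64, shuffle=True)

# Layer sizes are an assumption; the thread does not show the original architecture
model = nn.Sequential(nn.Linear(n_features, 30), nn.ReLU(), nn.Linear(30, n_classes))
loss_fn = nn.CrossEntropyLoss()

class BasicOptim:
    def __init__(self, params, lr): self.params, self.lr = list(params), lr
    def step(self):
        for p in self.params: p.data -= p.grad.data * self.lr
    def zero_grad(self):
        for p in self.params: p.grad = None

def batch_accuracy(pred, actual):
    # Squeeze the (n, 1) targets so the comparison is element-wise
    return (pred.argmax(dim=1) == actual.squeeze(1)).float().mean()

def train_epoch(model, opt):
    losses = []
    for xb, yb in train_dl:
        loss = loss_fn(model(xb), yb.squeeze(1))
        loss.backward()      # compute gradients
        opt.step()           # update parameters
        opt.zero_grad()      # clear gradients for the next batch
        losses.append(loss.item())
    return sum(losses) / len(losses)

opt = BasicOptim(model.parameters(), lr=0.04)
history = []
for epoch in range(5):
    avg_loss = train_epoch(model, opt)
    with torch.no_grad():
        acc = batch_accuracy(model(valid_x), valid_y).item()
    history.append((avg_loss, acc))
    print(f"epoch {epoch}: loss {avg_loss:.4f}, valid acc {acc:.4f}")
```

To adapt this to MNIST, the synthetic tensors would be replaced by the flattened images and their labels, and the first layer's input size by 28*28.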