Chapter 4: Custom MNIST net implementation

Hi, I have been trying to implement a custom net based on the lessons learned in chapter 4.
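(For reference, init_params and tensor below come from the chapter's imports; a rough sketch of what they correspond to, assuming plain torch:)

import torch
from torch import tensor

def init_params(size, std=1.0):
    """Create a parameter tensor of the given size with gradient tracking, as in the chapter."""
    return (torch.randn(size) * std).requires_grad_()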

class LinearModel:
    """A simple linear model."""
    def __init__(self, in_features, out_features):
        self.weights = init_params((in_features, out_features))  # torch.Size([in_features, out_features])
        self.bias = init_params(out_features)  # torch.Size([out_features])

    def parameters(self):
        return (self.weights, self.bias)

    def __call__(self, xb):
        return (xb @ self.weights) + self.bias

class SimpleNet:
    """A simple multi layer neural network."""
    def __init__(self, in_features, out_features):
        self.layer1 = LinearModel(in_features, 30)
        self.layer2 = lambda xb: xb.max(tensor(0.0))  # ReLU: element-wise max with 0
        self.layer3 = LinearModel(30, out_features)

    def parameters(self):
        w1, b1 = self.layer1.parameters()
        w2, b2 = self.layer3.parameters()
        return (w1, b1, w2, b2)

    def __call__(self, xb):
        res = self.layer1(xb)
        res = self.layer2(res)
        res = self.layer3(res)
        return res

model = SimpleNet(28*28, 1)

But while LinearModel works fine with BasicOptim, SimpleNet doesn’t.


learner = SimpleLearner(data_loaders, LinearModel(28*28, 1))
learner.train_model(20, learning_rate=1.0)

# 0.6362 0.7891 0.916 0.9473 0.959 0.9639 0.9658 0.9663 0.9688 0.9697 0.9702 0.9727 0.9736 0.9746 0.9736 0.9736 0.9746 0.9751 0.9751 0.9756

learner = SimpleLearner(data_loaders, SimpleNet(28*28, 1))
learner.train_model(20, learning_rate=1.0)
# 0.5068 0.5068 0.5068 0.5068 0.5068 0.5068 0.5068 0.5068 0.5068 0.5068 0.5068 0.5068 0.5068 0.5068 0.5068 0.5068 0.5068 0.5068 0.5068 0.5068 

On further inspection, I found that nn.Linear's parameters have a different size than my custom LinearModel's.

[p.size() for p in nn.Linear(28*28,30).parameters()]
# [torch.Size([30, 784]), torch.Size([30])]

[p.size() for p in LinearModel(28*28,30).parameters()]
# [torch.Size([784, 30]), torch.Size([30])]

While I’m not sure if this could be the reason, I tried transposing the weights anyway, but then the forward pass doesn’t run at all.
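(As far as I understand, nn.Linear stores its weight as (out_features, in_features) and applies it transposed, i.e. xb @ weight.t() + bias, so the layout on its own shouldn't change the math; transposing my weights without also changing __call__ just gives a shape mismatch. A quick sanity check:)

import torch
import torch.nn as nn

linear = nn.Linear(28*28, 30)
xb = torch.randn(64, 28*28)

# nn.Linear computes xb @ weight.t() + bias, with weight stored as (out_features, in_features)
manual = xb @ linear.weight.t() + linear.bias
print(torch.allclose(linear(xb), manual))  # expected: True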

Any idea what I could be missing?

Hmm, I checked the chapter 4 lesson here:
https://github.com/fastai/fastbook/blob/master/04_mnist_basics.ipynb

But I didn’t find SimpleLearner in that code, and Learner has a fit() method instead of a train_model() method, and fit() has an “lr” parameter instead of “learning_rate”, so I was a bit confused. Maybe you meant another chapter 4, or you already have your own very customized code?

Anyway, I saw there is a train_model() in the linked 04_mnist_basics notebook, so I put together a little notebook for you from it and tried out your LinearModel and your SimpleNet.
I didn’t change anything in your code, only copied some necessary parts from the notebook and dropped your 2 models in.
Both of your models seem to work. I’ve attached the code, maybe you can check it :slight_smile:
(I cannot upload it as ipynb, but the forum allows it as pdf - hopefully you can still copy the content)

for_sayanarijit.pdf (315.7 KB)


Ah thanks a lot…

SimpleLearner is my own implementation of BasicOptim plus the training loop from the chapter:

class SimpleLearner:
    """A simple learner to train models."""
    def __init__(self, data_loaders, model):
        self.data_loaders = data_loaders
        self.model = model

    def calculate_gradient(self, image, target):
        predictions = self.model(image)
        loss = mnist_loss(predictions, target)
        loss.backward()

    def step(self, learning_rate):
        for param in self.model.parameters():
            param.data -= param.grad.data * learning_rate

    def reset_gradient(self):
        for p in self.model.parameters():
            p.grad = None

    def train_epoch(self, learning_rate):
        for batch_of_images, batch_of_targets in self.data_loaders.train:
            self.calculate_gradient(batch_of_images, batch_of_targets)
            self.step(learning_rate)
            self.reset_gradient()

    def validate_epoch(self):
        accuracy = []
        for batch_of_images, batch_of_targets in self.data_loaders.valid:
            batch_of_predictions = self.model(batch_of_images)
            acc = batch_accuracy(batch_of_predictions, batch_of_targets)
            accuracy.append(acc)
        return round(torch.stack(accuracy).mean().item(), 4)

    def train_model(self, epochs, learning_rate):
        for _ in range(epochs):
            self.train_epoch(learning_rate)
            print(self.validate_epoch(), end=' ')
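
(mnist_loss and batch_accuracy above are meant to be the ones from the chapter; for reference, mnist_loss looks roughly like this:)

def mnist_loss(predictions, targets):
    """Chapter 4's loss: distance of the sigmoided prediction from the 0/1 target."""
    predictions = predictions.sigmoid()
    return torch.where(targets == 1, 1 - predictions, predictions).mean()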

But there was no issue with this implementation either. The actual issue was that in batch_accuracy(), instead of using .sigmoid() I was using a custom sigmoid function from the course.

def sigmoid(x):
    """The sigmoid formula as written in the chapter (not torch's actual implementation)."""
    return 1 / (1 + torch.exp(-x))

def batch_accuracy(predictions_batch, targets_batch):
    preds = sigmoid(predictions_batch)  # !!! the bug; correct: predictions_batch.sigmoid()
    threes = preds > 0.5
    correct = threes == targets_batch
    return correct.float().mean()

So torch's actual sigmoid implementation seems to be different from the formula explained in the chapter.
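(If anyone wants to dig further, a quick way to compare the two numerically would be something like this; I haven't pinned down exactly where they diverge:)

import torch

def sigmoid(x):
    """The formula from the chapter."""
    return 1 / (1 + torch.exp(-x))

xs = torch.linspace(-100, 100, 10001)
print(torch.allclose(sigmoid(xs), torch.sigmoid(xs)))
print((sigmoid(xs) - torch.sigmoid(xs)).abs().max())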