MNIST Sample with Cross Entropy

Hello

I'm working on chapter 5, multi-class classification with cross-entropy.

I took the MNIST Sample data and tried to classify 3s vs. 7s with cross-entropy.

You can find the code in Google Colab: mnist_with_cross_entropy

Code

from fastai.vision.all import *
path = untar_data(URLs.MNIST_SAMPLE)
path
Path('D:/DATA/y.iqbal/.fastai/data/mnist_sample')
# load each training image as a [28,28] float tensor, scaled to [0,1]
stacked_threes = torch.stack([tensor(Image.open(o))/255 for o in (path/'train/3').ls()])
stacked_sevens = torch.stack([tensor(Image.open(o))/255 for o in (path/'train/7').ls()])
stacked_threes.shape, stacked_sevens.shape
(torch.Size([6131, 28, 28]), torch.Size([6265, 28, 28]))
valid_stacked_threes = torch.stack([tensor(Image.open(o))/255 for o in (path/'valid/3').ls()])
valid_stacked_sevens = torch.stack([tensor(Image.open(o))/255 for o in (path/'valid/7').ls()])
valid_stacked_threes.shape, valid_stacked_sevens.shape
(torch.Size([1010, 28, 28]), torch.Size([1028, 28, 28]))
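Each file opens as a 28x28 grayscale image; dividing by 255 scales the pixel values to [0,1]. A quick check one could run (not in the original notebook):

img = tensor(Image.open((path/'train/3').ls()[0]))
img.shape   # torch.Size([28, 28]), raw pixel values in 0..255 before scaling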
train_x = concat(stacked_threes,stacked_sevens).view(-1,28*28)
# label 1 for every 3, label 0 for every 7
train_y = concat(tensor([1]*len(stacked_threes)),tensor([0]*len(stacked_sevens))).unsqueeze(1)
train_x.shape,train_y.shape
(torch.Size([12396, 784]), torch.Size([12396, 1]))
valid_x = concat(valid_stacked_threes,valid_stacked_sevens).view(-1,28*28)
valid_y = concat(tensor([1]*len(valid_stacked_threes)),tensor([0]*len(valid_stacked_sevens))).unsqueeze(1)
valid_x.shape,valid_y.shape
(torch.Size([2038, 784]), torch.Size([2038, 1]))
dset = list(zip(train_x,train_y))
valid_dset = list(zip(valid_x,valid_y))
dl = DataLoader(dset,batch_size=256)
valid_dl = DataLoader(valid_dset,batch_size=256)
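As a sanity check (not in the original notebook), the first batch can be inspected with first, which fastai's imports provide:

xb,yb = first(dl)
xb.shape, yb.shape   # expect (torch.Size([256, 784]), torch.Size([256, 1]))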
def init_params(size,std=1.0):
    # random parameters, tracked for gradients
    return (torch.randn(size)*std).requires_grad_()
# two-layer net: 784 inputs -> 5 hidden units -> 2 output classes
weights1 = init_params((28*28,5))
bias1 = init_params(5)
weights2 = init_params((5,2))
bias2 = init_params(2)
def linear1(xb):
    res = xb@weights1+bias1       # first linear layer
    res = res.max(tensor(0.0))    # ReLU
    res = res@weights2+bias2      # second linear layer, two output logits
    return res
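Despite its name, linear1 is a small two-layer net. The same model expressed with PyTorch modules, as an equivalent sketch (not part of the original code):

simple_net = nn.Sequential(
    nn.Linear(28*28, 5),   # plays the role of weights1, bias1
    nn.ReLU(),             # res.max(tensor(0.0))
    nn.Linear(5, 2),       # plays the role of weights2, bias2
)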
def mnist_loss(preds,targets):
    # F.cross_entropy wants integer class indices of shape [batch],
    # so flatten the [batch,1] targets first
    targets = targets.T.squeeze()
    return F.cross_entropy(preds,targets)
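F.cross_entropy expects raw logits of shape [batch, classes] and integer class indices of shape [batch], which is why the targets are flattened. A minimal illustration with made-up values:

logits = tensor([[3.0, -1.0],
                 [0.5,  2.0]])    # two samples, two classes
targets = tensor([0, 1])          # class indices, not one-hot
F.cross_entropy(logits, targets)  # scalar loss, averaged over the batch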
def calc_grad(xb,yb):
    preds = linear1(xb)
    loss = mnist_loss(preds,yb)
    loss.backward()
params = weights1,bias1,weights2,bias2
def train_epoch():
    for xb,yb in dl:
        calc_grad(xb,yb)
        for p in params:
            p.data -= p.grad*0.1   # SGD step with learning rate 0.1
            p.grad.zero_()
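The manual update is the same step torch.optim.SGD would take; an equivalent sketch (not in the original code):

opt = torch.optim.SGD(params, lr=0.1)
def train_epoch_with_opt():
    for xb,yb in dl:
        calc_grad(xb,yb)
        opt.step()        # p.data -= p.grad*0.1 for every param
        opt.zero_grad()   # p.grad.zero_()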
def valid_batch_accu(preds,yb):
    # predicted class = index of the larger logit; compare with the labels
    corrects = torch.argmax(preds,dim=1).unsqueeze(1) == yb
    return corrects.float().mean()
def validate_epoch():
    accu = [valid_batch_accu(linear1(xb),yb) for xb,yb in valid_dl]
    return torch.stack(accu).mean()
for i in range(10):
    train_epoch()
    print(validate_epoch())
tensor(0.6342)
tensor(0.6782)
tensor(0.7363)
tensor(0.7773)
tensor(0.8100)
tensor(0.8335)
tensor(0.8486)
tensor(0.8623)
tensor(0.8755)
tensor(0.8857)
a_7_image = tensor(Image.open((path/'valid/7/9711.png')))/255
ten_sor = (a_7_image).view(-1,28*28)   # flatten to a batch of one: [1, 784]
ten_sor.shape
torch.Size([1, 784])
linear1(ten_sor)
tensor([[ 3.6760, -2.5360]], grad_fn=<AddBackward0>)

The code reaches nearly 89% validation accuracy after ten epochs.

But when I test a 7 image, it gives the wrong result. Please check the end of the code above, where a single 7 image is tested against the trained network.

Does this mean the approach (code) is wrong?

Can anybody identify the mistake being made here and point me in the right direction?

Thanks!

Hey, the code is correct :slight_smile:
Have a look at:

train_y = concat(tensor([1]*len(stacked_threes)),tensor([0]*len(stacked_sevens))).unsqueeze(1)

and

valid_y = concat(tensor([1]*len(valid_stacked_threes)),tensor([0]*len(valid_stacked_sevens))).unsqueeze(1)

You are assigning the label 1 to all 3s, and all 7s get the label 0. The result of your prediction was:

tensor([[ 3.6760, -2.5360]], grad_fn=<AddBackward0>)

The first value is bigger than the second, and we would interpret that as the model predicting the first label. Python indices start at 0, so the label you are predicting is 0, which is the label for 7s.
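In code, with the tensors from the post (a small sketch):

pred = linear1(ten_sor)   # tensor([[ 3.6760, -2.5360]])
pred.argmax(dim=1)        # tensor([0]): index 0, the label assigned to the 7s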

Hope that makes sense.


Yes, that totally makes sense.

I was unable to see such a simple interpretation of the prediction.

Thank You :slightly_smiling_face:
