Complete all the steps in chapter 4 using the full MNIST dataset

I wanted to try training a deep learning model on the full MNIST dataset. I searched Google to see what I could find and found this. I was able to implement my own version of that here, minus the dropout layers, since I'm not sure what they do yet. However, when I try to do it the fastai way that I learned in chapter 4, I get a `TypeError: forward() takes 2 positional arguments but 10 were given`. Do I need to override `forward` to handle the 10 digits I'm trying to classify? Here's the notebook I've been doing that in.


You're passing the constructor to the `model` parameter of `Learner`, not the model itself. You need to create an instance of it with `simple_net()`.
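In plain Python terms, here's a toy sketch of why that produces a TypeError (`SimpleNet` and `fit` below are made-up stand-ins, not fastai code): when the training loop calls `model(batch)` and `model` is still the class, the batch gets routed to the constructor instead of `forward`/`__call__`.

```python
class SimpleNet:
    """Stand-in for a model class; __call__ plays the role of forward()."""
    def __call__(self, x):
        return x * 2

def fit(model, batch):
    # A training loop calls the model on a batch; if `model` is the class
    # itself, this invokes the constructor with the batch -> TypeError.
    return model(batch)

print(fit(SimpleNet(), 3))  # an instance works: prints 6
try:
    fit(SimpleNet, 3)       # the class itself does not
except TypeError as e:
    print("TypeError:", e)
```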


You are totally right, rookie mistake. Thanks for catching it.

When I do pass `simple_net()` to the `Learner`, I get a missing-input error. In the chapter 4 notebook we also just pass the constructor into the `Learner` class and it handles it.

Ah, yes. You are 100% right! Nice catch.

Okay, what does this return for you? I'm curious about the size of your batch:

```python
x, y = dls.one_batch()
print(x.shape, y.shape)
```

It can't unpack 10 values into 2. I think I need to smush together the 10 lists of tensors that make up my x and the 10 lists of tensors that make up my y.

@dmoneyballer

A dataset is basically a container of ordered pairs of x's and y's. If you look closely at the definition of your dsets, it is a list of 10 other lists (corresponding to zeros, ones, and so on up to nines). Further, all 10 of these lists are only your x tensors. You haven't told the code how to obtain the corresponding y values.

So, when the dataloader tries to index into your dset (say with `dset[0]`), it gets a tuple containing one tensor for zero, one tensor for one, and 8 other such tensors, whereas it is expecting something like `(x, y) = (zero_tensor, 0)`.
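One way to picture it with toy data (plain numbers standing in for image tensors; `digit_lists` here is hypothetical, not the actual notebook code):

```python
# Ten lists, one per digit; the numbers stand in for image tensors.
digit_lists = [[100 * d + i for i in range(3)] for d in range(10)]

# Aligning the ten lists side by side (zip-style) yields 10-tuples:
# ten x's per item and no label anywhere.
bad_item = next(zip(*digit_lists))
print(len(bad_item))  # 10 -- the dataloader can't unpack this into x, y
```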

Here's how you should approach it. I'm writing some code below that you can adapt to your own, though I must tell you this is not an efficient way to code:

```python
x = []  # empty list
x.extend((*zeros, *ones, ..., *nines))  # flatten the ten digit lists into one

y = [0]*len(zeros) + [1]*len(ones) + ... + [9]*len(nines)  # matching labels
dset = list(zip(x, y))  # each item is now an (x, y) pair
```

Now you see, when the dataloader indexes into the dset, it gets exactly two values.
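Filling in the elided pattern with toy data, so you can see it run end to end (plain ints in place of tensors, and only three digits shown; the real code would continue through `nines`):

```python
# Toy stand-ins for the per-digit tensor lists
zeros = [10, 11]
ones  = [20]
twos  = [30, 31, 32]

x = [*zeros, *ones, *twos]                          # flatten the x's
y = [0]*len(zeros) + [1]*len(ones) + [2]*len(twos)  # matching labels
dset = list(zip(x, y))

print(dset[0])   # (10, 0) -- exactly the (x, y) pair a dataloader expects
print(dset[-1])  # (32, 2)
```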
Cheers


I see what you mean about it being a really inefficient way of coding, so I rewrote it this way:

```python
train_x = []
train_y = []
for i, pat in enumerate(train_path):
    num = [tensor(Image.open(o)) for o in pat.ls()]
    train_x += num
    train_y += tensor([i*1.0]*len(num))
train_x = [x.view(-1, 28*28) for x in train_x]
training_dset = list(zip(train_x, train_y))

test_x = []
test_y = []
for i, pat in enumerate((path/'testing').ls().sorted()):
    num = [tensor(Image.open(o)) for o in pat.ls()]
    test_x += num
    test_y += tensor([i*1.0]*len(num))
test_x = [x.view(-1, 28*28) for x in test_x]
testing_dset = list(zip(test_x, test_y))
```

So I've been playing with it for a while, but the `learner.fit` call always fails with `expected scalar type Byte but found Float`, so I figure this is a data-loading issue. I have a notebook here showing the error I get. I print some samples of the data after loading it to figure out what's going on with it, and it looks fine to me; the x and y from my training loader look like this: `(torch.Size([64, 1, 784]), torch.Size([64, 1]))`, so I assumed all the rows lined up. Is it possible to do a pdb stack trace and print stuff out in a Jupyter notebook?

```python
train_path = (path/'training').ls().sorted()
train_x = []
train_y = []
for i, pat in enumerate(train_path):
    num = [tensor(Image.open(o)) for o in pat.ls()]
    train_x += num
    train_y += tensor([i]*len(num)).unsqueeze(1)
# print(train_x[0].shape, train_x[0].view(-1, 28*28))
train_x = [x.view(-1, 28*28) for x in train_x]
training_dset = list(zip(train_x, train_y))

test_x = []
test_y = []
for i, pat in enumerate((path/'testing').ls().sorted()):
    num = [tensor(Image.open(o)) for o in pat.ls()]
    test_x += num
    test_y += tensor([i]*len(num)).unsqueeze(1)
test_x = [x.view(-1, 28*28) for x in test_x]
testing_dset = list(zip(test_x, test_y))
```

is how I'm loading the data. I know Jeremy says in the course not to loop over things, but I didn't know how else to change the view on every tensor except with a list comprehension.
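On the notebook-debugging question: yes. Running the IPython magic `%debug` in a fresh cell right after an exception opens a pdb prompt at the failing frame, and `breakpoint()` inside your code pauses execution there. The same post-mortem idea, sketched as a plain script (the failing expression here is just a stand-in for a dtype mismatch):

```python
import pdb
import sys

try:
    [1, 2, 3] + 4  # deliberately broken, standing in for the failing fit call
except TypeError:
    tb = sys.exc_info()[2]
    # pdb.post_mortem(tb)  # uncomment to inspect the failing frame interactively
    print(type(tb).__name__)  # traceback
```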

I called a .float() on each tensor and that seemed to fix the issue.
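For anyone hitting the same error: PIL images load as unsigned 8-bit ("Byte") tensors while the model's weights are float32, and the matmul inside the model refuses to mix the two. A torch-free sketch of the mismatch using the stdlib `array` module as a rough stand-in (the actual fix is the `.float()` cast above, usually with a `/255` rescale):

```python
from array import array

pixels  = array('B', [0, 128, 255])     # 'B' = unsigned byte, like a uint8 image tensor
weights = array('f', [0.5, 0.25, 0.1])  # float32, like nn.Linear weights

# Cast (and rescale to [0, 1]) before doing float math with the weights --
# the same idea as calling .float() / 255 on each image tensor:
floats = array('f', (p / 255 for p in pixels))
print([round(v, 3) for v in floats])  # [0.0, 0.502, 1.0]
```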

@dmoneyballer I have been through your notebook, great work.

I wanted to kindly ask about your implementation for the full MNIST dataset: I saw you kept your loss function as the original mnist_loss with `torch.where` etc., the same as the one for the binary case. Shouldn't you use cross-entropy loss for the full MNIST dataset?

On the other hand, I went through the rest of the files in this repo and saw your other files where you defined classes that used softmax and nll_loss, and using those two functions together amounts to using cross-entropy loss.

So my question is: if you were implementing the full MNIST model, which is a multi-category classification problem, why did you use the binary-classification mnist_loss in your MNIST.ipynb notebook, and how can we define a loss function that uses cross-entropy loss?
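To sketch an answer to the last part: in fastai you can pass `loss_func=F.cross_entropy` (or `nn.CrossEntropyLoss()`) to the `Learner`, with the targets as integer class indices rather than floats. What that loss computes per sample, written out in plain Python for clarity (a sketch, not the PyTorch implementation), is softmax over the logits followed by the negative log of the true class's probability:

```python
import math

def cross_entropy(logits, target):
    """Softmax over the logits, then -log of the true class's probability."""
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    return -math.log(exps[target] / total)

# With uniform logits over 10 digits, every class gets probability 1/10,
# so the loss is log(10):
print(round(cross_entropy([0.0] * 10, 3), 4))  # 2.3026
```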