I wanted to train a deep learning model on the full MNIST dataset. I searched Google to see what I could find and found this. I was able to implement my own version of that, minus the dropout layers since I’m not sure what they do yet. However, when I try to do it the fastai way that I learned in chapter 4, I get a `TypeError: forward() takes 2 positional arguments but 10 were given`. Do I need to override forward to handle the 10 digits I’m trying to classify? Here’s the notebook I’ve been doing that in.
You’re passing the constructor to the `model` parameter of `Learner`, not the model itself. You need to create an instance of it with `simple_net()`.
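To see why this mix-up produces such a confusing error, here is a small plain-Python sketch, no fastai involved (`SimpleNet` is a made-up stand-in, with `__call__` wired to `forward` the way `nn.Module` does it): the class itself is callable, so nothing complains when you hand it over, but calling it runs `__init__` instead of `forward`.

```python
class SimpleNet:
    def __call__(self, x):       # nn.Module routes calls through forward
        return self.forward(x)

    def forward(self, x):
        return x * 2

net = SimpleNet()                # an instance: what Learner actually needs
print(net(3))                    # 6

# Passing the class itself "works" right up until something calls it:
try:
    SimpleNet(3)                 # this call goes to __init__, not forward
except TypeError as e:
    print(e)                     # a TypeError about argument counts
```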
You are totally right, rookie mistake. Thanks for catching it.
When I do pass in simple_net() to the Learner, I get a missing input error. In the chapter 4 notebook we also just pass the constructor into the Learner class and it handled it.
Ah, yes. You are 100% right! Nice catch.
Okay, what does this return for you? I’m curious about the size of your batch:

```python
x, y = dls.one_batch()
print(x.shape, y.shape)
```
It can’t unpack 10 values into 2. I think I need to smush my 10 lists of tensors that comprise my x and the 10 lists of tensors that comprise my y.
A dataset is basically a container of ordered pairs of x’s and y’s. If you look closely at the definition of your dsets, it is a list of 10 other lists (corresponding to zeros, ones, and so on up to nines). Furthermore, all 10 of those lists are only your x tensors; you haven’t told the code how to obtain the corresponding y values.
So when the dataloader tries to index into your dset (say with `dset[0]`), it gets a tuple containing one tensor for zero, one tensor for one, and 8 other such tensors, whereas it is expecting something like `(x, y) = (zero_tensor, 0)`.
Here’s how you should approach it. I’m writing some code below that you may use to change your own code, though I must tell you that this is not an efficient way to code it:
```python
x = []  # empty list
x.extend((*zeros, *ones, ..., *nines))
y = [0]*len(zeros) + [1]*len(ones) + ... + [9]*len(nines)
dset = list(zip(x, y))
```
Now you see, when the dataloader would index into the dset, it would get exactly two values…
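A runnable toy version of the idea (plain Python strings standing in for the image tensors, two images per digit) makes the before/after indexing behavior easy to see:

```python
# Dummy per-digit lists standing in for zeros, ones, ..., nines.
digit_lists = [[f"img{d}_{i}" for i in range(2)] for d in range(10)]

# Before: a "dataset" that is just the 10 lists. Indexing it returns one
# whole per-digit list, and a sample unpacks into 10 values, not 2.
bad_dset = digit_lists
print(bad_dset[0])             # a list of images, no label anywhere

# After: flatten the x's, build matching y's, then zip them together.
x, y = [], []
for label, imgs in enumerate(digit_lists):
    x.extend(imgs)
    y.extend([label] * len(imgs))
dset = list(zip(x, y))
print(dset[0])                 # ('img0_0', 0) -- exactly an (x, y) pair
print(dset[-1])                # ('img9_1', 9)
```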
Cheers
I see what you mean about it being a really inefficient way of coding, so I rewrote it this way:
```python
train_x = []
train_y = []
for i, pat in enumerate(train_path):
    num = [tensor(Image.open(o)) for o in pat.ls()]
    train_x += num
    train_y += tensor([i*1.0]*len(num))
train_x = train_x[0].view(-1, 28*28)
training_dset = list(zip(train_x, train_y))

test_x = []
test_y = []
for i, pat in enumerate((path/'testing').ls().sorted()):
    num = [tensor(Image.open(o)) for o in pat.ls()]
    test_x += num
    test_y += tensor([i*1.0]*len(num))
test_x = test_x[0].view(-1, 28*28)
testing_dset = list(zip(test_x, test_y))
```
So I’ve been playing with it for a while, but the `learner.fit` call always fails, currently with `expected scalar type Byte but found Float`, so I figure this is a data loading issue. I have a notebook here showing the error I get. I print some samples of the data after loading it to figure out what’s going on, and it looks fine to me; the x and y from my training loader look like this: `(torch.Size([64, 1, 784]), torch.Size([64, 1]))`, so I assumed all the rows lined up. Is it possible to do a pdb stack trace and print things out in a Jupyter notebook?
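On the pdb question: yes. In a notebook you can run the `%debug` magic in a fresh cell right after an exception to open a post-mortem pdb session, or call `pdb.post_mortem()` yourself; a non-interactive way to just see the stack is the stdlib `traceback` module. A minimal sketch (the failing function here is made up, standing in for the failing `fit` call):

```python
import pdb
import traceback

def buggy(xb):
    # Made-up stand-in for the failing forward pass.
    return xb + "not a tensor"   # raises TypeError on purpose

try:
    buggy(1)
except TypeError:
    # Print the full stack without dropping into a debugger:
    print(traceback.format_exc())
    # Interactively you could instead call: pdb.post_mortem()
    # or, in the next notebook cell, just run the %debug magic.
```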
```python
train_path = (path/'training').ls().sorted()
train_x = []
train_y = []
for i, pat in enumerate(train_path):
    num = [tensor(Image.open(o)) for o in pat.ls()]
    train_x += num
    train_y += tensor([i]*len(num)).unsqueeze(1)
# print(train_x[0].shape, train_x[0].view(-1, 28*28))
train_x = [x.view(-1, 28*28) for x in train_x]
training_dset = list(zip(train_x, train_y))

test_x = []
test_y = []
for i, pat in enumerate((path/'testing').ls().sorted()):
    num = [tensor(Image.open(o)) for o in pat.ls()]
    test_x += num
    test_y += tensor([i]*len(num)).unsqueeze(1)
test_x = [x.view(-1, 28*28) for x in test_x]
testing_dset = list(zip(test_x, test_y))
```
That’s how I’m loading the data. I know Jeremy says in the course not to loop over things, but I didn’t know how to change the views on all the tensors except with a list comprehension.
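For what it’s worth, one way to avoid the Python loop over views (assuming the images are plain PyTorch tensors of the same shape) is to `torch.stack` the list into one big tensor and reshape it once:

```python
import torch

# Toy stand-ins for the loaded images: a list of 28x28 uint8 tensors.
imgs = [torch.zeros(28, 28, dtype=torch.uint8) for _ in range(5)]

# Instead of [x.view(-1, 28*28) for x in imgs], stack once and reshape:
train_x = torch.stack(imgs).view(-1, 28*28)
print(train_x.shape)  # torch.Size([5, 784])
```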
I called a `.float()` on each tensor and that seemed to fix the issue.
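For context on why that works: the MNIST image files have 8-bit pixels, so `tensor(Image.open(o))` produces a `torch.uint8` ("Byte") tensor, while the model’s weights are float32, and matrix multiplication refuses to mix the two dtypes. A small sketch (also dividing by 255, as chapter 4 does to scale pixels into 0–1):

```python
import torch

# Like tensor(Image.open(...)): 8-bit pixel values.
pixels = torch.tensor([[0, 128, 255]], dtype=torch.uint8)
print(pixels.dtype)            # torch.uint8 -- the "Byte" in the error

floats = pixels.float() / 255  # .float() resolves the dtype mismatch
print(floats.dtype)            # torch.float32
```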
@dmoneyballer I have been through your notebook, great work.
I wanted to kindly ask: in your implementation for the full MNIST dataset, I saw you kept your loss function as the original `mnist_loss`, where you take `torch.where` etc., the same as the one for binary classification. Shouldn’t you use cross entropy loss for the full MNIST dataset?
On the other hand, I went through the rest of the files in this repo and saw other files where you defined classes that used `softmax` and `nll_loss`, which together amount to using cross entropy as the loss function.
So my question is: if you were implementing a model for the full MNIST dataset, which is a multi-category classification problem, why did you use the binary-classification `mnist_loss` in your MNIST.ipynb notebook, and how can we define a loss function that uses cross entropy?
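To sketch an answer to the last part: for 10-way MNIST you could pass `F.cross_entropy` (or `nn.CrossEntropyLoss()`) as the loss function. It expects raw activations of shape `(batch, 10)` and integer labels of shape `(batch,)`, and it applies `log_softmax` followed by `nll_loss` internally, which is exactly the softmax/`nll_loss` combination from those other files (the `preds`/`targets` below are made-up random data):

```python
import torch
import torch.nn.functional as F

preds = torch.randn(64, 10)             # raw model outputs (logits)
targets = torch.randint(0, 10, (64,))   # integer labels 0..9, not floats

loss = F.cross_entropy(preds, targets)
print(loss)  # a scalar loss value

# The same thing, spelled out as log_softmax + nll_loss:
loss2 = F.nll_loss(F.log_softmax(preds, dim=1), targets)
print(torch.allclose(loss, loss2))  # True
```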