Complete all the steps in chapter 4 using the full MNIST datasets

I wanted to try to get the full MNIST data trained on a deep learning model. I searched Google to see what I could find and found this. I was able to implement my own version of it without the dropout layers, since I’m not sure what they do yet. However, when I try to do it the fastai way that I learned in chapter 4, I get a TypeError: forward() takes 2 positional arguments but 10 were given. Do I need to override forward() to handle the 10 digits I’m trying to classify? Here’s the notebook I’ve been doing that in.

You’re passing the constructor to the model parameter of Learner, not the model itself. You need to create an instance of it with simple_net().
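Roughly, the difference looks like this (a sketch; dls and simple_net stand in for whatever you defined in your notebook):

learn = Learner(dls, simple_net)    # passes the class object itself
learn = Learner(dls, simple_net())  # passes an actual model instance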


You are totally right, rookie mistake. Thanks for catching it.

When I do pass simple_net() into the Learner, I get a missing-input error. In the chapter 4 notebook we also just pass the constructor into the Learner class, and it handles it.

Ah, yes. You are 100% right! Nice catch.

Okay, what does this return for you? I’m curious about the size of your batch:

x, y = dls.one_batch()
print(x.shape, y.shape)
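(With a chapter-4-style flattened setup and the default batch size of 64, you’d expect something like torch.Size([64, 784]) torch.Size([64]); the exact shapes depend on how you built your DataLoaders.)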

It can’t unpack 10 values into 2. I think I need to smush together the 10 lists of tensors that make up my x and the 10 lists of tensors that make up my y.

@dmoneyballer

A dataset is basically a container of ordered pairs of x’s and y’s. If you look closely at the definition of your dsets, it is a list of 10 other lists (corresponding to zeros, ones, and so on up to nines). Further, all 10 of these lists are only your x tensors. You haven’t told the code how to obtain the corresponding y values.

So, when the dataloader tries to index into your dset (say using dset[0]), it gets a tuple containing one tensor for zero, one tensor for one, and 8 other such tensors, whereas it is expecting something like (x, y) = (zero_tensor, 0).

Here’s how you should approach it… I’m writing some code below that you can adapt into your own. Though I must tell you that this is not an efficient way to code it.

x = []  # one flat list holding all the image tensors
x.extend((*zeros, *ones, *twos, *threes, *fours, *fives, *sixes, *sevens, *eights, *nines))

y = [0]*len(zeros) + [1]*len(ones) + [2]*len(twos) + [3]*len(threes) + [4]*len(fours) + \
    [5]*len(fives) + [6]*len(sixes) + [7]*len(sevens) + [8]*len(eights) + [9]*len(nines)
dset = list(zip(x, y))

Now you see, when the dataloader indexes into the dset, it gets exactly two values…
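As a quick check (assuming the dset built above):

x, y = dset[0]
print(x.shape, y)  # one image tensor and its integer label, e.g. torch.Size([28, 28]) 0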
Cheers


I see what you mean about it being a really inefficient way of coding, so I rewrote it this way:

train_path = (path/'training').ls().sorted()
train_x = []
train_y = []
for i, pat in enumerate(train_path):
    num = [tensor(Image.open(o)) for o in pat.ls()]
    train_x += num
    train_y += tensor([i*1.0]*len(num))
train_x = [x.view(-1, 28*28) for x in train_x]  # flatten every image, not just the first
training_dset = list(zip(train_x, train_y))

test_x = []
test_y = []
for i, pat in enumerate((path/'testing').ls().sorted()):
    num = [tensor(Image.open(o)) for o in pat.ls()]
    test_x += num
    test_y += tensor([i*1.0]*len(num))
test_x = [x.view(-1, 28*28) for x in test_x]
testing_dset = list(zip(test_x, test_y))

So I’ve been playing with it for a while, but the learner.fit call always fails, with expected scalar type Byte but found Float, so I figure this is a data-loading issue I’m running into. I have a notebook here showing the error I get. I print some samples of the data after loading it to figure out what’s going on, and it looks fine to me; the x and y from my training loader look like this: (torch.Size([64, 1, 784]), torch.Size([64, 1])), so I assumed all the rows lined up. Is it possible to do a pdb stack trace and print stuff out in a Jupyter notebook?
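(For reference, Jupyter does support post-mortem debugging: running the %debug magic in a new cell right after the failing cell drops into a pdb session on that exception. A minimal sketch:)

%debug
# at the pdb prompt:
#   u / d    move up and down the stack
#   p name   print a variable in the current frame
#   q        quit the debugger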

train_path = (path/'training').ls().sorted()
train_x = []
train_y = []
for i, pat in enumerate(train_path):
    num = [tensor(Image.open(o)) for o in pat.ls()]
    train_x += num
    train_y += tensor([i]*len(num)).unsqueeze(1)
# print(train_x[0].shape, train_x[0].view(-1, 28*28))
train_x = [x.view(-1, 28*28) for x in train_x]
training_dset = list(zip(train_x, train_y))

test_x = []
test_y = []
for i, pat in enumerate((path/'testing').ls().sorted()):
    num = [tensor(Image.open(o)) for o in pat.ls()]
    test_x += num
    test_y += tensor([i]*len(num)).unsqueeze(1)
test_x = [x.view(-1, 28*28) for x in test_x]
testing_dset = list(zip(test_x, test_y))

is how I’m loading the data. I know Jeremy says in the course not to loop over things, but I didn’t know how to change the views on all the tensors except with the list comprehension.
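One loop-free option (a sketch, not tested against this exact notebook, assuming train_x is still the list of [28, 28] image tensors before the view step) is to stack everything once and reshape in one go:

train_x = torch.stack(train_x).view(-1, 28*28)  # [N, 28, 28] -> [N, 784]
train_y = torch.stack(train_y)                  # list of [1] tensors -> [N, 1]
training_dset = list(zip(train_x, train_y))     # zip iterates over rows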

I called .float() on each tensor and that seemed to fix the issue.
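For completeness, the cast can go right where the images are loaded; PIL images come in as uint8 (Byte) tensors, which is what clashes with the model’s float weights (a sketch adapting the loading line above; the /255 rescale is an extra normalization step like chapter 4 uses, not part of the original fix):

num = [tensor(Image.open(o)).float() for o in pat.ls()]
# or, normalizing pixel values to 0-1 as well:
# num = [tensor(Image.open(o)).float()/255 for o in pat.ls()]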