Part 1, Chapter 4 newbie question

Kantan_desu · December 17, 2020, 3:16am

I am trying to do Chapter 4 with the full MNIST set.

I did this to make my train_y and valid_y tensors:

train_y = tensor([0]*len(zeroes) +
                 [1]*len(ones) +
                 [2]*len(twos) +
                 [3]*len(threes) +
                 [4]*len(fours) +
                 [5]*len(fives) +
                 [6]*len(sixes) +
                 [7]*len(sevens) +
                 [8]*len(eights) +
                 [9]*len(nines)).unsqueeze(1)

train_x.shape,train_y.shape

(torch.Size([60414, 784]), torch.Size([60000, 1]))

valid_x = torch.cat([valid_0_tens,
                     valid_1_tens,
                     valid_2_tens,
                     valid_4_tens,
                     valid_5_tens,
                     valid_6_tens,
                     valid_7_tens,
                     valid_8_tens,
                     valid_9_tens,
                     ]).view(-1, 28*28)

valid_y = tensor([0]*len(valid_0_tens) +
                 [1]*len(valid_1_tens) +
                 [2]*len(valid_2_tens) +
                 [3]*len(valid_3_tens) +
                 [4]*len(valid_4_tens) +
                 [5]*len(valid_5_tens) +
                 [6]*len(valid_6_tens) +
                 [7]*len(valid_7_tens) +
                 [8]*len(valid_8_tens) +
                 [9]*len(valid_9_tens)).unsqueeze(1)

valid_dset = list(zip(valid_x,valid_y))

(torch.Size([8990, 784]), torch.Size([10000, 1]))

As you can see, building train_y seems to have “removed entries” (for lack of a better word) compared with train_x, and building valid_y seems to have “added entries” compared with valid_x.

Why do train_y and valid_y end up with fewer or more entries than train_x and valid_x? As I understand it, it’s the exact same operation as in the tutorial proper, with the same tensor shape, so it shouldn’t change anything. I suspect there’s some kind of default limit, since the cutoff is so clean, but the PyTorch documentation is not too illuminating. Any suggestions? I feel rather dumb for asking, but I am stuck.

johannesstutz · December 17, 2020, 9:48am

Hi and welcome! All I can say is that the sizes of your target tensors are correct. MNIST has 60,000 training items and 10,000 validation items. That’s where the clean numbers come from; it’s not a PyTorch limitation.

Can you maybe share your notebook? It’s hard to tell where the error comes from.

Kantan_desu · December 17, 2020, 4:14pm

Hi Johannes,
Yes, absolutely:

github.com

animefan380/fastai_practice/blob/main/Copy_of_04_mnist_basics.ipynb


Thank you for the assist

johannesstutz · December 18, 2020, 10:33am

Found the errors:

When converting the images to tensors, you’re using the “seven” images for the “eight” tensors.

In the validation set, you missed the “threes”.
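Both mismatched sizes in the question line up with these two slips. A quick arithmetic check, assuming the standard per-digit MNIST counts (they are not read from the notebook, and the variable names are made up for illustration):

```python
# Per-digit image counts for the standard MNIST split, digits 0..9
# (assumed here, not taken from the notebook).
train_counts = [5923, 6742, 5958, 6131, 5842, 5421, 5918, 6265, 5851, 5949]
valid_counts = [980, 1135, 1032, 1010, 982, 892, 958, 1028, 974, 1009]

# The label tensors were built from the right lists, so they cover the full set.
train_y_len = sum(train_counts)
valid_y_len = sum(valid_counts)

# train_x: the "eight" slot was filled with the "seven" images.
train_x_len = train_y_len - train_counts[8] + train_counts[7]

# valid_x: the "three" tensors were left out of torch.cat.
valid_x_len = valid_y_len - valid_counts[3]

print(train_x_len, train_y_len)  # 60414 60000
print(valid_x_len, valid_y_len)  # 8990 10000
```

So nothing was added or removed by PyTorch: the image tensors and the label tensors were simply built from different sets of digits.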

That should fix it. Maybe you can figure out a way to load the images in a loop instead of writing all the code by hand, which is just asking for trouble.
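One way such a loop could look, as a minimal sketch rather than the notebook's actual code. It uses plain Python lists so it runs on its own; `build_split`, `load_digit`, and `fake_loader` are hypothetical names. The point is that the labels are derived from the same per-digit list as the images, so the two cannot drift apart:

```python
def build_split(load_digit, digits=range(10)):
    """Build images and labels together, one digit at a time.

    load_digit is a hypothetical callable: digit -> list of flattened images.
    """
    xs, ys = [], []
    for d in digits:
        imgs = load_digit(d)        # one call per digit, no copy-pasted lines
        xs.extend(imgs)
        ys.extend([d] * len(imgs))  # labels come from the same list as the images
    assert len(xs) == len(ys), "images and labels must have the same length"
    return xs, ys


# Stand-in loader for demonstration only: digit d gets d + 1 blank 784-pixel images.
def fake_loader(d):
    return [[0.0] * 784 for _ in range(d + 1)]


xs, ys = build_split(fake_loader)
print(len(xs), len(ys))  # 55 55
```

In the notebook, `load_digit` would presumably wrap whatever per-folder image loading is already there, and the collected items would be joined with `torch.cat` and `tensor(...)` as in the original code.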

Kantan_desu · December 18, 2020, 9:00pm

It works! And yeah, a loop does make a lot more sense.

Thanks again ^_^.