Why do we view/reshape(-1, n*m)?

Hi, I see in a lot of image processing code something like this (from the course):

A stack of k 2D n x m images --> view/reshape(-1, n*m) --> a stack of k 1D vectors of length n*m

For example:

100 2D 28 x 28 images --> view/reshape(-1, 28*28) --> a stack of 100 1D vectors of length 784

Why do we convert the 2D image to a 1D line? Why reshape?
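To make the shapes concrete, here is a minimal PyTorch sketch of the reshape I'm asking about (tensor names are just for illustration):

```python
import torch

# a batch of 100 single-channel 28 x 28 images
images = torch.randn(100, 28, 28)

# flatten each image into a 784-element vector;
# -1 lets PyTorch infer the batch dimension (100)
flat = images.view(-1, 28 * 28)

print(images.shape)  # torch.Size([100, 28, 28])
print(flat.shape)    # torch.Size([100, 784])
```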

Hi Consolidated. Welcome, and thanks for asking a great beginner’s mind question.

You typically find this line right before a fully connected layer (Linear) to flatten a set of features into a line. Linear treats all of its inputs the same, so it does not care about any feature structure that was pertinent to preceding layers. In fact, you have to flatten the feature dimensions into a line in order for Linear to train on them.
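For example, nn.Linear maps a fixed number of input features to a fixed number of outputs, so the image has to be flattened first. A minimal sketch (the sizes here are just for illustration):

```python
import torch
import torch.nn as nn

fc = nn.Linear(28 * 28, 10)     # 784 input features -> 10 outputs

x = torch.randn(100, 28, 28)    # batch of 100 images
out = fc(x.view(-1, 28 * 28))   # flatten each image, then apply Linear
print(out.shape)                # torch.Size([100, 10])
```

Passing `x` without the flatten would fail, because Linear expects its last dimension to be 784, not 28.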

Could you please post the before and after code so that we can see whether this answer fits? :slightly_smiling_face:


Malcolm is right! I’d just add, so that you don’t get confused: CNNs usually have linear layers towards the end, where the 2D features are converted to 1D features. Usually there are 2 or 3 such linear layers, the final of which corresponds to your output (e.g., a layer with 10 neurons, corresponding to the 10 classes in your classification task).

Thanks @Pomo, @PalaashAgrawal

Yes, it seems to happen before dense layers. I think more generally, I see a pattern of [some network architecture] followed by 2-3 dense layers and softmax.

What is the intuition for flattening from 2D to 1D? Is this convention? Wouldn’t you lose (spatial) information moving from 2D to 1D? It seems like we don’t care, but I’m not sure why?

I get the sense that [some network architecture] is basically feature engineering the images. If you think of your engineered features as an input layer, then the final 2-3 dense layers and softmax are like the hidden and output layers of a more traditional DNN?

Code example:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)        # 3 input channels -> 6 feature maps, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)         # halves spatial dimensions
        self.conv2 = nn.Conv2d(6, 16, 5)       # 6 -> 16 feature maps, 5x5 kernel
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # flattened conv features -> 120
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)           # 10 output classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)             # flatten (batch, 16, 5, 5) -> (batch, 400)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
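It can help to trace the shapes through that network. Assuming a CIFAR-style 3 x 32 x 32 input (the input size isn’t stated in the snippet, but this architecture matches the classic PyTorch CIFAR-10 tutorial net), the conv/pool stack ends at 16 feature maps of 5 x 5, which is exactly why the flatten is view(-1, 16 * 5 * 5):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

pool = nn.MaxPool2d(2, 2)
x = torch.randn(4, 3, 32, 32)             # batch of 4 images, 3 channels, 32 x 32
x = pool(F.relu(nn.Conv2d(3, 6, 5)(x)))   # conv: 32 -> 28, pool: -> (4, 6, 14, 14)
x = pool(F.relu(nn.Conv2d(6, 16, 5)(x)))  # conv: 14 -> 10, pool: -> (4, 16, 5, 5)
flat = x.view(-1, 16 * 5 * 5)             # -> (4, 400), ready for fc1
print(flat.shape)                         # torch.Size([4, 400])
```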

I think the patterns for model design will become clearer as you progress through the course and see the architectures for various application areas.

Not convention, necessity. You must at some point reduce 2D to 1D in order to map an image into its class. :wink:
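And on the worry about losing information: flattening doesn’t destroy any pixel values, only the 2D indexing; the reshape is exactly invertible. A quick sketch:

```python
import torch

x = torch.randn(100, 28, 28)      # batch of 2D images
flat = x.view(-1, 28 * 28)        # flatten to 1D vectors
restored = flat.view(-1, 28, 28)  # reshape back to 2D

# every value survives the round trip
print(torch.equal(x, restored))   # True
```

What the Linear layer loses is the *use* of that spatial structure: it has no notion of which inputs were neighbors, which is why the convolutional layers do the spatially-aware work first.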

That seems a valid way to conceptualize it. I’d add that you are not engineering the features; rather, the model discovers them. You only get to engineer the architecture.