Custom vision model error with fastai v2

Hi everyone,

I’m having trouble migrating code that worked in fastai v1 to fastai v2. Consider the following simple model:

import torch
import torch.nn as nn

class model2L(nn.Module):
    def __init__(self, num_classes=2):
        super(model2L, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(56 * 56 * 64, num_classes),  # 64 channels at 56x56 (224 halved by each of the two MaxPool layers)
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

To apply it to some data, I do the following:

image_data = ImageDataLoaders.from_folder(ts_path, train='train', bs=batch_size, valid='valid', size=224)
eqs_model = model2L().cuda()
learn = Learner(image_data, eqs_model, loss_func=loss_func, metrics=accuracy)

The only difference in the procedure so far between the two versions is the use of ImageDataLoaders, which seems to have replaced v1’s ImageDataBunch.

However, whenever I try to do anything with learn, I get the following error which I never got in v1:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x2880000 and 200704x2)

Does anyone have an idea of what changed between v1 and v2 that could be causing this? Maybe it has to do with the way ImageDataLoaders structures the data?

Thanks!

I would recommend looking at the data and seeing whether it looks like what you are expecting. You could try something like this:

x, y = image_data.one_batch()  # check x and y at this point: do they have the shapes you are expecting?
out = learn.model(x)           # if the shapes look right, process the batch through the model
# assuming this passes, does the output look correct?

You could also try running your model one layer at a time. Instead of running everything, does learn.model.features(x) give you the expected shape?
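
For example, something like this (just a sketch, assuming x is the batch from one_batch above and learn.model is the model2L you posted):

feats = learn.model.features(x)
print(feats.shape)                  # expect [bs, 64, 56, 56] for 224x224 inputs
flat = torch.flatten(feats, 1)
print(flat.shape)                   # expect [bs, 200704], i.e. 56*56*64
out = learn.model.classifier(flat)
print(out.shape)                    # expect [bs, 2]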

Hi @KevinB, thanks for the suggestions.

Both x and y have a size of 32, which matches my batch size. I’m not sure what those values are supposed to represent.

What’s interesting is that everything crashes due to memory issues when I try out = learn.model(x):

RuntimeError: CUDA out of memory. Tried to allocate 1.38 GiB (GPU 0; 11.76 GiB total capacity; 9.19 GiB already allocated; 242.94 MiB free; 9.46 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Given that the GPU had no problems with this batch size in v1, this suggests that the data is somehow being interpreted differently.
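
As an aside, to sidestep the OOM while only checking shapes, one thing I could try (just a sketch) is running the same forward pass on the CPU, where the shape mismatch should show up instead of the memory error:

out = learn.model.cpu()(x.cpu())   # run the check on CPU to avoid the GPU memory limit
learn.model.cuda()                 # move the model back to the GPU afterwards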

To triple-check, I passed a pretend single-image input through each step of the model:

m = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Flatten()
        )
input = torch.randn([3, 224, 224])
output = m(input)

The size of the output is [64, 3136], which matches the input size of the linear layer nn.Linear(56*56*64, 2) if we assume the forward step in the original model (I’m not sure how to use nn.Flatten() to output to 1D).
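
For reference, here is a quick sketch of the Flatten behaviour I was unsure about (nn.Flatten defaults to start_dim=1, so with a batch dimension present it produces exactly the [batch, 200704] shape the Linear layer expects):

import torch
import torch.nn as nn

t = torch.randn(64, 56, 56)             # the unbatched feature map from above
print(nn.Flatten()(t).shape)            # torch.Size([64, 3136]): start_dim=1 keeps the first dim
print(nn.Flatten(0)(t).shape)           # torch.Size([200704]): start_dim=0 flattens to 1D
t = torch.randn(1, 64, 56, 56)          # the same feature map with a batch dimension
print(nn.Flatten()(t).shape)            # torch.Size([1, 200704]): matches nn.Linear(56*56*64, 2)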

To check whether or not this was a batch issue, I changed the batch size to 1 and tried your suggestion again. This time, running out = learn.model(x) gave the original error instead of running out of memory:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x2880000 and 200704x2)

I have no idea why the size balloons to 2880000 when running on the actual data but looks fine when running the model on the example input. Any more feedback or suggestions are greatly appreciated!

I don’t get a runtime error when I run your bottom piece of code, but I did have to add another dimension to your torch.randn to represent the batch. This runs successfully on my end:

m = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Flatten()
        )
input = torch.randn([1, 3, 224, 224])
output = m(input)

output.shape

I also ran your top model (model2L) through and it worked when I added the batch dimension.

So the input shape should be [batch_size, channels, height, width].

Can you try this again:

x,y = image_data.one_batch()
x.shape

I would guess that your dataloader is where the issue actually is, but I’m not seeing anything obvious.
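
If it helps, here is roughly what I would poke at in the DataLoaders itself (a sketch; I’m assuming the v2 dataloaders expose their transform pipelines as after_item and after_batch):

print(image_data.train.after_item)   # per-item transforms: is there a Resize in here?
print(image_data.train.after_batch)  # per-batch (GPU) transforms
image_data.show_batch()              # eyeball a few images and their sizes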

Thanks @KevinB for your thorough response.

Sorry, I was not very clear in my previous response. The only time I get the runtime error is when I run it with the actual data. Considering that there doesn’t appear to be anything fishy happening with the model, I agree with you that the problem is most likely in the dataloader.

I’m trying to figure out ways I can test this besides just digging through the code differences between ImageDataLoaders and ImageDataBunch (v2 and v1, respectively). If you have a stroke of inspiration, let me know :slight_smile:

Thanks anyways!

OK, got it: I figured out what the problem is. The data does not get resized to 224x224 by the size parameter I passed to ImageDataLoaders.from_folder. In fact, from_folder doesn’t seem to have that parameter at all, unlike v1’s ImageDataBunch. Strange that it doesn’t throw an error.

The 2880000 number totally makes sense now. The original images are 600x1200, so the CNN shapes go as follows:
[1,32,300,600] → [1,64,150,300]
where 64x150x300 = 2880000
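
Just to confirm the arithmetic, reusing the little m from my earlier post with a fake image at the original size (a quick sketch):

x = torch.randn(1, 3, 600, 1200)   # original image size, with a batch dimension
print(m(x).shape)                  # torch.Size([1, 2880000]): the number from the error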

To make matters weirder, if I leave the data as is and use something like ResNet34 with vision_learner instead of my own model, it works without a problem (presumably because the head attached to a pretrained model uses adaptive pooling, so any input size is accepted).

So in the end, the question becomes really simple. How do I resize images in fastai v2 and at what point should I do that?

Here is one way to resize:

ImageDataLoaders.from_name_func(path, files, label_func, item_tfms=Resize(224))

You can also pass aug_transforms(size=224) to batch_tfms.
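
Adapted to your earlier from_folder call, that would look something like this (a sketch: item_tfms run on each image before batching, batch_tfms run on whole batches, typically on the GPU):

from fastai.vision.all import *

image_data = ImageDataLoaders.from_folder(
    ts_path, train='train', valid='valid', bs=batch_size,
    item_tfms=Resize(224),                # per-item resize to 224x224
    batch_tfms=aug_transforms(size=224),  # optional batch-level augmentation/resize
)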

Excellent, thanks!