Exploring Image Size & Accuracy of Transfer Learning in Lesson 1 Pets

How accurate is transfer learning with resnet34 at different image resolutions? Is this method as effective if the images are scaled down to 192px? 128px? 64px? 32px?


Visually examining the dataset, I had serious doubts the model would be successful as the images shrunk to 32px.

Here is the helper to sample the different sizes:

def show(size, rows=2):
   data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=size, bs=bs)
   data.show_batch(rows=rows, figsize=(7, 6))





I would expect a human could perform the task at 128px, but would start to struggle as the images got any smaller. Too much detail was being lost.

Gathering Results

I wrote a helper function: given an image size, calculate the error rates (see the first phase of transfer learning that Jeremy describes in the Lecture 1 Pets training)

def error_rate_of(size:int, cycles:int=4):
  data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=size, bs=bs)
  learn = cnn_learner(data, models.resnet34, metrics=error_rate)
  learn.fit_one_cycle(cycles)
  return [float(m[0]) for m in learn.recorder.metrics]

I quickly sampled a few different sizes:

print(error_rate_of(32, 6))
[0.9059540033340454, 0.8660351634025574, 0.8558863401412964, 0.8308525085449219, 0.8281461596488953, 0.8213802576065063]

print(error_rate_of(224, 6))
[0.11705006659030914, 0.08660351485013962, 0.08186738938093185, 0.07780785113573074, 0.07239513099193573, 0.07307171821594238]

As expected, at 32 pixels the model didn’t perform well (still at an 82% error rate after 6 epochs), and a quick test at 224 confirms the error rates are being calculated correctly (~7%).

Next up, let’s run through a range of sizes, saving the error rates for 6 epochs of training:

scores = {}

for size in range(32, 225, 16):
  scores[size] = error_rate_of(size, 6)


After exporting the scores into a CSV, I was able to build a chart:
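For reference, the export step can be done with the standard library’s csv module. This is just a sketch: the values and column names below are my own, not the actual results.

```python
import csv

# hypothetical results: image size -> per-epoch error rates
scores = {
    32:  [0.906, 0.866, 0.856],
    224: [0.117, 0.087, 0.082],
}

# one row per size, one column per epoch
n_epochs = len(next(iter(scores.values())))
with open("scores.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["size"] + [f"epoch_{i}" for i in range(1, n_epochs + 1)])
    for size, errors in sorted(scores.items()):
        writer.writerow([size] + errors)
```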


My question was - does it make sense to re-use the weights on different sized images?

That chart seems to say: yes.

As the images get really small, the model understandably cannot do a good job: distinguishing 37 different breeds of cats and dogs from 32x32 pixel images is an impossible task even for humans.

Transfer learning performed remarkably consistently on this dataset when images were scaled to between 160 and 224 pixels (and continued to achieve 10% error rates down to ~112 pixels).

In other words: those pretrained resnet34 weights are fairly resolution independent.

Next steps: with a different dataset that a human can discern at small sizes, test how transfer-learning accuracy with ResNet scales as images shrink to 32px. (An easy choice might be classifying 37 different classes from ImageNet; any ideas for other datasets to try?)


I would also try this with resnet18 and resnet50, to see how resolution independent each is compared to resnet34. If you are too busy, can you also share your notebook so I can do that?

Check out Imagenette and Imagewoof. The first is a dataset that should be easy to classify; the second is a dataset that is difficult.


Good idea to compare ResNet 18/34/50. Here are the individual charts generated as described above, only changing the model.

Very similar curves, though ResNet 50 does appear to do much better, and ResNet 18 performs very similarly to ResNet 34.

By grouping by Epoch, I find it easier to compare between the architectures.

The deeper architectures do appear to do better as the image dimensions get smaller.
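The grouping itself is a simple pivot of the scores dict, from size → per-epoch errors to epoch → per-size errors (one series per epoch for charting). A minimal sketch with hypothetical values:

```python
# hypothetical results: image size -> per-epoch error rates
scores = {
    32:  [0.91, 0.87, 0.84],
    128: [0.25, 0.15, 0.12],
    224: [0.12, 0.09, 0.07],
}

# pivot to epoch -> {size: error}, so each epoch becomes its own curve
by_epoch = {}
for size, errors in scores.items():
    for epoch, err in enumerate(errors, start=1):
        by_epoch.setdefault(epoch, {})[size] = err
```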

I haven’t really internalized how to look for overfitting (yet), so perhaps I should be examining that as well.

@adi93, any thoughts about the charts?

a small difference: resnet18 appears to do better (by a percent or so) on image dimension 208 vs 224.

Me neither, I am just on lesson 2. The fact that resnet50 performed better is intuitively appealing, though the error difference is quite small.

One other aspect I am noticing is that at lower epochs, the curves are first concave, then convex, and this is more pronounced in resnet18 than the others; with more epochs, they become fully convex. I don’t know what this means yet; I need some more theory.

I’m trying out Imagenette, based on code in a post by @jinudaniel at Imagenette

path = untar_data(URLs.IMAGENETTE)
tfms = get_transforms(max_rotate=25)

def error_rate_of(size:int, cycles:int=4, model=models.resnet18):
    data = (ImageList.from_folder(path).split_by_folder(valid='val')
        .label_from_folder().transform(tfms, size=size)
        .databunch(bs=bs).normalize(imagenet_stats))

    learn = cnn_learner(data, model, metrics=error_rate)
    learn.fit_one_cycle(cycles)
    return [float(m[0]) for m in learn.recorder.metrics]

I’m seeing a really low error rate within 1 epoch: 0.010000

error_rate_of(size, 1, models.resnet18)

So either I’m building my databunch incorrectly, or, since ResNet was pretrained on ImageNet, the error rate is low because I’m asking it to transfer-learn onto a subset of the classes it has already been trained on. (I’m pretty sure it is the latter.)

While ideally I would want to test on new image classes, I think looking at how it scales is still valuable.

I’ll share the results once they are finished running.

Another thing to think about is performing the same analysis when doing transfer learning with higher-resolution images, e.g., 256, 299, etc. I imagine we’d find a similar trend, in that all architectures will do better with the higher resolution, even though the ResNets were trained with images of size 224x224.


Here are the results for Imagenette

grouped by epoch

comparing imagenette and pets


imagenette could be performing better because the original training was done on imagenet (a superset of imagenette). But it probably benefits massively from its 10 classes being much more visually distinct than 37 different breeds of cats and dogs, especially when the images are resampled to very small dimensions.

As a next step, maybe I should test with CIFAR-10 (which are 32x32px already).

Looking at CIFAR-10, running for 50 epochs (without any unfreezing, fine-tuning, or learning-rate customization):


The accuracy quickly plateaus at ~80%. Given that state of the art for this dataset is an error rate in the low single digits, it is time to broaden the analysis to include a fine-tuning phase.