Changing data of learner - Resnet34

4722794 · January 28, 2020, 2:13pm

Hi,

In several lessons (starting with lesson 3), Jeremy retrains the learner on progressively larger images (starting with 128, then 256, etc.) to better train the parameters.

I just wanted to understand: wouldn’t setting learn.data = ‘new data’ with larger image sizes change our model architecture?

The operation would still make sense, since the parameters (kernels) can work on images of variable sizes, but I’m wondering about the overall architecture of the model.

LessW2020 · January 29, 2020, 3:34am

Hi @4722794,
The data is just the input to the model.
Changing the incoming data won’t change the model architecture at all. The model architecture is a fixed framework filled in with weights and biases.

The weights will adjust during further training to deal with the information in the larger images, but the model is the same.
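To make this concrete, here is a minimal sketch in plain Python (no libraries). The helper names `conv_param_count` and `conv_output_size` are made up for illustration and are not part of fastai or PyTorch. It assumes a 7x7 convolution from 3 to 64 channels with stride 2 and padding 3, which is how the first layer of a standard ResNet is usually configured, and shows that the number of weights does not depend on the image size while the output size does.

```python
def conv_param_count(in_channels, out_channels, kernel_size, bias=False):
    """Number of learnable parameters in a 2D conv layer (hypothetical helper)."""
    # One kernel of shape (in_channels, k, k) per output channel.
    weights = out_channels * in_channels * kernel_size * kernel_size
    # Optionally one bias term per output channel.
    return weights + (out_channels if bias else 0)


def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial size of the conv output along one dimension (hypothetical helper)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# Assumed layer configuration: 7x7 conv, 3 -> 64 channels, stride 2, padding 3.
params = conv_param_count(in_channels=3, out_channels=64, kernel_size=7)

for image_size in (128, 256):
    out = conv_output_size(image_size, kernel_size=7, stride=2, padding=3)
    # The parameter count is computed without ever looking at image_size.
    print(f"input {image_size}x{image_size}: output {out}x{out}, parameters {params}")
```

The image size only enters the output-size calculation; the parameter count is the same for both inputs.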

Hope that helps!

ark_aung · January 29, 2020, 7:25am

Hey @4722794,

When you change the input shape, the model architecture does not change. By model architecture, I mean the number of filters per conv layer, the number of conv blocks (or residual blocks for ResNet), etc. Of course, your model will readjust its weights when you train with larger input images, but weight readjustments do not mean a different model architecture.

However, the output shapes of your conv and pooling layers will be different, and if you have hard-coded shapes in your architecture, you may hit a shape mismatch at some point. This usually happens where you flatten the conv layer outputs to feed a fully connected layer. GlobalAveragePooling, GlobalMaxPooling and the like handle this problem, since they take only one value (the average or the max, respectively) per conv filter. Because the number of conv filters is fixed by the architecture, the pooled output has the same length no matter how large the feature maps are. Therefore, such models still work fine with different input shapes.
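As a rough illustration of the flatten vs. global pooling point, here is a small sketch in plain Python using nested lists as stand-ins for feature maps. The functions `make_feature_map`, `flatten`, and `global_avg_pool` are hypothetical helpers written for this example, not library calls, and the sizes (4 channels, 2x2 and 4x4 maps) are arbitrary.

```python
def make_feature_map(channels, height, width):
    """Build a toy feature map [channel][row][col]; channel c is filled with c + 1."""
    return [[[float(c + 1)] * width for _ in range(height)] for c in range(channels)]


def flatten(fmap):
    """Flatten every value into one vector: length = channels * height * width."""
    return [v for channel in fmap for row in channel for v in row]


def global_avg_pool(fmap):
    """Average each channel down to a single value: length = channels."""
    return [
        sum(v for row in channel for v in row) / (len(channel) * len(channel[0]))
        for channel in fmap
    ]


# Same number of channels (fixed by the architecture), different spatial sizes
# (which is what changes when the input images get larger).
small = make_feature_map(channels=4, height=2, width=2)
large = make_feature_map(channels=4, height=4, width=4)

# Flattening gives vectors of different lengths, so a fully connected layer
# sized for the small case would not accept the large one.
print(len(flatten(small)), len(flatten(large)))

# Global average pooling gives one value per channel in both cases.
print(len(global_avg_pool(small)), len(global_avg_pool(large)))
```

With flattening, the vector length grows with the feature map, while the pooled vector always has one entry per channel.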

Let me know if you find my explanation confusing.

4722794 · January 30, 2020, 9:07am

So, reading your explanation, here’s what I’ve understood (correct me if I’m mistaken again): the model architecture is like a juicer, and the images are like oranges. The oranges can be of different sizes, but that doesn’t mean we (mostly) need to change our juicer; it can handle them all and give back orange juice. Right?