In the first 10 minutes of the lecture, Jeremy fine-tuned a ResNet model and changed its input size to 400 by 400.
I’m wondering why that gave such a good improvement in classification accuracy, and how someone can decide, intuitively, whether or not it would help.
Also, I looked through the ResNet code and found that changing the input size doesn’t change any parameters except for the last dense layer, which gets replaced when fine-tuning anyway.
So I’m wondering whether the same approach works when fine-tuning VGG or any other model that contains many dense layers whose weights we want to freeze.
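To illustrate what I mean, here is a minimal PyTorch sketch (toy layers, not the actual ResNet/VGG source) of why a ResNet-style head is input-size agnostic while a VGG-style head is not:

```python
import torch
import torch.nn as nn

# ResNet-style head: conv features -> adaptive average pool -> linear.
# AdaptiveAvgPool2d(1) always outputs a 1x1 map per channel, so the
# linear layer's input size (16 here) is independent of image resolution.
resnet_style = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)

# VGG-style head: conv features -> flatten -> linear. The linear layer's
# input size is baked in for one specific resolution (32x32 here).
vgg_style = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),
)

print(resnet_style(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])
print(resnet_style(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 10])
print(vgg_style(torch.randn(1, 3, 32, 32)).shape)     # torch.Size([1, 10])
# vgg_style(torch.randn(1, 3, 64, 64)) raises a shape-mismatch RuntimeError.
```

So the ResNet body accepts any resolution unchanged, whereas the VGG-style flatten-into-dense layers only fit the resolution they were built for.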
Thank you :))
Intuition: if the classification task specifically needs fine details when done by an expert human, a higher resolution can potentially help the result. Say you want to classify a very small object (a small flower or a small cancerous lesion) in a large image; resolution could help. If the object is large but some fine details within it can change the classification result, then higher resolution should also help.
For almost all computer vision classification tasks, there is usually an optimal balance to find. Higher-resolution images use more memory on the GPU, so smaller batches are usually needed. Image noise can also become a problem if the resolution is too high; the signal-to-noise ratio depends on many factors, but mostly on the image acquisition and the image compression. A low signal-to-noise ratio hurts the model's capacity to generalize: the model will tend to overfit the noise in the training dataset instead of learning the meaningful statistical information (the useful signal).
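As a rough illustration of the memory point (back-of-envelope numbers only, assuming activation memory dominates):

```python
# Sketch: activation memory per image scales with height * width, so the
# resolution caps the batch size for a fixed GPU memory budget. The
# channel count and dtype size below are illustrative placeholders.
def act_bytes_per_image(side, channels=64, dtype_bytes=4):
    # memory for one `channels`-deep feature map at `side` x `side`
    return side * side * channels * dtype_bytes

for side in (224, 400):
    print(side, act_bytes_per_image(side))

# Going from 224 to 400 per side costs ~3.2x the memory per image,
# so the batch size must shrink by roughly the same factor.
print(round(act_bytes_per_image(400) / act_bytes_per_image(224), 2))  # 3.19
```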
I hope it helps.