In Lesson 2, @jeremy talks about creating a model where he first runs low-resolution images through the network, then re-runs it with his 224px images. To me, this seems like it should behave the same as using a higher zoom factor in the data augmentation. Is there any reason to think otherwise?
Low-res is faster (fewer pixels). And zooming removes some of the image, whereas downsampling keeps it all. Also, low-res gives the later layers a different effective receptive field relative to the image.
One real-world example: when doing document classification, I have found that my low-res pictures (64x64) work better than the high-res ones (800x800). My theory is that we don’t have that many examples of each document, and the higher-res images give the model too much to look at. Giving it just a high-level overview of the document’s structure does a better job.
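To put rough numbers on the three points above, here is a small back-of-the-envelope sketch. It assumes "zoom" means enlarging the image by a factor and center-cropping back to the original size, and it uses a made-up 32-pixel receptive field purely for illustration; neither detail comes from the lesson.

```python
def pixel_count(side):
    """Number of pixels in a square image with the given side length."""
    return side * side


def zoom_kept_fraction(zoom):
    """Fraction of the original image area still visible after a center zoom.

    Assumption: zooming by `zoom` keeps 1/zoom of each side.
    """
    return 1.0 / (zoom * zoom)


def relative_receptive_field(rf_px, side):
    """Share of the image width covered by a unit that sees `rf_px` input pixels."""
    return min(1.0, rf_px / side)


# Pixel ratio between 224px and 64px inputs (a proxy for compute, not a measured speedup).
pixel_ratio = pixel_count(224) / pixel_count(64)  # 12.25

# A 1.5x zoom discards more than half of the image area; downsampling discards none.
kept = zoom_kept_fraction(1.5)

# The same hypothetical 32px receptive field covers half of a 64px image,
# but only about a seventh of a 224px image.
rf_small = relative_receptive_field(32, 64)
rf_large = relative_receptive_field(32, 224)

print(pixel_ratio, kept, rf_small, rf_large)
```

So a zoomed 224px crop and a downsampled 64px image are not interchangeable: they differ in cost, in how much of the scene survives, and in how much of the scene each unit in a later layer can see.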
Thank you both for your replies. To get a better sense of how this would work, I experimented with the low-res-first approach on Cats & Dogs and my results have not been favorable. I’m putting this out there to see if people can figure out why this is working so poorly for me, when it sounds like it should simply be better.
First, I started with either 64x64 or 128x128 images. I would pretrain the final layers, then either unfreeze the model or leave it frozen, and then do a final training run on the 224x224 images. That gives 4 experiments (64 pretrain, 64 pretrain+unfreeze, 128 pretrain, 128 pretrain+unfreeze), compared against the default that Jeremy demonstrates in the videos.
All 4 of those approaches were inferior to training directly on the 224 images. (I get ~50-100 wrong out of 2000, i.e. roughly 2.5-5% error, when I pretrain on low-res, as opposed to 6 wrong, i.e. 0.3%, when I use Jeremy’s approach from the lectures.)
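For anyone wanting to reproduce this, the runs described above can be written out as a small experiment grid. This is only a sketch of the bookkeeping: the step names (`train_head`, `unfreeze`, `train`) and the one-step baseline are placeholders of mine, not fastai calls, and the actual training loop is left out.

```python
from itertools import product

FINAL_SIZE = 224  # resolution used for the last training stage in every run


def build_runs(low_res_sizes=(64, 128)):
    """Return a dict mapping a run name to its ordered list of (step, size) pairs."""
    # Baseline: train directly at the final size (simplified placeholder
    # for the default procedure from the lectures).
    runs = {"baseline": [("train", FINAL_SIZE)]}
    for size, unfreeze in product(low_res_sizes, (False, True)):
        name = f"{size} pretrain" + ("+unfreeze" if unfreeze else "")
        steps = [("train_head", size)]  # pretrain the final layers at low res
        if unfreeze:
            steps.append(("unfreeze", None))  # optionally unfreeze the model
        steps.append(("train", FINAL_SIZE))  # final training at 224
        runs[name] = steps
    return runs


runs = build_runs()
for name, steps in runs.items():
    print(name, steps)
```

Writing the grid down like this makes it easy to check that each variant differs from the baseline in only the intended way.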
This leads me to a guess: I wonder if I misunderstood when @jeremy thinks it’s useful to start with low-res images. If Jeremy’s point was that low-res pretraining is useful for a non-pretrained architecture that otherwise has random weights, then it would make sense that my approach could only harm the well-calibrated weights in resnet50.
Rewatching the lecture, Jeremy actually explicitly comments on this: “This thing where I went all the way back to a size of 64, I wouldn’t do that if I was doing dogs and cats or dog breeds, since this is so small. If the thing I was worried about was very similar to ImageNet, I would kind of destroy those ImageNet weights.”
So I think the “badness” in this particular circumstance comes from the fact that the dogs and cats are from ImageNet.
Right - very similar to ImageNet (not actually from ImageNet FYI). Also, there’s a problem with batchnorm statistics changing when changing size which I didn’t appreciate back when I did that video. We’ll be studying it in the next version of part 2, and fastai_v1 will contain a fix for it by default.
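A toy illustration of why activation statistics can depend on image size (this is my own sketch, not the fix mentioned above, and a 1x2 difference filter stands in for a real conv layer): the same smooth pattern sampled at 64px and at 224px gives a simple edge filter very different output variance. Batchnorm stores running means and variances collected at one size, so statistics like these can stop matching the activations when the size changes.

```python
import math
from statistics import pvariance


def make_image(side):
    """Sample the same smooth synthetic pattern on a side x side grid."""
    return [
        [
            math.sin(2 * math.pi * 3 * x / side) * math.cos(2 * math.pi * 2 * y / side)
            for x in range(side)
        ]
        for y in range(side)
    ]


def horizontal_edge_response(img):
    """Apply a 1x2 difference filter along each row (stand-in for a conv filter)."""
    return [row[x + 1] - row[x] for row in img for x in range(len(row) - 1)]


def activation_stats(side):
    """Mean and population variance of the filter output at a given resolution."""
    acts = horizontal_edge_response(make_image(side))
    return sum(acts) / len(acts), pvariance(acts)


mean_64, var_64 = activation_stats(64)
mean_224, var_224 = activation_stats(224)

# Neighbouring pixels differ more at low resolution, so the variance is larger there.
print(f"64px:  mean={mean_64:.5f} var={var_64:.5f}")
print(f"224px: mean={mean_224:.5f} var={var_224:.5f}")
```

In this toy case the variance at 64px is several times larger than at 224px, even though both images show the same pattern.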
Tremendous. Thank you for this great resource.