I’m thrilled you gave it a go! And no, you haven’t completely screwed up the concept, although I’m not sure you’ve quite got it right either. But frankly, I’m not even sure when this technique helps, or by how much. It helped me a tiny bit on the new seedlings competition. But I think perhaps the main thing it enables is quick prototyping with small images, followed by a switch to big images later for fine-tuning.
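To make sure we’re talking about the same thing, here’s a minimal sketch of that prototype-small-then-fine-tune-big loop. Everything here is hypothetical scaffolding, not any library’s actual API: `train_one_epoch` stands in for whatever your real training step is (e.g. rebuilding your dataloaders at the new size and running an epoch).

```python
# Progressive resizing sketch: train at small image sizes first,
# then continue training (fine-tune) at progressively larger sizes.

def resize_schedule(sizes, epochs_per_size):
    """Yield (image_size, epoch_index) pairs, smallest sizes first."""
    for size in sorted(sizes):
        for epoch in range(epochs_per_size):
            yield size, epoch

def run(sizes=(64, 128, 224), epochs_per_size=2, train_one_epoch=None):
    log = []
    for size, epoch in resize_schedule(sizes, epochs_per_size):
        if train_one_epoch is not None:
            # Hypothetical hook: rebuild data at `size`, train one epoch.
            train_one_epoch(size)
        log.append((size, epoch))
    return log

log = run()
# Small images come first (fast iteration), the largest size last.
```

The point of the sketch is just the ordering: all the cheap small-image epochs happen up front, and the expensive large-image epochs only run once you’ve settled on an approach.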
Perhaps this would particularly help for datasets with really big images where you can only do really small batch sizes. E.g. I wonder if you could get a better result on ImageNet this way?
I think more experiments need to be done before any of us can write a somewhat definitive article on this topic. See for instance @slavivanov’s recent post on differential learning rates (which he found didn’t help much with his particular experiments - although I think it will look a lot better with SGDR).