I have been trying to overfit the DenseNet121 model (from the PyTorch model zoo) on 10 images, and have been unable to do so. The training loss only drops to 0.20 after 50 epochs.
I have also tried a single convolution -> flatten -> dense output model, and it reaches a training loss of 0.002 after 50 epochs.
Everything except the architecture is identical.
I really don’t understand why DenseNet121 is not able to overfit as well as the basic network. Could someone please shed some light on this for me?