I’ve been trying to modify a pretrained resnet and ran into NaNs during training. In an attempt to debug, I reduced the modifications to the bare minimum. I think these two models should be identical, yet they give very different losses on the same training data. Can anyone explain my mistake?
Yes, I did the latter. I set every seed known to humankind and used num_workers=0. Each model is run separately after Restart Kernel and gives consistent losses.
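For reference, this is roughly what I mean by "every seed known to humankind" — a minimal reproducibility sketch (the exact calls in my notebook may differ slightly):

```python
import random

import numpy as np
import torch

def seed_everything(seed: int = 0) -> None:
    """Seed every RNG that could influence training."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Also pin cuDNN to deterministic kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

seed_everything(0)
# Plus num_workers=0 on the DataLoader so no worker processes add randomness.
```

With this in place, re-running either model from a fresh kernel reproduces the same losses, so randomness alone can't explain the difference between the two models.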
It seems like RNmodel (Model 2) should contain the original resnet50 pretrained weights. Maybe something I am doing confuses automatic differentiation? Or perhaps it is how the loss function is applied at the end?
Lots of theories. If no one can spot the problem by inspection, I will need to learn how to trace execution inside a model's forward pass.
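In case it helps anyone else, this is the kind of tracing I had in mind — registering forward hooks so each submodule's output is captured, letting you compare two models layer by layer and find where they first diverge (this is a generic sketch, not my actual models):

```python
import torch
import torch.nn as nn

def capture_activations(model: nn.Module, x: torch.Tensor) -> dict:
    """Run x through model and record every submodule's output by name."""
    activations = {}
    hooks = []
    for name, module in model.named_modules():
        if name:  # skip the root module itself
            hooks.append(module.register_forward_hook(
                # bind name at definition time via the default argument
                lambda m, inp, out, name=name: activations.__setitem__(name, out)
            ))
    model.eval()
    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()
    return activations
```

Running this on both models with the same input batch and comparing the captured tensors with `torch.allclose` should pinpoint the first layer whose output differs.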
Well, I never figured out the issue. Instead, I found a way to tack two layers onto the start and end of the resnet that gives results consistent with the original resnet and accomplishes the task.
I was able to trace the discrepancy: given the same inputs, RNmodel(xb) (Model 2) yields a different output than resnet in its original form (Model 1). But why remains a mystery.
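For anyone who wants to reproduce the check: the comparison itself was just running both models on an identical batch in eval mode. A sketch with placeholder models (one pitfall worth ruling out is that in train mode BatchNorm uses batch statistics and updates its running stats, so even identical weights can produce different outputs):

```python
import torch
import torch.nn as nn

def outputs_match(model_a: nn.Module, model_b: nn.Module,
                  xb: torch.Tensor, atol: float = 1e-6) -> bool:
    """Compare two models' forward passes on the same input batch."""
    # eval() freezes BatchNorm running stats and disables dropout,
    # so any remaining difference comes from weights or architecture.
    model_a.eval()
    model_b.eval()
    with torch.no_grad():
        out_a = model_a(xb)
        out_b = model_b(xb)
    return out_a.shape == out_b.shape and torch.allclose(out_a, out_b, atol=atol)
```

In my case this returned False for Model 1 vs. Model 2 on the same xb, which is what told me the problem was in the model itself rather than in the loss function.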