I am training a binary human segmentation model and my loss/accuracy suddenly jumps mid-training for no apparent reason. I initialise my model as:
learn = unet_learner(data, models.resnet34, metrics=metrics, wd=1e-2, loss_func=lossf)
and train in two stages, first frozen and then unfrozen. The loss is plain cross-entropy. The jump happens during the first (frozen) stage and the loss stays high for the rest of training. Here is how the logs look:
What could be going wrong?
If I had to guess, your model is overfitting; the logs hint at it (train loss higher than val loss). How big a dataset are you using?
Yes, but even the train loss spikes. Shouldn't it go very low then? The dataset is 6000 images.
I've had this before and never worked out the exact reason. Lowering the max LR (while keeping the min LR the same) was how I approached it.
It looks like your training is diverging, most likely due to a learning rate that is too high for this problem. Setting a proper learning rate will probably solve this, or at least help diagnose the problem further. If I were you, I would start with the learning rate finder (learn.lr_find(); learn.recorder.plot()). If it is still not clear, you can post the results of lr_find() here and we can help further.
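To make the failure mode concrete, here is a minimal pure-NumPy sketch (a toy quadratic loss, not the fastai API or your actual model) of why a too-high learning rate makes the loss blow up: plain gradient descent converges when the step size is below a stability threshold set by the loss curvature, and diverges above it.

```python
import numpy as np

def gd_loss(lr, steps=50, curvature=4.0, x0=1.0):
    # Minimise f(x) = 0.5 * curvature * x**2 with plain gradient descent.
    # Each step multiplies x by (1 - lr * curvature), so the iteration is
    # stable only when lr < 2 / curvature; beyond that, |x| grows each step.
    x = x0
    for _ in range(steps):
        x -= lr * curvature * x  # gradient of f is curvature * x
    return 0.5 * curvature * x ** 2

print(gd_loss(lr=0.1))  # below the 2/4 = 0.5 threshold: loss shrinks toward 0
print(gd_loss(lr=0.6))  # above the threshold: loss explodes instead
```

The same mechanism applies per-parameter in a deep net: the curvature varies across the loss surface, so a max LR that is fine early on can push some directions past their stability threshold later, producing exactly the kind of sudden, persistent loss jump described above.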
Yes, lowering the learning rate solved the problem. Thanks, everyone!