Hello! I wrote an 8-layer fully connected NN with batch norm layers. It uses Adam and the 1cycle learning rate policy. When I run lr_find, the shape of the curve is similar to what I got for other architectures (and to what I see in the videos), and based on the plot the best LR is around 0.1. However, when I train with LR=0.1, after about 10 epochs I get “inf” for the validation loss. Using exactly the same architecture but without batch norm, lr_find suggests a value of about 0.005, and using LR=0.005 in my actual training loop gives good values for both the training and validation sets, even for 500 epochs. Why does the value of 0.1 suggested by lr_find give me these weird results? My guess would be overfitting, but I thought batch norm should actually reduce overfitting a bit, everything else being equal. I am really new to ML, so any advice is greatly appreciated. Thank you!
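For context, here is a rough plain-Python sketch of how a one-cycle schedule turns the chosen LR into a per-step learning rate. The function name `one_cycle_lr` and the defaults (`pct_start=0.3`, `div_factor=25`, `final_div_factor=1e4`, cosine interpolation) are assumptions for illustration only; the library's own implementation may differ in its details.

```python
import math


def one_cycle_lr(step, total_steps, max_lr,
                 pct_start=0.3, div_factor=25.0, final_div_factor=1e4):
    """Learning rate at `step` for a simplified one-cycle schedule.

    Hypothetical helper, not a library API: the LR warms up from
    max_lr / div_factor to max_lr, then anneals down to
    (max_lr / div_factor) / final_div_factor, using cosine interpolation.
    """
    if not 0 <= step < total_steps:
        raise ValueError("step must satisfy 0 <= step < total_steps")

    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_steps = max(1, int(pct_start * total_steps))

    if step < warmup_steps:
        # Warm-up phase: initial_lr -> max_lr
        start, end = initial_lr, max_lr
        t = step / warmup_steps
    else:
        # Annealing phase: max_lr -> min_lr
        start, end = max_lr, min_lr
        t = (step - warmup_steps) / max(1, total_steps - 1 - warmup_steps)

    # Cosine interpolation: returns `start` at t=0 and `end` at t=1
    return end + (start - end) * (1.0 + math.cos(math.pi * t)) / 2.0


if __name__ == "__main__":
    total = 100
    for max_lr in (0.1, 0.005):  # the two values from the lr_find runs above
        schedule = [one_cycle_lr(s, total, max_lr) for s in range(total)]
        print(f"max_lr={max_lr}: first={schedule[0]:.6f}, "
              f"peak={max(schedule):.6f}, last={schedule[-1]:.2e}")
```

Under these assumptions, the value picked from lr_find is the peak of the schedule, so the LR=0.1 run spends part of training at a step size 20 times larger than the peak of the LR=0.005 run.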
SilvMala (Silviu-Marian Udrescu) #1