Model gets worse with learning rate 0

kardon · November 22, 2018, 7:38am

I am training my pretrained model on worse data than it was originally trained on, so I do expect it to get worse. But I am actually using a learning rate of 0. What could be the reason it still gets worse?
I could only think of weight decay and momentum, but I believe I disabled both by using the fit function with SGD without momentum.

import torch
from torch import optim

model = torch.load('model.pth')
sgd = optim.SGD(model.parameters(), lr=0.0, momentum=0.0)
data_bunch = create_custom_data_bunch('data.csv')
fit(epochs=1, model=model, loss_func=loss_func, opt=sgd, data=data_bunch)
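
As a sanity check on the weight decay and momentum suspicion, here is a minimal sketch of the update rule as documented for torch.optim.SGD, simplified to a single scalar weight with dampening and Nesterov left out. The function name sgd_step and all the numbers are made up for illustration. Under that rule both weight decay and momentum are scaled by the learning rate, so with lr=0 neither should be able to move the weights:

```python
def sgd_step(p, grad, buf, lr, momentum=0.0, weight_decay=0.0):
    """One scalar SGD step (hypothetical helper, simplified from the
    update rule documented for torch.optim.SGD)."""
    d_p = grad + weight_decay * p   # weight decay is folded into the gradient
    buf = momentum * buf + d_p      # momentum buffer accumulates that gradient
    p = p - lr * buf                # the only line that changes the weight
    return p, buf


p, buf = 1.5, 0.0
for grad in [0.3, -0.7, 2.0]:       # arbitrary gradients
    p, buf = sgd_step(p, grad, buf, lr=0.0, momentum=0.9, weight_decay=0.1)

print(p)    # the weight is still its initial value
print(buf)  # the momentum buffer did change, but it is multiplied by lr=0
```

So if the model still gets worse, the change most likely comes from something that is updated outside the optimizer step.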

kardon · November 22, 2018, 10:37am

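I’m not home right now, so I can’t test it, but I believe I found the reason: I guess the batch normalization layers still get updated. Their running mean and variance are refreshed on every forward pass in training mode, no matter what the learning rate is.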
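
A small way to check the batch normalization guess, as a sketch only: the tiny model and the random input below are stand-ins invented for this example, not the model or data from the question. It runs one training step with lr=0 and compares the learnable parameters and the BatchNorm running mean before and after:

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Toy stand-in model; the real one would come from torch.load('model.pth').
model = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4))
bn = model[1]
opt = optim.SGD(model.parameters(), lr=0.0, momentum=0.0)

weights_before = [p.detach().clone() for p in model.parameters()]
stats_before = bn.running_mean.clone()

# Random batch with a shifted distribution, standing in for the "worse" data.
x = torch.randn(16, 4) * 3 + 5

model.train()                     # training mode: BatchNorm tracks batch statistics
loss = model(x).pow(2).mean()     # arbitrary loss, just to get gradients
opt.zero_grad()
loss.backward()
opt.step()                        # lr=0, so learnable parameters stay put

weights_unchanged = all(
    torch.equal(before, after)
    for before, after in zip(weights_before, model.parameters())
)
stats_changed = not torch.equal(stats_before, bn.running_mean)

model.eval()                      # eval mode: running statistics are frozen
frozen = bn.running_mean.clone()
with torch.no_grad():
    model(x)
stats_frozen_in_eval = torch.equal(frozen, bn.running_mean)

print(weights_unchanged, stats_changed, stats_frozen_in_eval)
```

If the running statistics turn out to be the cause, one option is to call .eval() on the BatchNorm modules before training on the new data. Keep in mind that any later call to model.train() switches them back to training mode.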