I am trying to implement my own train function, but it isn’t training my NN: it runs, but the loss doesn’t decrease.
```python
def train(epochs, data, m, opt, loss_func):
    for epoch in trange(epochs):
        m.train()
        data_iter = iter(data.trn_dl)
        i, n = 0, len(data.trn_dl)
        with tqdm(total=n) as pbar:
            while i < n:
                x, y = V(next(data_iter), requires_grad=True)
                set_trainable(m, True)
                m.zero_grad()
                y_hat = m(x).detach()
                loss = loss_func(y, y_hat)
                loss.backward()
                opt.step()
                i += 1
        print(f'Loss {to_np(loss)}')

model = SingleModel(Net().cuda()).model
train(5, data, model, optim.RMSprop(model.parameters(), lr=0.1), F.mse_loss)
```
Using the `fit` function, I checked that the crit, optim, data, and model are all fine for training: `fit` successfully reduced the loss of the NN.
```python
model = SingleModel(Net().cuda()).model
learn = ConvLearner.from_model_data(model, data)
learn.crit = F.mse_loss
learn.optim = optim.RMSprop(model.parameters(), lr=0.1)
learn.fit(0.1, 5)
```
I have no idea what the problem could be, as the only differences I see between what I do and the `fit` function are:

- Detaching `y_hat` (I get an error if I don’t do this)
- Wrapping the output of `next(data_iter)` in a variable with `requires_grad=True` (I am not sure whether `fit` makes it require grad or not, but without it I get another error)
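To make the first point concrete, here is a generic plain-PyTorch sketch (not the fastai internals — a toy `nn.Linear` model and random tensors, both assumed purely for illustration) showing what `.detach()` does: it cuts the tensor out of the autograd graph, so a loss computed on the detached output cannot send gradients back to the weights.

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(3, 1)   # toy stand-in for the real Net
x = torch.randn(8, 3)
y = torch.randn(8, 1)

# Loss on a detached output: the graph is cut, so nothing upstream
# of the output can receive gradients.
loss_detached = F.mse_loss(model(x).detach(), y)
print(loss_detached.requires_grad)  # False -> backward() would raise here

# Loss on the live output: gradients flow back to the parameters.
loss = F.mse_loss(model(x), y)
loss.backward()
print(model.weight.grad is not None)  # True
```

This is just an illustration of the difference I listed, not a claim about what `fit` does internally.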
If what I am doing wrong is obvious, then please just point me in the right direction instead of correcting it for me. Thanks!