Why does adding a sigmoid increase the loss at a low learning rate?

Hi!
I’m on Lesson 5 of Part 1 of the course, and I have a question about the math of linear models and neural networks.

While training my own linear model with a calc_preds() function that did not use torch.sigmoid(), the most stable learning rate I found was 0.03, which reached a loss of 0.27 after 100 epochs, starting from 0.92 in the first epoch.

When I added torch.sigmoid() to the calc_preds() function, the loss after 100 epochs at the same learning rate only reached 0.47, starting from 0.52 in the first epoch.
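For context, here is roughly what the two versions look like. This is a simplified sketch in the style of the Lesson 5 "linear model from scratch" notebook; the names coeffs and indeps are just my placeholders, and the only difference between the two versions is wrapping the linear output in torch.sigmoid():

```python
import torch

def calc_preds_linear(coeffs, indeps):
    # Raw linear output: unbounded, can fall below 0 or rise above 1
    return (indeps * coeffs).sum(axis=1)

def calc_preds_sigmoid(coeffs, indeps):
    # Same linear output, squashed into the range (0, 1) by the sigmoid
    return torch.sigmoid((indeps * coeffs).sum(axis=1))
```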

Can anyone help me understand why this happened?