Why is the animation slowing down as it gets closer to the correct line? Like smaller steps?
It’s because your gradients become smaller and smaller, so the steps are smaller and smaller.
Okay. That's convenient.
what is the best learning rate value to start with?
You can understand why the gradient gets smaller by looking at the curve Jeremy showed earlier (the U shaped one). The gradient is basically the slope of that curve, and the closer you get to the minimum, the closer to 0 the slope becomes.
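A minimal sketch of that point (plain Python, no PyTorch needed): gradient descent on the U-shaped curve J(w) = w², whose gradient is 2w. Because the gradient shrinks as w approaches the minimum at 0, each step is smaller than the last even though the learning rate never changes.

```python
def gradient(w):
    return 2 * w  # slope of J(w) = w**2

w = 4.0
lr = 0.1
steps = []
for _ in range(5):
    grad = gradient(w)
    step = lr * grad  # step size is proportional to the gradient
    w -= step
    steps.append(step)

print(steps)  # each step is 0.8x the previous one: 0.8, 0.64, 0.512, ...
```

With a fixed learning rate of 0.1, every step here is exactly (1 - 2·lr) = 0.8 times the previous one, which is the slowdown you see in the animation.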
I was asked the same question.
does it make a difference (final result) if I choose mini-batches randomly vs sequentially in some order, say 0…31, 32…63, etc?
It does: during training it is best to use random indices to build your mini-batches; this avoids the model learning the answers in order.
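A sketch of how shuffled mini-batches are typically built (this is what PyTorch's `DataLoader` does for you when you pass `shuffle=True`): permute the indices once per epoch, then slice the permutation into batches.

```python
import random

n, batch_size = 64, 32
indices = list(range(n))
random.shuffle(indices)  # the "shuffle=True" part; drop this for 0..31, 32..63 order
batches = [indices[i:i + batch_size] for i in range(0, n, batch_size)]

# Every index appears exactly once across the epoch (sampling without
# replacement), but the order differs each epoch.
```

Note the two batches together still cover every sample exactly once; shuffling only changes which samples land together, not how often each is seen.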
How many is “too many” epochs?
initially 3e-3
There is no way to know beforehand. That's why it is good to train for some epochs and see if the validation loss is still decreasing. If the loss starts to improve very little, you should stop training.
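A hedged sketch of that "stop when the validation loss stalls" idea, sometimes called early stopping (the function and the `patience` parameter are my own illustration, not from the lesson): keep training while the validation loss keeps improving, and stop once it has failed to improve for a few epochs in a row.

```python
def epochs_until_stop(val_losses, patience=2):
    """val_losses: validation loss recorded after each epoch.
    Returns the epoch index at which training would stop."""
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch  # no improvement for `patience` epochs
    return len(val_losses) - 1

# Example: the loss flattens out after epoch 3, so we stop at epoch 5.
stop = epochs_until_stop([1.0, 0.6, 0.4, 0.39, 0.4, 0.41])
```

Frameworks offer this as a callback (fastai and PyTorch Lightning both have an `EarlyStopping`-style callback), so in practice you rarely write this loop yourself.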
Jeremy gave some elements to answer that earlier, but sadly there’s no good answer to that question.
How is learning answers in order possible?
It's more a figure of speech than literal memorization. But the model will learn better when shuffled images are fed to it; this has been shown experimentally.
When we update the parameters in the `with torch.no_grad()` step, does that also zero the gradients before the next loss calculation?
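It does not: PyTorch accumulates gradients, so the update inside `torch.no_grad()` leaves `.grad` untouched and you must zero it yourself (with `p.grad.zero_()`, or `opt.zero_grad()` when using an optimizer). A small sketch of the manual training-step pattern:

```python
import torch

w = torch.tensor([3.0], requires_grad=True)

loss = (w ** 2).sum()
loss.backward()          # w.grad is now 2*w = 6
with torch.no_grad():
    w -= 0.1 * w.grad    # parameter update; w.grad is STILL 6 here
    w.grad.zero_()       # without this, the next backward() would add to 6
```

If you skip the `zero_()` call, the next `backward()` adds the new gradient on top of the old one, so every step after the first uses the wrong direction.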
anyone else experiencing “! sudo add-apt-repository ppa:mc3man/trusty-media” failing?
Are all batches within an epoch drawn without replacement? Meaning, will each batch always be completely different from the last?
Within one epoch, yes: samples are drawn without replacement, so no sample appears in two batches.