That’s not the same thing; it’s a parameter that reduces the learning rate over time.
A larger batch will yield more accurate (less noisy) weight updates. The tradeoff is you get fewer weight updates per epoch. On the other hand, a small batch will yield noisier updates, but you get more updates per epoch.
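To put rough numbers on that tradeoff, here is a minimal sketch in plain Python. The dataset size, the batch sizes, and the fake per-sample "gradients" are all made up for illustration; it only shows how the update count and the spread of the batch averages change with batch size.

```python
import math
import random
import statistics

def updates_per_epoch(n_samples, batch_size):
    """Weight updates in one pass over the data (a final partial batch counts)."""
    return math.ceil(n_samples / batch_size)

random.seed(0)
n_samples = 50_000  # hypothetical dataset size
# Stand-in for per-sample gradients of a single weight (assumed Gaussian).
per_sample_grads = [random.gauss(1.0, 2.0) for _ in range(n_samples)]

def batch_grad_spread(batch_size, n_batches=200):
    """Standard deviation of the batch-averaged gradient across random batches."""
    means = [
        statistics.fmean(random.sample(per_sample_grads, batch_size))
        for _ in range(n_batches)
    ]
    return statistics.stdev(means)

spread_small = batch_grad_spread(16)
spread_large = batch_grad_spread(256)

for bs, spread in ((16, spread_small), (256, spread_large)):
    print(f"bs={bs}: {updates_per_epoch(n_samples, bs)} updates/epoch, "
          f"gradient spread ~{spread:.3f}")
```

The small batch gives many more updates per epoch, but each batch average wanders further from the true mean gradient.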
wd plays the same role in both L1 and L2 regularization: it multiplies a sum over the weights. The difference is in what gets summed (L1 sums the absolute values of the weights, while L2 sums their squares).
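A tiny sketch of the two penalties, in case it helps. The function names and the example weights are mine, and some write-ups put an extra factor of 1/2 in front of the L2 term, so treat the exact scaling as an assumption.

```python
def l1_penalty(weights, wd):
    """wd times the sum of absolute values of the weights."""
    return wd * sum(abs(w) for w in weights)

def l2_penalty(weights, wd):
    """wd times the sum of squared weights (no 1/2 factor assumed here)."""
    return wd * sum(w * w for w in weights)

weights = [0.5, -2.0, 1.5]  # made-up weights
wd = 0.01

l1 = l1_penalty(weights, wd)  # 0.01 * (0.5 + 2.0 + 1.5)
l2 = l2_penalty(weights, wd)  # 0.01 * (0.25 + 4.0 + 2.25)
print(l1, l2)
```

Either penalty is added to the loss, so a larger wd pushes the weights harder toward zero.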
That’s learning rate decay, which is different from weight decay. Learning rate decay reduces the learning rate over several epochs. Weight decay affects the update of parameters during back propagation.
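Here is a rough sketch of the two side by side for a single weight. The exponential schedule and the plain SGD update with an L2-style decay term are just one common form I picked for illustration, not necessarily what any particular library does.

```python
def decayed_lr(lr0, decay_rate, epoch):
    """Learning rate decay: the step size shrinks as epochs go by
    (exponential schedule, assumed here for illustration)."""
    return lr0 * decay_rate ** epoch

def sgd_step(w, grad, lr, wd):
    """Weight decay: the update gets an extra term, wd * w,
    that pulls the weight toward zero."""
    return w - lr * (grad + wd * w)

# Learning rate decay changes lr, not the weights directly.
lrs = [decayed_lr(0.1, 0.5, epoch) for epoch in range(3)]

# Weight decay shrinks the weight even when the loss gradient is zero.
w_after = sgd_step(w=1.0, grad=0.0, lr=0.1, wd=0.01)
print(lrs, w_after)
```

The two can be used together: the schedule decides how big each step is, and wd decides how strongly each step shrinks the weights.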
Okay. Thanks for clearing up the difference.
Does something like “differential weight decay”, like what we do for learning rates, make any sense?
Definitely. Leslie Smith is working on it, please check my previous post’s link.
nice! this is the type of explanation i was looking for!
“fewer weight updates per epoch” thus needing more epochs, right? as was said by @sgugger ?
EDIT: punctuation
Thanks for clearing up the difference between weight decay and learning rate decay.
I’ve noticed that bias isn’t used in the convolutional layers of a CNN, e.g. Conv2d in ResNet. I’m wondering why?
This is advanced, but the short answer is that BatchNorm plays the same role as the bias, and in ResNet there’s one after each conv layer.
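A quick way to convince yourself, as a plain-Python sketch rather than the real layers: batch normalization subtracts the batch mean, so any constant bias added before it cancels out. The numbers and the `batchnorm` helper below are made up for the demo, and the learnable scale and shift that a real BatchNorm layer applies afterwards are left out.

```python
import statistics

def batchnorm(xs, eps=1e-5):
    """Normalize one channel over the batch: subtract the mean, divide by the std.
    (The learnable scale and shift of a real BatchNorm layer are omitted.)"""
    mean = statistics.fmean(xs)
    var = statistics.pvariance(xs, mu=mean)
    return [(x - mean) / (var + eps) ** 0.5 for x in xs]

conv_out = [0.2, -1.3, 0.7, 2.4]  # made-up conv activations for one channel
bias = 5.0                        # hypothetical conv bias

without_bias = batchnorm(conv_out)
with_bias = batchnorm([x + bias for x in conv_out])

max_diff = max(abs(a - b) for a, b in zip(without_bias, with_bias))
print(max_diff)  # effectively zero: the bias was removed by the mean subtraction
```

Since BatchNorm has its own learnable shift, that shift does the job a conv bias would otherwise do.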
Does Databunch.create load the whole dataset into RAM and then iterate through it one batch at a time through next(iter(..
Or does it load from the hard drive on every epoch and every batch?
The difference in efficiency between the two methods is big, but not all datasets fit into RAM…
It depends on the underlying dataset. For images, they’re loaded from hard disk on the fly, when you request a batch.
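To illustrate what “on the fly” means, here is a toy sketch of a lazily loaded dataset. This is not fastai’s code; `LazyFileDataset` and `batches` are hypothetical names, and the files are small dummy blobs written to a temporary directory instead of real images.

```python
import os
import tempfile

class LazyFileDataset:
    """Keeps only file paths in memory; reads a file when an item is requested."""
    def __init__(self, paths):
        self.paths = list(paths)
        self.reads = 0  # counts disk reads, for the demo only

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        self.reads += 1
        with open(self.paths[i], "rb") as f:
            return f.read()

def batches(ds, batch_size):
    """Yield lists of items; each batch is read from disk when it is requested."""
    for start in range(0, len(ds), batch_size):
        yield [ds[i] for i in range(start, min(start + batch_size, len(ds)))]

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(6):
        path = os.path.join(tmp, f"img_{i}.bin")
        with open(path, "wb") as f:
            f.write(bytes([i]) * 4)  # dummy 4-byte "image"
        paths.append(path)

    ds = LazyFileDataset(paths)
    first_batch = next(iter(batches(ds, batch_size=2)))
    reads_after_first_batch = ds.reads

print(reads_after_first_batch)  # only the files in the first batch were read
```

Only the items in the requested batch get read, which is why the whole dataset never has to fit in RAM.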
Why torch.Size([64, 10])? Where is the “64” from?
The 64 is your batch size.
Whether or not you need more epochs with large batch sizes will depend on the problem. I’m just saying that for one epoch, you get more albeit noisier weight updates when the batch size is small, and fewer but more accurate weight updates when the batch size is large.
What if I have enough RAM? Can we make Databunch.create load the whole dataset into RAM once and then run all the batches and all the epochs from there?
Maybe I can circumvent that, if it’s not possible in fastai, by using a RAM disk! But is there a way to do that without a custom dataloader?
thank you for the follow up.
I’m going to take a guess and say it’s GPU RAM, not regular RAM, so there’s no circumventing it unless you get a more powerful GPU. Am I right?