Lesson 5 In-Class Discussion ✅

That’s not the same thing; it’s a parameter that reduces the learning rate over time.

2 Likes

A larger batch will yield more accurate (less noisy) weight updates. The tradeoff is you get fewer weight updates per epoch. On the other hand, a small batch will yield noisier updates, but you get more updates per epoch.
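
For a concrete sense of the tradeoff, here’s a back-of-the-envelope sketch (the 50,000-example dataset size is just made up for illustration):

```python
# Hypothetical dataset of 50,000 training examples.
dataset_size = 50_000

for bs in (16, 64, 512):
    updates_per_epoch = dataset_size // bs
    print(f"batch size {bs:>3}: {updates_per_epoch} weight updates per epoch")

# batch size  16: 3125 weight updates per epoch
# batch size  64: 781 weight updates per epoch
# batch size 512: 97 weight updates per epoch
```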

2 Likes

wd plays the same role in both L1 and L2 regularization. The difference is in the sum over the weights that gets multiplied by wd: L1 sums the absolute values of the weights, while L2 sums the squared values of the weights.
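
As a minimal sketch (plain PyTorch, not fastai’s internals; the wd value is just illustrative), the two penalties differ only in what gets summed before multiplying by wd:

```python
import torch

def l1_penalty(model, wd):
    # L1: wd times the sum of the absolute values of the weights
    return wd * sum(p.abs().sum() for p in model.parameters())

def l2_penalty(model, wd):
    # L2: wd times the sum of the squared values of the weights
    return wd * sum((p ** 2).sum() for p in model.parameters())

model = torch.nn.Linear(10, 2)
x, y = torch.randn(4, 10), torch.randn(4, 2)
loss = torch.nn.functional.mse_loss(model(x), y)
loss = loss + l2_penalty(model, wd=0.01)  # swap in l1_penalty for L1
```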

1 Like

That’s learning rate decay, which is different from weight decay. Learning rate decay reduces the learning rate over several epochs. Weight decay affects the update of the parameters during backpropagation.
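
A toy sketch of the two ideas side by side (not fastai’s actual optimizer code; lr0 and decay_factor are just illustrative names):

```python
# Weight decay: folded into every parameter update,
#   w <- w - lr * (grad + wd * w)
# (equivalent to adding wd/2 * w**2 to the loss for each weight).
def sgd_step_with_wd(w, grad, lr, wd):
    return w - lr * (grad + wd * w)

# Learning rate decay: the update rule is unchanged, but lr itself
# shrinks as training progresses, e.g. exponentially per epoch.
def decayed_lr(lr0, decay_factor, epoch):
    return lr0 * decay_factor ** epoch
```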

4 Likes

Okay. Thanks for clearing up the difference.

Does something like “differential weight decay”, like what we do for learning rates, make any sense?

1 Like

Definitely. Leslie Smith is working on it; please check the link in my previous post.

3 Likes

Nice! This is the type of explanation I was looking for!

“Fewer weight updates per epoch” thus means needing more epochs, right? As @sgugger said?

Thanks for clearing up the difference between weight decay and learning rate decay.

I’ve noticed that bias isn’t used in the convolutional layers of a CNN, e.g. Conv2d in ResNet. Wondering why?

2 Likes

This is advanced, but the short answer is that BatchNorm plays the same role as a bias, and in ResNet there’s one after each conv layer.
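
For example, a typical ResNet-style conv block looks like this (a sketch in plain PyTorch): the conv sets bias=False because the BatchNorm that follows subtracts the mean and adds its own learnable shift, which does the job a bias would do.

```python
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),  # no bias here
    nn.BatchNorm2d(64),   # learnable shift (beta) acts as the bias
    nn.ReLU(inplace=True),
)
```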

7 Likes

Does DataBunch.create load the whole dataset into RAM and iterate through it one batch at a time via next(iter(...)), or does it load from the hard drive with every epoch and every batch?

The difference in efficiency between the two methods is big, but not all datasets fit into RAM…

4 Likes

It depends on the underlying dataset. For images, they’re loaded from hard disk on the fly, when you request a batch.
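
The general pattern (a sketch of a hypothetical dataset class, not fastai’s actual implementation) is that only the file paths sit in RAM, and the disk read happens inside __getitem__, i.e. when a batch is requested:

```python
from PIL import Image
from torch.utils.data import Dataset

class LazyImageDataset(Dataset):
    def __init__(self, paths, labels, transform=None):
        # only the paths and labels live in RAM
        self.paths, self.labels, self.transform = paths, labels, transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        img = Image.open(self.paths[i]).convert("RGB")  # disk read happens here
        if self.transform is not None:
            img = self.transform(img)
        return img, self.labels[i]
```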

2 Likes

Why torch.Size([64, 10])? Where is the “64” from?

The 64 is your batch size.
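
You can see the same shape with any model whose head outputs 10 classes, e.g. a toy stand-in in plain PyTorch:

```python
import torch
import torch.nn as nn

bs, n_classes = 64, 10
head = nn.Linear(784, n_classes)   # stand-in for the network's final layer
x = torch.randn(bs, 784)           # one batch of 64 inputs
print(head(x).shape)               # torch.Size([64, 10])
```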

Whether or not you need more epochs with large batch sizes will depend on the problem. I’m just saying that for one epoch, you get more albeit noisier weight updates when the batch size is small, and fewer but more accurate weight updates when the batch size is large.

1 Like

How about if I have enough RAM? Can we make DataBunch.create load the whole dataset into RAM once and run all the batches and all the epochs from there?

Maybe, if it’s not possible in fastai, I can circumvent that by using a RAM disk! But is there a way to do that without a custom dataloader?
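
If fastai doesn’t expose this directly, one generic way to approximate it is a caching wrapper around the underlying dataset (a hypothetical sketch; note that any random augmentation applied inside the wrapped dataset would get frozen at load time):

```python
from torch.utils.data import Dataset

class CachedDataset(Dataset):
    """Pull every item through the underlying dataset once,
    keep the results in CPU RAM, and serve later epochs from memory."""
    def __init__(self, ds):
        self.items = [ds[i] for i in range(len(ds))]  # eager load into RAM

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        return self.items[i]
```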

2 Likes

Thank you for the follow-up.

2 Likes

@sgugger, can you please explain this briefly, or point me to an explanation? Thanks!

I’m going to take a guess and say it’s GPU RAM, not regular RAM, so there’s no circumventing it unless you get a more powerful GPU. Am I right?