Lesson 5 In-Class Discussion ✅

I’ve noticed that bias isn’t used in the convolutional layers of a CNN, e.g. Conv2d in ResNet. Wondering why?

2 Likes

This is advanced, but the short answer is that BatchNorm plays the same role as the bias, and in ResNet there’s a BatchNorm layer after each conv layer.
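
For example, a conv + BatchNorm block along the lines of the ones in ResNet (a minimal PyTorch sketch, not the actual torchvision code):

```python
import torch.nn as nn

# Conv followed by BatchNorm: a conv bias would just be cancelled when BatchNorm
# subtracts the per-channel mean, and BatchNorm's own learnable shift (beta)
# provides the same effect, so the bias is simply dropped.
block = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),  # bias=False, as in ResNet
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)
```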

7 Likes

Does DataBunch.create load the whole dataset into RAM and iterate through it one batch at a time via next(iter(...)), or does it load from the hard drive on every epoch and every batch?

The difference in efficiency between the two methods is big, but not all datasets fit into RAM…

4 Likes

It depends on the underlying dataset. For images, they’re loaded from hard disk on the fly, when you request a batch.
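
Roughly, it works like the following minimal PyTorch sketch (hypothetical, not the actual fastai implementation): the dataset only holds file paths, and the image is read from disk inside __getitem__, i.e. only when the DataLoader asks for that item to build a batch.

```python
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset

class LazyImageDataset(Dataset):
    """Stores only file paths; each image is read from disk when requested."""
    def __init__(self, folder, transform=None):
        self.files = sorted(Path(folder).glob("*.jpg"))
        self.transform = transform

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        img = Image.open(self.files[idx]).convert("RGB")  # disk read happens here
        return self.transform(img) if self.transform else img
```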

2 Likes

Why torch.Size([64, 10])? Where does the “64” come from?

The 64 is your batch size.

Whether or not you need more epochs with large batch sizes will depend on the problem. I’m just saying that for one epoch, you get more albeit noisier weight updates when the batch size is small, and fewer but more accurate weight updates when the batch size is large.
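
To make the numbers concrete (illustrative values only, nothing from the lesson):

```python
n_train = 50_000                 # hypothetical number of training samples
n_classes = 10

# The model's output for one batch has shape [batch_size, n_classes],
# which is where the 64 in torch.Size([64, 10]) comes from.
batch_size = 64

# Weight updates per epoch = number of batches per epoch.
print(n_train // 64)    # 781 smaller, noisier updates per epoch
print(n_train // 512)   # 97 larger, more accurate updates per epoch
```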

1 Like

What if I have enough RAM? Can we make DataBunch.create load the whole dataset into RAM once and then run all the batches and all the epochs from memory?

If it’s not possible in fastai, maybe I can work around that by using a RAM disk! But is there a way to do it without a custom DataLoader?

2 Likes

Thank you for the follow-up.

2 Likes

@sgugger, can you please explain this briefly, or point me to an explanation? Thanks!

I’m going to take a guess and say it’s GPU RAM, not regular RAM, so there’s no circumventing it unless you get a more powerful GPU. Am I right?

So what’s the point of a bigger architecture with regularization over a smaller one that needs less regularization?

1 Like

Why does forcing weights towards zero help prevent overfitting?

1 Like

Ask this on the advanced topic, then. I’ll reply here.

1 Like

A bigger architecture, properly regularized, will probably generalize better.

1 Like

Thanks… is there any session covering this…?
Can this also be used for a single image?
Does fastai provide any factory method for this kind of transformation?

Interesting. Any intuition to understand why?

1 Like

What is the difference between Regularization and Weight Decay?

3 Likes

Weight Decay is one form of Regularization. There are others.

2 Likes

It helps increase generalization because it raises the cost of making the weights larger. This means the network will only use weights (i.e. increase their values) when they are really necessary to learn the important features, and it will avoid learning details that are too specific, since those are not worth the weight penalty.
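
Concretely, weight decay adds an L2 penalty on the weights to the loss. A minimal PyTorch sketch with a toy model and a made-up coefficient:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(10, 2)   # toy model, just for illustration
wd = 1e-2                  # weight decay coefficient (hypothetical value)

def loss_with_wd(preds, targets):
    # L2 penalty: every weight now costs wd * weight**2 extra loss, so a weight
    # only grows when doing so genuinely reduces the data loss.
    l2 = sum((p ** 2).sum() for p in model.parameters())
    return F.cross_entropy(preds, targets) + wd * l2

# Most optimizers can also apply the decay directly (conventions differ by a
# constant factor):
opt = torch.optim.SGD(model.parameters(), lr=1e-3, weight_decay=wd)
```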

10 Likes