Lesson 5 In-Class Discussion ✅

I’ve noticed that bias isn’t used in the convolutional layers of a CNN, e.g. Conv2d in ResNet. Wondering why?

2 Likes

This is advanced, but the short answer is that BatchNorm plays the same role as the bias, and in ResNet there’s a BatchNorm layer after each conv layer.
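
For example, a conv + BatchNorm block along the lines of the ones in ResNet (a minimal PyTorch sketch, not the actual torchvision code):

```python
import torch.nn as nn

# Conv followed by BatchNorm: a conv bias would just be cancelled when BatchNorm
# subtracts the per-channel mean, and BatchNorm's own learnable shift (beta)
# provides the same effect, so the bias is simply dropped.
block = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),  # bias=False, as in ResNet
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)
```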

7 Likes

Does DataBunch.create load the whole dataset into RAM and iterate through it one batch at a time via next(iter(...)), or does it load from the hard drive on every epoch and every batch?

The difference in efficiency between the two methods is big, but not all datasets fit into RAM…

4 Likes

It depends on the underlying dataset. For images, they’re loaded from hard disk on the fly, when you request a batch.
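
Roughly, it works like the following minimal PyTorch sketch (hypothetical, not the actual fastai implementation): the dataset only holds file paths, and the image is read from disk inside __getitem__, i.e. only when the DataLoader asks for that item to build a batch.

```python
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset

class LazyImageDataset(Dataset):
    """Stores only file paths; each image is read from disk when requested."""
    def __init__(self, folder, transform=None):
        self.files = sorted(Path(folder).glob("*.jpg"))
        self.transform = transform

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        img = Image.open(self.files[idx]).convert("RGB")  # disk read happens here
        return self.transform(img) if self.transform else img
```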

2 Likes

Why torch.Size([64, 10])? Where does the “64” come from?

The 64 is your batch size.

Whether or not you need more epochs with large batch sizes will depend on the problem. I’m just saying that for one epoch, you get more albeit noisier weight updates when the batch size is small, and fewer but more accurate weight updates when the batch size is large.
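
To make the numbers concrete (illustrative values only, nothing from the lesson):

```python
n_train = 50_000                 # hypothetical number of training samples
n_classes = 10

# The model's output for one batch has shape [batch_size, n_classes],
# which is where the 64 in torch.Size([64, 10]) comes from.
batch_size = 64

# Weight updates per epoch = number of batches per epoch.
print(n_train // 64)    # 781 smaller, noisier updates per epoch
print(n_train // 512)   # 97 larger, more accurate updates per epoch
```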

1 Like

What if I have enough RAM? Can we make DataBunch.create load the whole dataset into RAM once and then run all the batches and all the epochs from memory?

If it’s not possible in fastai, maybe I can work around that by using a RAM disk! But is there a way to do it without a custom DataLoader?

2 Likes

Thank you for the follow-up.

2 Likes

@sgugger, can you please explain this briefly, or point me to an explanation? Thanks!

I’m going to take a guess and say it’s GPU RAM, not regular RAM, so there’s no circumventing it unless you get a more powerful GPU. Am I right?

So what’s the point of a bigger architecture with regularization over a smaller one that needs less regularization?

1 Like

Why does forcing weights towards zero help prevent overfitting?

1 Like

Ask this on the advanced topic, then. I’ll reply here.

1 Like

A bigger architecture, properly regularized, will probably generalize better.

1 Like

Thanks… is there any session covering this…?
Can this also be used for a single image?
Does fastai provide any factory method for this kind of transformation?

Interesting. Any intuition to understand why?

1 Like

What is the difference between Regularization and Weight Decay?

3 Likes

Weight Decay is one form of Regularization. There are others.

2 Likes

It helps increase generalization because it raises the cost of making the weights larger. This means the network will only use weights (i.e. increase their values) when they are really necessary to learn the important features, and it will avoid learning details that are too specific, since those are not worth the weight penalty.
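
Concretely, weight decay adds an L2 penalty on the weights to the loss. A minimal PyTorch sketch with a toy model and a made-up coefficient:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(10, 2)   # toy model, just for illustration
wd = 1e-2                  # weight decay coefficient (hypothetical value)

def loss_with_wd(preds, targets):
    # L2 penalty: every weight now costs wd * weight**2 extra loss, so a weight
    # only grows when doing so genuinely reduces the data loss.
    l2 = sum((p ** 2).sum() for p in model.parameters())
    return F.cross_entropy(preds, targets) + wd * l2

# Most optimizers can also apply the decay directly (conventions differ by a
# constant factor):
opt = torch.optim.SGD(model.parameters(), lr=1e-3, weight_decay=wd)
```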

10 Likes