After the first layer, there was a sequence of linear/ReLU/batchnorm.
@Jeremy’s rule of thumb is something like N_embedding = min((N_categories + 1)/2, 50)
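The rule of thumb above can be sketched as a one-line helper. Note this is just an illustration of the formula as quoted; the function name `embedding_size` and the integer division are my assumptions:

```python
def embedding_size(n_categories, max_size=50):
    # Rule of thumb: roughly half the number of categories, capped at 50.
    return min((n_categories + 1) // 2, max_size)

# e.g. a "day of week" variable with 7 categories gets a small embedding,
# while a high-cardinality variable like zip code hits the cap:
print(embedding_size(7))     # small categorical
print(embedding_size(40000)) # capped at 50
```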
They should be used judiciously, with the aim of replicating the real-world conditions (i.e. the test set). Some types of augmentation are not useful for a particular dataset (e.g. an upside-down cat).
what is padding, what is it used for?
@rachel Is there any data augmentation schemes for text data?
Padding adds a border around your image (which can be a reflection of the image). Whether or not you use padding changes the size of the next layer in the network.
Here, Jeremy has used padding to make the data augmentation clearer (e.g. how it was rotated or distorted).
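To see why padding changes the size of the next layer, here is the standard convolution output-size arithmetic as a tiny helper (the function name and signature are mine, not from the lecture):

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    # Standard conv arithmetic: floor((n + 2*padding - kernel) / stride) + 1
    return (n + 2 * padding - kernel) // stride + 1

# A 3x3 kernel on a 28x28 input shrinks the output to 26x26 without padding,
# but padding=1 preserves the 28x28 size:
print(conv_output_size(28, 3))             # no padding
print(conv_output_size(28, 3, padding=1))  # "same" padding
```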
why would you do that?
Is data augmentation needed because of how CNNs work? Would using some other architecture or technique for image classification reduce the need for it?
By default, about how many augmentations are applied to a given training image?
Data augmentation is a form of regularization. It is a way of giving you a larger data set than you actually have (by creating new images). It helps prevent overfitting.
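A toy sketch of how a single augmentation (a horizontal flip) effectively enlarges the dataset. Images are represented here as nested lists purely for illustration; in practice a library like fastai or torchvision would apply transforms on the fly:

```python
def hflip(img):
    # Horizontal flip: reverse each row of the image.
    return [row[::-1] for row in img]

# One original image plus its flip gives the model two distinct
# training examples with the same label:
img = [[1, 2],
       [3, 4]]
augmented_dataset = [img, hflip(img)]
print(augmented_dataset)
```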
So when using data augmentation, should you do more epochs so that the model sees every image plus its augmented versions?
I think data augmentation is just to get more data points to overcome overfitting. Experts can correct me if I am wrong.
That’s true. But Jeremy used the subsampling technique in the model-building stage to fit models faster; I would expect that applying dropout to inputs would also speed things up in the same way.
Another way of saying it is that if you’re only doing one epoch, data augmentation is useless.
@rachel Is data augmentation used because of the variations in images you may get during inference?
Yes, I understand that, thanks. My question is more about whether we could use something other than a CNN to classify images that would not require data augmentation. Maybe this is an area of research, just wondering. Thanks!
You just use a different loss function (MSE instead of softmax cross-entropy).
When using perspective augmentation, would it make sense to fill in the black areas that are created at the edges by using mirroring?
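Mirroring into the border is exactly what "reflection" padding does. Here is a 1-D sketch of that fill strategy (my own toy implementation, analogous to NumPy's `reflect` mode; it assumes `pad` is smaller than the row length):

```python
def reflect_pad_row(row, pad):
    # Mirror the interior pixels outward, excluding the edge pixel itself,
    # so the border is filled with image content instead of black.
    left = row[1:pad + 1][::-1]
    right = row[-pad - 1:-1][::-1]
    return left + row + right

# The black edge regions get plausible pixel values from the image itself:
print(reflect_pad_row([1, 2, 3, 4], 2))
```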