After the first layer, there was a sequence of linear/ReLU/batchnorm.
@Jeremy’s rule of thumb is something like N_embedding = min((N_categories + 1)/2, 50)
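The rule of thumb above can be sketched as a one-line helper. Note this is just an illustration of the formula as quoted; the function name `embedding_size` and the integer division are my assumptions:

```python
def embedding_size(n_categories, max_size=50):
    # Rule of thumb: roughly half the number of categories, capped at 50.
    return min((n_categories + 1) // 2, max_size)

# e.g. a "day of week" variable with 7 categories gets a small embedding,
# while a high-cardinality variable like zip code hits the cap:
print(embedding_size(7))     # small categorical
print(embedding_size(40000)) # capped at 50
```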
They should be used judiciously, with the aim of replicating the real-world conditions (i.e. the test set). Some types of augmentation are not useful for a particular dataset (e.g. an upside-down cat).
what is padding, what is it used for?
@rachel Is there any data augmentation schemes for text data?
Padding adds a border around your image (which can be a reflection of the image). Whether or not you use padding changes the size of the next layer in the network.
Here, Jeremy has used padding to make the data augmentation clearer (e.g. how it was rotated or distorted).
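To see why padding changes the size of the next layer, here is the standard convolution output-size arithmetic as a tiny helper (the function name and signature are mine, not from the lecture):

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    # Standard conv arithmetic: floor((n + 2*padding - kernel) / stride) + 1
    return (n + 2 * padding - kernel) // stride + 1

# A 3x3 kernel on a 28x28 input shrinks the output to 26x26 without padding,
# but padding=1 preserves the 28x28 size:
print(conv_output_size(28, 3))             # no padding
print(conv_output_size(28, 3, padding=1))  # "same" padding
```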
why would you do that?
Is data augmentation needed because of how CNNs work? Would using some other architecture or technique for image classification reduce the need for it?
By default, about how many augmentations are applied to a given training image?
Data augmentation is a form of regularization. It is a way of giving you a larger data set than you actually have (by creating new images). It helps prevent overfitting.
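A toy sketch of how a single augmentation (a horizontal flip) effectively enlarges the dataset. Images are represented here as nested lists purely for illustration; in practice a library like fastai or torchvision would apply transforms on the fly:

```python
def hflip(img):
    # Horizontal flip: reverse each row of the image.
    return [row[::-1] for row in img]

# One original image plus its flip gives the model two distinct
# training examples with the same label:
img = [[1, 2],
       [3, 4]]
augmented_dataset = [img, hflip(img)]
print(augmented_dataset)
```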
So when using data augmentation, should you do more epochs so that the model sees every image plus its augmented versions?
I think data augmentation is just to get more data points to overcome overfitting. Experts can correct me if I am wrong.
That’s true. But Jeremy used the subsampling technique in the model-building stage to fit models faster; I would expect that applying dropout to inputs would also speed things up in the same way.
Another way of saying it is that if you’re only doing one epoch, data augmentation is useless.
@rachel Is data augmentation used because of the variations in images you may get during inference?
Yes, I understand that, thanks. My question is more about whether we could use something other than a CNN to classify images that would not require data augmentation. Maybe this is an area of research, just wondering. Thanks!
You just use a different loss function (MSE instead of softmax cross-entropy).
When using perspective augmentation, would it make sense to fill in the black areas that are created at the edges by using mirroring?
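Mirroring into the border is exactly what "reflection" padding does. Here is a 1-D sketch of that fill strategy (my own toy implementation, analogous to NumPy's `reflect` mode; it assumes `pad` is smaller than the row length):

```python
def reflect_pad_row(row, pad):
    # Mirror the interior pixels outward, excluding the edge pixel itself,
    # so the border is filled with image content instead of black.
    left = row[1:pad + 1][::-1]
    right = row[-pad - 1:-1][::-1]
    return left + row + right

# The black edge regions get plausible pixel values from the image itself:
print(reflect_pad_row([1, 2, 3, 4], 2))
```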