Jeremy was talking about Bernoulli random variables specifically in the context of dropout, in which you are either using an activation (multiply by 1) or dropping it completely (multiply by 0). We aren't using a Bernoulli for our category predictions.
For example, if you are trying to predict an intercept (i.e. an offset), then absolute error might be more important. If you are trying to predict a slope (i.e. a scale factor), then fractional error would be more important.
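As a toy illustration (the numbers are made up, not from the lesson): the same absolute miss matters very differently depending on the scale of the true value.

```python
def absolute_error(pred, true):
    """How far off the prediction is, in the target's own units."""
    return abs(pred - true)

def fractional_error(pred, true):
    """How far off the prediction is, relative to the true value."""
    return abs(pred - true) / abs(true)

# Missing an intercept of 100 by 1 is a tiny fractional error,
# but missing a slope of 0.5 by the same 1 is a huge one.
fe_intercept = fractional_error(101, 100)  # 0.01
fe_slope = fractional_error(1.5, 0.5)      # 2.0
```

Both predictions have absolute error 1.0, but the fractional errors differ by a factor of 200, which is why the right loss depends on what you are predicting.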
Do we train on the original image as well as the augmented one?
Usually the p_affine and p_lighting are not 1, so sometimes your images are the original ones.
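A minimal sketch of why some batches still contain original images (plain Python; the transform names here are hypothetical stand-ins, not fastai API):

```python
import random

def maybe_apply(transform, p, img):
    """Apply `transform` with probability `p`, otherwise return `img` unchanged."""
    return transform(img) if random.random() < p else img

# Hypothetical stand-ins for an affine and a lighting transform.
rotate = lambda img: img + "+rotated"
brighten = lambda img: img + "+brightened"

img = "original"
out = maybe_apply(rotate, 0.75, img)    # skipped roughly 25% of the time
out = maybe_apply(brighten, 0.75, out)  # skipped roughly 25% of the time
```

With p_affine and p_lighting both at 0.75, roughly 6% of images (0.25 × 0.25) come through with neither transform applied.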
Is there any way in the docs (or elsewhere) to see what to do instead when certain commands / modules have been deprecated?
I feel obligated to say I don’t like cats. I’m just shamelessly trying to get people to read our docs by putting kittens in them.
What about using these kinds of data augmentation techniques on abstract images, like spectrograms?
What do you mean by afterward?
So if I train only one epoch, the network trains on only one version of the image (which could be an original or an augmented one)?
Use Pandas; everybody loves Pandas.
Is there a roc_auc metric that we can pass to a Learner in the case of imbalanced classes in binary classification tasks?
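fastai's built-in metrics vary by version, so rather than guess at the exact class name, here is what ROC AUC actually computes, in plain Python (the pairwise Mann-Whitney form); in practice you would wrap a library implementation such as sklearn's `roc_auc_score` as a metric.

```python
def roc_auc(y_true, y_score):
    """Probability that a randomly chosen positive is scored above a
    randomly chosen negative (ties count half)."""
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

Because it ranks positives against negatives rather than counting correct labels, ROC AUC is unaffected by the class balance of the dataset, which is exactly why it is attractive for imbalanced binary problems.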
After the first layer, there was a sequence of linear/ReLU/batchnorm.
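For concreteness, such a block might look like this in PyTorch (the layer sizes here are arbitrary examples, not the ones from the lesson):

```python
import torch
import torch.nn as nn

# A linear -> ReLU -> batchnorm block; sizes (50 -> 20) are made up.
block = nn.Sequential(
    nn.Linear(50, 20),
    nn.ReLU(),
    nn.BatchNorm1d(20),
)

x = torch.randn(8, 50)  # a batch of 8 inputs with 50 features each
y = block(x)            # shape (8, 20)
```

Stacking several of these blocks gives the body of a simple fully connected network.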
Augmentations should be chosen deliberately, trying to replicate the real world (i.e. the test set). Some types of augmentation are not useful on a specific dataset (e.g. an upside-down cat).
Exactly.
What is padding, and what is it used for?
Padding is adding a border around your image (which can be a reflection of the image). Using padding or not changes the size of the next layer in a network.
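The standard output-size formula for a convolution makes the size effect concrete (plain-Python sketch: `n` input size, `k` kernel size, `p` padding, `s` stride):

```python
def conv_out_size(n, k, p=0, s=1):
    """Output size along one dimension of a convolution."""
    return (n + 2 * p - k) // s + 1

no_pad = conv_out_size(28, 3, p=0)    # 26: without padding the layer shrinks the image
same_pad = conv_out_size(28, 3, p=1)  # 28: padding of 1 preserves the size for a 3x3 kernel
```

This is why padding is often set to `k // 2` for odd kernel sizes: it keeps the spatial size constant from layer to layer.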
Here, Jeremy has used padding to make the data augmentation clearer (e.g. how it was rotated or distorted).
Why would you do that?