What exactly are we taking the exponentially weighted average of? Is it the mean across all mini-batches, so that batch norm uses a single mean?
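For reference, batch norm keeps a running statistic that is updated once per mini-batch; a minimal sketch, assuming the PyTorch-style momentum convention (the momentum value and batch means below are made up, not fastai's defaults):

```python
# Running mean updated once per mini-batch (PyTorch-style convention);
# toy numbers, not fastai's actual defaults.
momentum = 0.1
running_mean = 0.0

for batch_mean in [2.0, 4.0, 6.0]:  # mean of each mini-batch's activations
    # exponentially weighted average of the per-mini-batch means
    running_mean = (1 - momentum) * running_mean + momentum * batch_mean

print(running_mean)  # ≈ 1.122 after three mini-batches
```

At inference time this running mean (and the matching running variance) replaces the per-batch statistics.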
rossman_data_clean does not work for me. It looks like it uses nb_008, which is part of the old fastai API.
Is L1 regularization ever used for deep NNs?
Is batch norm only used for continuous variables? Why?
What are the best practices for determining your embedding sizes for each categorical variable?
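There's a rule of thumb often quoted from fastai (emb_sz_rule in recent versions): grow the embedding size with the category's cardinality and cap it at 600. A sketch, assuming that formula:

```python
# Rule of thumb for embedding width given a categorical variable's
# cardinality; formula as in recent fastai versions (emb_sz_rule).
def emb_sz_rule(n_cat: int) -> int:
    return min(600, round(1.6 * n_cat ** 0.56))

for n in [2, 10, 1000, 100_000]:
    # small categories get tiny embeddings; huge ones cap at 600
    print(n, emb_sz_rule(n))
```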
No, it’s used in pretty much every layer afterward, if you look at the model.
This was answered before in this thread.
What can you do when your train loss stops decreasing, so you can’t get the model to overfit at all? Do you change the batch norm momentum, use less dropout, or change the weight decay?
I don’t know about that. Typically with random forests, when you make trees from random samples, the individual trees perform worse than a tree trained on the entire dataset. The benefit of bootstrapping trees comes when you ensemble several of them at once. With a neural network you’re not training multiple models that you later ensemble together.
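A toy sketch of that ensembling point (synthetic noisy predictors standing in for bootstrapped trees; the noise level is made up): averaging many high-variance models cuts the error of the combined prediction.

```python
import random

random.seed(0)
true_value = 10.0

def noisy_model() -> float:
    # stand-in for one bootstrapped model: unbiased but high-variance
    return true_value + random.gauss(0, 3)

# average error of a single model vs an ensemble of 50 averaged models
single = sum(abs(noisy_model() - true_value) for _ in range(1000)) / 1000
ensemble = sum(
    abs(sum(noisy_model() for _ in range(50)) / 50 - true_value)
    for _ in range(1000)
) / 1000

print(single > ensemble)  # True: the averaged ensemble is far more accurate
```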
Jeremy was talking about Bernoulli random variables specifically in the context of dropout, in which you either use an activation (multiply by 1) or drop it completely (multiply by 0). We aren’t using a Bernoulli for our category predictions.
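A rough sketch of that Bernoulli view of dropout (NumPy, with the standard inverted-dropout rescaling; the drop probability here is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                            # drop probability (assumed)
acts = np.ones(8)                  # toy activations

mask = rng.random(acts.shape) > p  # independent Bernoulli keep/drop draws
dropped = acts * mask / (1 - p)    # kept units rescaled by 1 / (1 - p)

print(dropped)  # each entry is either 0.0 (dropped) or 2.0 (kept, rescaled)
```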
For example, if you are trying to predict an intercept (i.e. an offset), then absolute error might be more important. If you are trying to predict a slope (i.e. a scale factor), then fractional error would be more important.
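A tiny sketch contrasting the two error measures (plain Python, made-up numbers): when every prediction is off by the same 10%, absolute error is dominated by the largest target while fractional error stays flat.

```python
def mae(pred, true):
    # mean absolute error: cares about the raw offset
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def mape(pred, true):
    # mean absolute percentage error: cares about the scale factor
    return sum(abs(p - t) / abs(t) for p, t in zip(pred, true)) / len(true)

true = [10.0, 100.0, 1000.0]
pred = [11.0, 110.0, 1100.0]  # every prediction is 10% high

print(mae(pred, true))   # 37.0, dominated by the biggest target
print(mape(pred, true))  # ≈ 0.1, the same 10% everywhere
```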
Do we train on the original image as well as the augmented one?
Usually p_affine and p_lighting are not 1, so some of your images are the original ones.
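To make that concrete, a toy sketch (hypothetical probabilities; fastai's actual transform pipeline is more involved): with p_affine = p_lighting = 0.75, roughly (1 - 0.75) * (1 - 0.75) ≈ 6% of images go through completely untouched.

```python
import random

random.seed(0)
p_affine, p_lighting = 0.75, 0.75  # assumed probabilities, not defaults

untouched = 0
for _ in range(10_000):
    apply_affine = random.random() < p_affine
    apply_lighting = random.random() < p_lighting
    if not apply_affine and not apply_lighting:
        untouched += 1  # neither transform fired: the original image

print(untouched / 10_000)  # close to (1 - 0.75) * (1 - 0.75) = 0.0625
```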
Is there any way in the docs (or elsewhere) to see what to do instead when certain commands / modules have been deprecated?
I feel obligated to say I don’t like cats. I’m just shamelessly trying to get people to read our docs by putting kittens in them.
What about using these kinds of data augmentation techniques on abstract images, like spectrograms?
What do you mean by afterward?
So if I train for only one epoch, does the network train on only one version of each image (which could be the original or an augmented one)?
Use Pandas, everybody loves Pandas.
Is there an roc_auc metric that we can pass to a Learner, in the case of imbalanced classes in binary classification tasks?
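Depending on your fastai version there may or may not be a built-in AUROC metric, so check the docs for your version; as a fallback, here's a hedged sketch of the computation itself using sklearn's roc_auc_score (the 2-column softmax output layout is an assumption):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def roc_auc(preds, targs):
    # assumes binary classification with a 2-column probability output;
    # column 1 is taken as the positive-class probability
    return roc_auc_score(targs, preds[:, 1])

preds = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
targs = np.array([0, 1, 0, 1])
print(roc_auc(preds, targs))  # 1.0: every positive outranks every negative
```

Note that AUC isn't decomposable across batches, so a proper Learner metric needs to accumulate predictions over the whole validation set before computing it.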