Lesson 6 In-Class Discussion ✅

What exactly are we taking the exponentially weighted average of? Are we averaging the mean across all mini-batches and using a single mean for batch norm?
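For context, batch norm keeps a running exponentially weighted average of the per-mini-batch statistics rather than computing one global mean; at inference time this running estimate replaces the batch statistics. A minimal pure-Python sketch (variable names are my own, not fastai's):

```python
# Sketch of how batch norm tracks running statistics: each
# mini-batch's mean is folded into an exponentially weighted
# average controlled by a momentum hyperparameter.
def update_running_mean(running_mean, batch_mean, momentum=0.1):
    """Blend the current batch mean into the running estimate."""
    return (1 - momentum) * running_mean + momentum * batch_mean

running = 0.0
for batch_mean in [2.0, 4.0, 6.0]:  # means of three mini-batches
    running = update_running_mean(running, batch_mean)

# At inference time this running estimate is used instead of the
# current batch's statistics.
print(round(running, 4))
```

The same scheme applies to the running variance; the momentum value trades stability (small momentum) against responsiveness to recent batches.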

rossman_data_clean does not work for me. It looks like it is using nb_008, which is part of the old fastai API.

2 Likes

Is L1 regularization ever used for deep NNs?
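For reference, L1 regularization adds the sum of absolute weight values to the loss, which pushes weights toward exactly zero (unlike L2, which only shrinks them). A minimal pure-Python sketch, not tied to any framework:

```python
def l1_penalty(weights, lam=0.01):
    """L1 regularization term: lam * sum(|w|)."""
    return lam * sum(abs(w) for w in weights)

def total_loss(data_loss, weights, lam=0.01):
    # The penalty is simply added to the task loss before backprop.
    return data_loss + l1_penalty(weights, lam)

print(total_loss(0.5, [1.0, -2.0, 0.5], lam=0.1))  # 0.5 + 0.1*3.5 = 0.85
```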

1 Like

Is batch norm only used for continuous variables? Why?

1 Like

What are the best practices for determining your embedding sizes for each categorical variable?
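One commonly cited heuristic is fastai's default rule of thumb, which sizes each embedding from the variable's cardinality; the exact formula below is reproduced from memory, so treat it as an assumption and check the library source:

```python
def emb_sz_rule(n_cat):
    """Heuristic embedding size from cardinality (fastai's later
    default rule, reproduced from memory): grows sub-linearly with
    the number of categories and is capped at 600."""
    return min(600, round(1.6 * n_cat ** 0.56))

# e.g. a day-of-week variable (7 categories) vs. a store id (1000 stores)
print(emb_sz_rule(7), emb_sz_rule(1000))
```

The intuition is that high-cardinality variables need more dimensions to spread categories out, but the requirement grows much more slowly than the number of categories.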

2 Likes

No, it’s used in pretty much every layer afterward if you look at the model.

This was answered before in this thread.

1 Like

What can you do when your training loss stops decreasing, so you can’t get the model to overfit at all? Do you change the batch-norm momentum, use less dropout, change weight decay?

1 Like

I don’t know about that. Typically with random forests when you make trees from random samples, the individual trees perform worse than a tree trained on the entire dataset. The benefit of bootstrapping trees is when you ensemble several of them at once. With a neural network you’re not training multiple models that you later ensemble together.
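As a toy illustration of that point about bagging: each bootstrap model is individually noisier than one trained on the full dataset, and the benefit only appears once you average an ensemble of them. The "model" here is just a mean, purely for illustration:

```python
import random

def bootstrap_sample(data, rng):
    """Sample len(data) points with replacement (a bootstrap sample)."""
    return [rng.choice(data) for _ in data]

rng = random.Random(0)
data = [1.0, 2.0, 3.0, 4.0, 5.0]

single = sum(data) / len(data)  # one "model" trained on the full dataset
# 200 "models", each trained on its own bootstrap sample:
models = [sum(s) / len(s) for s in (bootstrap_sample(data, rng) for _ in range(200))]
ensemble = sum(models) / len(models)  # bagged ensemble prediction

# Any individual bootstrap model may be far from `single`, but the
# ensemble average lands very close to it.
```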

2 Likes

Jeremy was talking about Bernoulli random variables specifically in the context of dropout, in which you are either using an activation (multiplying by 1) or dropping it completely (multiplying by 0). We aren’t using a Bernoulli for our category predictions.
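To make the Bernoulli connection concrete, here is a minimal sketch of (inverted) dropout, where each activation is kept or zeroed by an independent Bernoulli draw; names are illustrative:

```python
import random

def dropout(activations, p=0.5, rng=None):
    """Inverted dropout: keep each activation with probability 1-p
    (an independent Bernoulli trial) and scale survivors by 1/(1-p)
    so the expected value is unchanged; dropped ones become 0."""
    rng = rng or random.Random(0)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

out = dropout([1.0, 2.0, 3.0, 4.0], p=0.5)
# Every surviving value is doubled (1/(1-0.5) = 2); the rest are zero.
```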

1 Like

For example, if you are trying to predict an intercept (i.e. an offset), then absolute error might be more important. If you are trying to predict a slope (i.e. a scale factor), then fractional error would be more important.
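A toy example of why the two measures diverge: the same absolute miss is a tiny fractional miss on a large true value but a huge one on a small true value (the numbers here are made up for illustration):

```python
def abs_err(pred, true):
    """Absolute error: magnitude of the miss."""
    return abs(pred - true)

def frac_err(pred, true):
    """Fractional (relative) error: the same miss matters more
    when the true value is small."""
    return abs(pred - true) / abs(true)

# A miss of 1.0 on a true value of 100 vs. a true value of 2:
print(abs_err(101, 100), frac_err(101, 100))  # 1 0.01
print(abs_err(3, 2), frac_err(3, 2))          # 1 0.5
```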

1 Like

Do we train on the original image as well as the augmented one?

4 Likes

Usually the p_affine and p_lighting are not 1, so sometimes your images are the original ones.
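The gating behaviour described above can be sketched as a per-image coin flip; this is a plain-Python stand-in for how a probability like p_affine or p_lighting decides whether a given augmentation fires at all (the flip "transform" here is just an illustrative placeholder):

```python
import random

def maybe_apply(transform, x, p, rng):
    """Apply `transform` with probability p; otherwise return x
    unchanged. This mirrors how a p_affine / p_lighting setting
    gates each augmentation per image."""
    return transform(x) if rng.random() < p else x

rng = random.Random(42)
flip = lambda img: img[::-1]  # stand-in "augmentation" on a list of pixels

samples = [maybe_apply(flip, [1, 2, 3], p=0.75, rng=rng) for _ in range(1000)]
originals = sum(1 for s in samples if s == [1, 2, 3])
# With p=0.75, roughly a quarter of the samples are the untouched original.
```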

2 Likes

Is there any way in the docs (or elsewhere) to see what to do instead when certain commands / modules have been deprecated?

1 Like

I feel obligated to say I don’t like cats. I’m just shamelessly trying to get people to read our docs by putting kittens in them.

23 Likes

What about using these kinds of data augmentation techniques on abstract images, like spectrograms?

5 Likes

What do you mean by afterward?

So if I train only one epoch, the network trains on only one version of the image (which could be an original or an augmented one)?

Use Pandas, everybody loves Pandas :smiley:

2 Likes

Is there an roc_auc metric that we can pass to a Learner in the case of imbalanced classes in binary classification tasks?
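ROC AUC can be computed without any library via the rank-sum (Mann-Whitney) formulation, which may help clarify what such a metric measures; fastai may well ship one, but this pure-Python version makes no assumptions about its API:

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive is
    scored above a randomly chosen negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because it depends only on the ranking of scores, AUC is insensitive to the class-prior imbalance that distorts plain accuracy.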

1 Like