What exactly are we taking the exponentially weighted average of? Is it the mean across all mini-batches, so that batch norm uses a single mean?
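For reference, batch norm keeps a running statistic that is updated once per mini-batch; a minimal sketch, assuming the PyTorch-style momentum convention (the momentum value and batch means below are made up, not fastai's defaults):

```python
# Running mean updated once per mini-batch (PyTorch-style convention);
# toy numbers, not fastai's actual defaults.
momentum = 0.1
running_mean = 0.0

for batch_mean in [2.0, 4.0, 6.0]:  # mean of each mini-batch's activations
    # exponentially weighted average of the per-mini-batch means
    running_mean = (1 - momentum) * running_mean + momentum * batch_mean

print(running_mean)  # ≈ 1.122 after three mini-batches
```

At inference time this running mean (and the matching running variance) replaces the per-batch statistics.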
rossman_data_clean does not work for me. It looks like it uses nb_008, which is part of the old fastai API.
Is L1 regularization ever used for deep NNs?
Is batch norm only used for continuous variables? Why?
What are the best practices for determining your embedding sizes for each categorical variable?
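There's a rule of thumb often quoted from fastai (emb_sz_rule in recent versions): grow the embedding size with the category's cardinality and cap it at 600. A sketch, assuming that formula:

```python
# Rule of thumb for embedding width given a categorical variable's
# cardinality; formula as in recent fastai versions (emb_sz_rule).
def emb_sz_rule(n_cat: int) -> int:
    return min(600, round(1.6 * n_cat ** 0.56))

for n in [2, 10, 1000, 100_000]:
    # small categories get tiny embeddings; huge ones cap at 600
    print(n, emb_sz_rule(n))
```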
No, it’s used in pretty much every layer afterward, if you look at the model.
This was answered before in this thread.
What can you do when your train loss stops decreasing, so you can’t get the model to overfit at all? Do you change the batch norm momentum, use less dropout, or change the weight decay?
I don’t know about that. Typically with random forests, when you make trees from random samples, the individual trees perform worse than a tree trained on the entire dataset. The benefit of bootstrapping trees comes when you ensemble several of them at once. With a neural network you’re not training multiple models that you later ensemble together.
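A toy sketch of that ensembling point (synthetic noisy predictors standing in for bootstrapped trees; the noise level is made up): averaging many high-variance models cuts the error of the combined prediction.

```python
import random

random.seed(0)
true_value = 10.0

def noisy_model() -> float:
    # stand-in for one bootstrapped model: unbiased but high-variance
    return true_value + random.gauss(0, 3)

# average error of a single model vs an ensemble of 50 averaged models
single = sum(abs(noisy_model() - true_value) for _ in range(1000)) / 1000
ensemble = sum(
    abs(sum(noisy_model() for _ in range(50)) / 50 - true_value)
    for _ in range(1000)
) / 1000

print(single > ensemble)  # True: the averaged ensemble is far more accurate
```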
Jeremy was talking about Bernoulli random variables specifically in the context of dropout, in which you either use an activation (multiply by 1) or drop it completely (multiply by 0). We aren’t using a Bernoulli for our category predictions.
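A rough sketch of that Bernoulli view of dropout (NumPy, with the standard inverted-dropout rescaling; the drop probability here is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                            # drop probability (assumed)
acts = np.ones(8)                  # toy activations

mask = rng.random(acts.shape) > p  # independent Bernoulli keep/drop draws
dropped = acts * mask / (1 - p)    # kept units rescaled by 1 / (1 - p)

print(dropped)  # each entry is either 0.0 (dropped) or 2.0 (kept, rescaled)
```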
For example, if you are trying to predict an intercept (i.e. an offset), then absolute error might be more important. If you are trying to predict a slope (i.e. a scale factor), then fractional error would be more important.
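A tiny sketch contrasting the two error measures (plain Python, made-up numbers): when every prediction is off by the same 10%, absolute error is dominated by the largest target while fractional error stays flat.

```python
def mae(pred, true):
    # mean absolute error: cares about the raw offset
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def mape(pred, true):
    # mean absolute percentage error: cares about the scale factor
    return sum(abs(p - t) / abs(t) for p, t in zip(pred, true)) / len(true)

true = [10.0, 100.0, 1000.0]
pred = [11.0, 110.0, 1100.0]  # every prediction is 10% high

print(mae(pred, true))   # 37.0, dominated by the biggest target
print(mape(pred, true))  # ≈ 0.1, the same 10% everywhere
```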
Do we train on the original image as well as the augmented one?
Usually p_affine and p_lighting are not 1, so some of your images are the original ones.
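To make that concrete, a toy sketch (hypothetical probabilities; fastai's actual transform pipeline is more involved): with p_affine = p_lighting = 0.75, roughly (1 - 0.75) * (1 - 0.75) ≈ 6% of images go through completely untouched.

```python
import random

random.seed(0)
p_affine, p_lighting = 0.75, 0.75  # assumed probabilities, not defaults

untouched = 0
for _ in range(10_000):
    apply_affine = random.random() < p_affine
    apply_lighting = random.random() < p_lighting
    if not apply_affine and not apply_lighting:
        untouched += 1  # neither transform fired: the original image

print(untouched / 10_000)  # close to (1 - 0.75) * (1 - 0.75) = 0.0625
```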
Is there any way in the docs (or elsewhere) to see what to do instead when certain commands / modules have been deprecated?
I feel obligated to say I don’t like cats. I’m just shamelessly trying to get people to read our docs by putting kittens in them.
What about using these kinds of data augmentation techniques on abstract images, like spectrograms?
What do you mean by afterward?
So if I train for only one epoch, does the network train on only one version of each image (which could be the original or an augmented one)?
Use Pandas, everybody loves Pandas.
Is there an roc_auc metric that we can pass to a Learner, in the case of imbalanced classes in binary classification tasks?
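Depending on your fastai version there may or may not be a built-in AUROC metric, so check the docs for your version; as a fallback, here's a hedged sketch of the computation itself using sklearn's roc_auc_score (the 2-column softmax output layout is an assumption):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def roc_auc(preds, targs):
    # assumes binary classification with a 2-column probability output;
    # column 1 is taken as the positive-class probability
    return roc_auc_score(targs, preds[:, 1])

preds = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
targs = np.array([0, 1, 0, 1])
print(roc_auc(preds, targs))  # 1.0: every positive outranks every negative
```

Note that AUC isn't decomposable across batches, so a proper Learner metric needs to accumulate predictions over the whole validation set before computing it.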