It’s not always true though.
- Try blah and see if it works.
What does wd stand for?
Interesting. I’ll look into that, thanks for the answers!
So is choosing the “right” batch size a hyperparameter that helps with accuracy?
Is weight decay the same as the regularization coefficient?
I don’t know what ‘regularization coefficient’ means in this context, but it’s probably referring to weight decay. Weight decay is a specific regularization technique.
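To make that concrete, here’s a minimal sketch (PyTorch, not from this thread) of what weight decay does: it adds an L2 penalty `wd/2 * sum(w**2)` to the loss, which is equivalent to adding `wd * w` to each weight’s gradient. The coefficient `wd` is the hyperparameter people sometimes call the regularization coefficient.

```python
import torch

w = torch.randn(3, requires_grad=True)  # toy weights
x = torch.randn(3)                      # toy input
wd = 0.01                               # weight decay / regularization coefficient

loss = ((w * x).sum() - 1.0) ** 2       # some toy loss
loss = loss + wd / 2 * (w ** 2).sum()   # weight decay written as an explicit L2 penalty
loss.backward()                         # w.grad now includes the extra wd * w term

# In practice you don't write the penalty yourself; you pass it to the optimizer:
# optimizer = torch.optim.SGD([w], lr=0.1, weight_decay=wd)
```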
It is another hyperparameter, yes. Although in most real-life applications, the larger the batch size you can fit on your GPU, the better.
“Doing something in place” means that rather than returning a new value and leaving the variable unchanged, it updates the variable itself, right?
Weight decay
I think you need to hold back data to train and test the Random Forest, because the entity embeddings have “seen” the training data.
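One way to do that hold-out, as a minimal sketch with scikit-learn (the names `X_emb` and `y` are assumptions, and the data here is synthetic): split off a test set before fitting the Random Forest, ideally rows that the embedding network never trained on either.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_emb = rng.normal(size=(1000, 16))           # stand-in for learned entity embeddings
y = X_emb[:, 0] * 2 + rng.normal(size=1000)   # toy target

# Hold back 20% of rows; ideally these rows were also excluded when the
# embedding network itself was trained, so the forest is scored on data
# the embeddings have never "seen".
X_train, X_test, y_train, y_test = train_test_split(
    X_emb, y, test_size=0.2, random_state=42)

rf = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=42)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))               # evaluate on held-out rows only
```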
weight decay = learning rate / number of epochs. But if we are using multiple learning rates, then how can we define the range of weight decay in the fastai library?
Is there an equivalent of grid-search cross-validation, or any way to find the best values for hyperparameters like wd, epochs, etc.?
Exactly, it helps save memory when we can do it.
Yes. The original variable will be overwritten with the new value after an in-place operation.
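For example, in PyTorch (which fastai is built on) the convention is that methods ending in an underscore run in place:

```python
import torch

t = torch.ones(3)

out = t.add(1)   # out-of-place: t is unchanged, the result is a new tensor
print(t)         # tensor([1., 1., 1.])

t.add_(1)        # in-place: t itself is overwritten, no new tensor allocated
print(t)         # tensor([2., 2., 2.])
```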
Where did you get that formula?
This is a whole area of research. You can check out the conversation between Leslie Smith and Jeremy from last week, where they discussed this specific problem.
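As far as I know there’s no built-in grid search in the fastai library, but a manual sweep is easy to sketch. Here `train_and_validate` is a hypothetical placeholder you would replace with your actual training loop; everything else is plain Python:

```python
from itertools import product

def train_and_validate(wd: float, epochs: int) -> float:
    # Hypothetical placeholder: train your model with these settings and
    # return its validation loss. Replace with real training code.
    return wd * epochs  # dummy value so the sketch runs

wds = [1e-5, 1e-4, 1e-3, 1e-2]
epoch_options = [3, 5, 10]

best_score, best_params = float("inf"), None
for wd, n_epochs in product(wds, epoch_options):
    score = train_and_validate(wd=wd, epochs=n_epochs)
    if score < best_score:  # assuming lower = better (e.g. validation loss)
        best_score, best_params = score, (wd, n_epochs)

print(best_params, best_score)
```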
I see… and if it happens that I’m able to fit my whole dataset into a single batch, that means “get more data” or “get higher-quality data”, I guess?
We haven’t created features in the work we did in this class today.