Lesson 5 In-Class Discussion ✅

It’s not always true though.

  1. Try blah and see if it works.
6 Likes

What does wd stand for?

Interesting. I’ll look into that, thanks for the answers!

So is choosing the “right” batch size a hyper-parameter to help with accuracy?

1 Like

Is Weight Decay the same as the Regularization Coefficient?

1 Like

@rachel Is there no way to learn the WD? Maybe get an embedding for it and learn it?

1 Like

I don’t know what ‘regularization coefficient’ means, but it is probably referring to weight decay. Weight decay is a specific regularization technique.

1 Like
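For concreteness, here’s a rough sketch (my own illustration in PyTorch, not code from the lesson) of weight decay written out as an L2 penalty on the weights; the `wd` coefficient is the number people sometimes call the regularization coefficient:

```python
import torch
import torch.nn.functional as F

def loss_with_weight_decay(preds, targets, params, wd=0.01):
    # base loss (MSE is just an example here)
    base = F.mse_loss(preds, targets)
    # weight decay: add wd times the sum of squared weights to the loss
    l2_penalty = sum((p ** 2).sum() for p in params)
    return base + wd * l2_penalty
```

For plain SGD you get the same effect by folding a `wd * p` term into each parameter’s gradient update instead of adding it to the loss.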

It is another hyper-parameter, yes. Although in most real-life applications, the higher the batch size you can fit on your GPU, the better.

2 Likes
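If it helps, the only place batch size shows up in code is when you build the data loaders, so comparing a few values is cheap. This is a toy PyTorch illustration with dummy data, not code from the lesson:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(1000, 10), torch.randn(1000, 1))  # dummy dataset
for bs in (32, 64, 128):                    # candidate batch sizes to compare
    dl = DataLoader(ds, batch_size=bs, shuffle=True)
    # train for a few epochs with each loader and compare validation accuracy
```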

“Doing something in place” means that rather than returning the value without updating the variable, it’s updating the variable, right?

Weight decay

1 Like

I think you need to hold back data to train and test the Random Forest, because the entity embeddings have “seen” the training data.
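Something like this is what I have in mind (a sketch only, with random stand-in data; the important point is that the rows used to score the Random Forest were also held out when the embeddings were trained):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X = np.random.randn(500, 8)   # stand-in for features built from the entity embeddings
y = np.random.randn(500)

# hold back a validation set that the embeddings never saw
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=42)

rf = RandomForestRegressor(n_estimators=40)
rf.fit(X_train, y_train)
print(rf.score(X_valid, y_valid))  # only an honest estimate if X_valid was unseen by the embeddings
```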

weight decay = learning rate / number of epochs. But if we are using multiple learning rates, then how can we define the range of weight decay in the fastai library?

1 Like
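I’m not sure that formula is right either, but mechanically weight decay is just an independent coefficient you hand to the optimizer; here’s a plain-PyTorch sketch (not the fastai call, and the numbers are made up):

```python
import torch

model = torch.nn.Linear(10, 1)
# weight_decay is its own hyper-parameter; it isn't computed from the learning rate or epochs
opt = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)
```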

Is there an equivalent of grid-search cross-validation, or any way to find the best values for hyper-parameters like wd, epochs, etc.?

2 Likes
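Just to illustrate what a brute-force search would look like (a sketch; `train_and_score` is a hypothetical placeholder for whatever training and validation loop you already have):

```python
from itertools import product

def train_and_score(wd, epochs):
    # hypothetical helper: train the model with these settings and return a validation metric
    return 0.0

grid = {"wd": [1e-5, 1e-4, 1e-3], "epochs": [3, 5, 10]}
best = max(product(grid["wd"], grid["epochs"]),
           key=lambda combo: train_and_score(*combo))
print("best (wd, epochs):", best)
```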

Exactly, it helps save memory when we can do it.

1 Like

Yes. The original variable will be overwritten with the new value after an in-place operation.

1 Like
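A tiny PyTorch illustration of the convention (methods ending in an underscore are the in-place versions):

```python
import torch

x = torch.ones(3)
y = x.add(1)   # out-of-place: returns a new tensor, x is unchanged
x.add_(1)      # in-place (trailing underscore): x itself is overwritten
print(x, y)    # both now print tensor([2., 2., 2.])
```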

Where did you get that formula?

1 Like

This is a whole area of research. You can check out the conversation between Leslie Smith and Jeremy from last week, where they discussed this specific problem.

3 Likes

I see… and if it happens that I am able to fit my whole data set into one batch, it means “get more data” or “get higher-quality data”, I guess?

We haven’t created features in the work we did in this class today :wink: