It’s not always true though.
- Try blah and see if it works.
What does wd stand for?
Interesting. I’ll look into that, thanks for the answers!
So is choosing the “right” batch size a hyperparameter that helps with accuracy?
Is weight decay the same as the regularization coefficient?
I don’t know what ‘regularization coefficient’ means in this context, but it’s probably referring to weight decay. Weight decay is a specific regularization technique.
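To make that concrete, here’s a minimal sketch (PyTorch, not from this thread) of what weight decay does: it adds an L2 penalty `wd/2 * sum(w**2)` to the loss, which is equivalent to adding `wd * w` to each weight’s gradient. The coefficient `wd` is the hyperparameter people sometimes call the regularization coefficient.

```python
import torch

w = torch.randn(3, requires_grad=True)  # toy weights
x = torch.randn(3)                      # toy input
wd = 0.01                               # weight decay / regularization coefficient

loss = ((w * x).sum() - 1.0) ** 2       # some toy loss
loss = loss + wd / 2 * (w ** 2).sum()   # weight decay written as an explicit L2 penalty
loss.backward()                         # w.grad now includes the extra wd * w term

# In practice you don't write the penalty yourself; you pass it to the optimizer:
# optimizer = torch.optim.SGD([w], lr=0.1, weight_decay=wd)
```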
It is another hyperparameter, yes. Although in most real-life applications, the larger the batch size you can fit on your GPU, the better.
“Doing something in place” means that rather than returning a new value and leaving the variable unchanged, it updates the variable itself, right?
Weight decay
I think you need to hold back data to train and test the Random Forest, because the entity embeddings have “seen” the training data.
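One way to do that hold-out, as a minimal sketch with scikit-learn (the names `X_emb` and `y` are assumptions, and the data here is synthetic): split off a test set before fitting the Random Forest, ideally rows that the embedding network never trained on either.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_emb = rng.normal(size=(1000, 16))           # stand-in for learned entity embeddings
y = X_emb[:, 0] * 2 + rng.normal(size=1000)   # toy target

# Hold back 20% of rows; ideally these rows were also excluded when the
# embedding network itself was trained, so the forest is scored on data
# the embeddings have never "seen".
X_train, X_test, y_train, y_test = train_test_split(
    X_emb, y, test_size=0.2, random_state=42)

rf = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=42)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))               # evaluate on held-out rows only
```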
weight decay = learning rate / number of epochs. But if we are using multiple learning rates, then how can we define the range of weight decay in the fastai library?
Is there an equivalent of grid-search cross-validation, or any way to find the best values for hyperparameters like wd, epochs, etc.?
Exactly, it helps save memory when we can do it.
Yes. The original variable will be overwritten with the new value after an in-place operation.
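For example, in PyTorch (which fastai is built on) the convention is that methods ending in an underscore run in place:

```python
import torch

t = torch.ones(3)

out = t.add(1)   # out-of-place: t is unchanged, the result is a new tensor
print(t)         # tensor([1., 1., 1.])

t.add_(1)        # in-place: t itself is overwritten, no new tensor allocated
print(t)         # tensor([2., 2., 2.])
```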
Where did you get that formula?
This is a whole area of research. You can check out the conversation between Leslie Smith and Jeremy from last week, where they discussed this specific problem.
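As far as I know there’s no built-in grid search in the fastai library, but a manual sweep is easy to sketch. Here `train_and_validate` is a hypothetical placeholder you would replace with your actual training loop; everything else is plain Python:

```python
from itertools import product

def train_and_validate(wd: float, epochs: int) -> float:
    # Hypothetical placeholder: train your model with these settings and
    # return its validation loss. Replace with real training code.
    return wd * epochs  # dummy value so the sketch runs

wds = [1e-5, 1e-4, 1e-3, 1e-2]
epoch_options = [3, 5, 10]

best_score, best_params = float("inf"), None
for wd, n_epochs in product(wds, epoch_options):
    score = train_and_validate(wd=wd, epochs=n_epochs)
    if score < best_score:  # assuming lower = better (e.g. validation loss)
        best_score, best_params = score, (wd, n_epochs)

print(best_params, best_score)
```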
I see… and if it happens that I’m able to fit my whole dataset into a single batch, that means “get more data” or “get higher-quality data”, I guess?
We haven’t created features in the work we did in this class today.