It is amazing to read this incredible book! I’ve already learned a lot. Would it be possible to use this topic to discuss the first chapters of the book, and also to create a topic for each chapter as the lessons continue?
or maybe we could use this topic? https://forums.fast.ai/t/fastbook-chapter-1-questionnaire-solutions-wiki/65647/2
It would also be great to have numbers for the different paragraphs:
I am referring to the “Validation sets and test sets” paragraph > “We, as modellers, are evaluating the model by looking at predictions on the validation data, when we decide to explore new hyperparameter values! So subsequent versions of the model are, indirectly, shaped by having seen the validation data. Just as the automatic training process is in danger of overfitting the training data, we are in danger of overfitting the validation data, by human trial and error and exploration.”
Is it possible to also automate the exploration of hyperparameters?
Could AutoML help in this? Are H2O or others doing this? Do you have any references?
I’m unsure about that; @init_27 may be able to provide a better answer (from an H2O perspective).
Bayesian optimization can tune any hyperparameter you choose to modify. The overall idea is that it operates as a kind of guided grid search: you define the bounds within which to search, and as the search explores that space, each choice is graded by some objective function (in most cases our metric). This has been used to automate model generation (Google did this, I believe, and we can do it for tabular data; I show this in my next Walk with fastai2 lecture), but we can also use it for any hyperparameter we want, e.g. learning rate, weight decay, etc.
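To make the “search within bounds, grade each choice” loop concrete, here is a minimal Python sketch. It uses plain random sampling in log-space as a stand-in for the Bayesian surrogate model (real Bayesian optimization would use the past scores to pick the next candidate), and the `objective` function is a made-up proxy for a real validation metric:

```python
import math
import random

def objective(lr, wd):
    # Hypothetical "validation metric": pretend performance peaks
    # near lr=1e-2, wd=1e-4. In practice this would train a model
    # and return the metric on the validation set.
    return -((math.log10(lr) + 2) ** 2 + (math.log10(wd) + 4) ** 2)

def search(bounds, n_trials=50, seed=0):
    """Sample points inside the given (log10) bounds, grade each one
    with the objective, and keep the best. A real Bayesian optimizer
    would choose each new point using a surrogate fit to past scores;
    uniform random sampling keeps the sketch short."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {k: 10 ** rng.uniform(*b) for k, b in bounds.items()}
        score = objective(**params)
        if best is None or score > best[0]:
            best = (score, params)
    return best

# Search learning rate in [1e-4, 1e-1] and weight decay in [1e-6, 1e-2].
score, params = search({"lr": (-4, -1), "wd": (-6, -2)})
```

Libraries like scikit-optimize or Optuna implement the real surrogate-based version of this loop, so in practice you would hand them the bounds and the metric rather than writing it yourself.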
When you use the fine_tune method, fastai will use these tricks for you. There are a few parameters you can set (which we’ll discuss later), but in the default form shown here, it does two steps:
Use one epoch to fit just those parts of the model necessary to get the new random head to work correctly with your dataset
Use the number of epochs requested when calling the method to fit the entire model, updating the weights of the later layers (especially the head) faster than the earlier layers (which, as we’ll see, generally don’t require many changes from the pretrained weights)
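The two phases above can be illustrated with a toy gradient-descent example (this is not fastai code, just an illustration of the idea; the one-number-per-layer “model”, targets, and learning rates are all made up). The pretrained “body” starts near where it needs to be, the random “head” does not:

```python
def grad(w, target):
    # Gradient of the toy per-layer loss (w - target)^2.
    return 2 * (w - target)

# Pretrained body is already close to its target; the new head is random.
weights = {"body": 0.9, "head": 5.0}
targets = {"body": 1.0, "head": 0.0}

# Phase 1: body frozen, one "epoch" (a few steps here) trains only
# the head so it works with the pretrained features.
for _ in range(10):
    weights["head"] -= 0.1 * grad(weights["head"], targets["head"])

# Phase 2: unfreeze everything; discriminative learning rates update
# the head 10x faster than the body, since the pretrained body only
# needs small adjustments.
lrs = {"body": 0.01, "head": 0.1}
for _ in range(10):
    for name in weights:
        weights[name] -= lrs[name] * grad(weights[name], targets[name])
```

After both phases the head has moved a long way while the body has barely changed, which is the point of training them at different speeds.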
Why is 1 epoch enough to get the new head in order?
Would it be beneficial to use more epochs for the head before even changing the earlier layers?