CHAPTER 1 Fastbook Discussion

Dear All,

It is amazing to read this incredible book! I’ve already learned a lot. Would it be possible to use this topic to discuss the first chapters of the book, and also to create a topic for each chapter as the lessons continue? Or maybe we could use this topic for everything?

Thanks again @jeremy @rachel and team.

It would also be great to have numbers on the different paragraphs:

I am referring to the “Validation sets and test sets” paragraph:
> “We, as modellers, are evaluating the model by looking at predictions on the validation data, when we decide to explore new hyperparameter values! So subsequent versions of the model are, indirectly, shaped by having seen the validation data. Just as the automatic training process is in danger of overfitting the training data, we are in danger of overfitting the validation data, by human trial and error and exploration.”

Is it possible to also automate the exploration of hyperparameters?
Could AutoML help with this? Are H2O or others doing this? Do you have any references?

Best Regards

Sure, Bayesian optimization does exactly that

(which I have a few implementations of if you’d like examples)

  • Note: Bayesian optimization is the technique; there are a number of different ways people have gone about it

Thanks Zach, do you know if AutoML and H2O work in the same way as well?
And which hyperparameters is Bayesian optimization actually modifying?

I thought it could be done with reinforcement learning and RNNs.

Thanks for the link

I’m unsure on that, @init_27 may be able to provide a better answer (from an H2O perspective)

Bayesian optimization can affect any hyperparameter you choose and want to modify. The idea as a whole is that it operates on a grid search of sorts, where you define the bounds in which to search. As we search this area for the best parameter, we grade each choice via some function (in most cases our metric). This has been used to automate model generation (Google did this, I believe, and we can do it for tabular data; I show this in my next Walk with fastai2 lecture), but we can also use it for any hyperparameter we want, e.g. learning rate, weight decay, etc.
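To make that concrete, here is a toy, pure-Python sketch of the idea: evaluate a few random points, then repeatedly pick the next candidate by trading off the predicted loss against how unexplored a region is. Everything here is made up for illustration — the objective is a fake stand-in for “train a model and measure validation loss”, and the surrogate is a crude nearest-neighbour model rather than the Gaussian process a real library (e.g. scikit-optimize, Optuna) would use.

```python
import math
import random

def objective(lr):
    # Stand-in for "train with this learning rate, return validation loss":
    # a made-up curve whose minimum sits around lr = 1e-2, plus a little noise.
    return (math.log10(lr) + 2) ** 2 + random.gauss(0, 0.01)

def bayes_opt_sketch(bounds, n_init=3, n_iter=10, kappa=2.0, seed=0):
    """Toy Bayesian-style search over log10(lr) within `bounds`.

    Surrogate: predicted loss = loss at the nearest evaluated point, with
    uncertainty growing with the distance to it. Acquisition: a lower
    confidence bound (prediction minus kappa * uncertainty).
    """
    rng = random.Random(seed)
    lo, hi = bounds
    X, y = [], []
    for _ in range(n_init):  # a few random initial probes
        x = rng.uniform(lo, hi)
        X.append(x)
        y.append(objective(10 ** x))
    for _ in range(n_iter):
        candidates = [rng.uniform(lo, hi) for _ in range(200)]

        def lcb(c):
            dist, pred = min((abs(c - xi), yi) for xi, yi in zip(X, y))
            return pred - kappa * dist  # low predicted loss OR far from known points

        x = min(candidates, key=lcb)  # most promising candidate under the acquisition
        X.append(x)
        y.append(objective(10 ** x))
    best_loss, best_x = min(zip(y, X))
    return 10 ** best_x, best_loss

best_lr, best_loss = bayes_opt_sketch(bounds=(-5, 0))
```

The same loop works for any hyperparameter (weight decay, dropout, …) — only the bounds and the objective change.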


Related: Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization, which is an optimized random search method.

When you have hyper-parameters that contribute to making predictions but don’t influence the training phase of your model.

1 validation run over 900 samples:

  • ~= 3 runs over 300 samples
  • ~= 9 runs over 100 samples
  • ~= 27 runs over 33 samples
  • ~= 1 run over 300 samples + 3 runs over 100 samples + 9 runs over 33 samples

→ Hyperband successive halving:

  • choose 9 random sets of hyper-parameters and test them over 33 samples
  • pick 3 best runs and test them over 100 samples
  • pick best and run it over 300 samples
  • ~ same running time as testing a single set of hyper-parameters over 900 samples
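That schedule can be sketched directly in code. Everything here is illustrative: a “configuration” is just a number, and `evaluate` is a made-up stand-in for training on `budget` samples and returning a validation score (lower is better).

```python
import random

def successive_halving(configs, budgets, evaluate, keep=3):
    """Evaluate every config on the smallest budget, keep the best
    1/keep of them, and re-evaluate the survivors on the next budget."""
    survivors = list(configs)
    total_cost = 0
    for budget in budgets:
        total_cost += budget * len(survivors)
        scored = sorted((evaluate(c, budget), c) for c in survivors)
        survivors = [c for _, c in scored[: max(1, len(survivors) // keep)]]
    return survivors[0], total_cost

rng = random.Random(0)

def evaluate(config, budget):
    # Fake validation loss: the noise shrinks as the evaluation budget grows.
    return abs(config - 0.3) + rng.gauss(0, 1.0 / budget)

configs = [rng.uniform(0, 1) for _ in range(9)]
best, cost = successive_halving(configs, budgets=[33, 100, 300], evaluate=evaluate)
# cost = 9*33 + 3*100 + 1*300 = 897, roughly one full run over 900 samples
```

So the bad configurations get discarded cheaply, and only the promising ones earn the expensive evaluations.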


When you use the fine_tune method, fastai will use these tricks for you. There are a few parameters you can set (which we’ll discuss later), but in the default form shown here, it does two steps:

  1. Use one epoch to fit just those parts of the model necessary to get the new random head to work correctly with your dataset
  2. Use the number of epochs requested when calling the method to fit the entire model, updating the weights of the later layers (especially the head) faster than the earlier layers (which, as we’ll see, generally don’t require many changes from the pretrained weights)
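Here is a rough sketch of those two phases in plain Python. The layer-group names, the halving factor, and the log structure are all illustrative — this is not fastai’s actual implementation, just the shape of it (in fastai itself you would simply call `learn.fine_tune(epochs)`).

```python
def fine_tune_sketch(layer_groups, freeze_epochs=1, epochs=2, base_lr=2e-3):
    """Mimic the two phases of fine-tuning on a fake model given as a list
    of layer-group names, earliest first. Returns a log of (phase, lrs)."""
    log = []
    # Phase 1: freeze the body, train only the new random head.
    for _ in range(freeze_epochs):
        log.append(("frozen", {layer_groups[-1]: base_lr}))
    # Phase 2: unfreeze and train everything, with earlier layers updated
    # more slowly than later ones (discriminative learning rates).
    lrs = {name: base_lr / (2 ** (len(layer_groups) - 1 - i))
           for i, name in enumerate(layer_groups)}
    for _ in range(epochs):
        log.append(("unfrozen", lrs))
    return log

log = fine_tune_sketch(["body_early", "body_late", "head"])
```

Note how, in the second phase, the head still gets the largest learning rate while the earliest layers get the smallest — matching the idea that pretrained early layers need the fewest changes.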

Why is 1 epoch enough to get the new head in order?
Would it be beneficial to use more epochs for the head before even changing the earlier layers?

An interesting visualisation of CPU vs GPU by McKinsey. Link