Discussion on Hyper-parameter Tuning

To many deep learning practitioners, hyper-parameter tuning is key to model performance. For example, it is critical for object detection models with numerous hyper-parameters, such as Faster-RCNN (e.g. the NMS threshold, the number of FPN layers, the scales and sizes of anchors, etc.).
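For concreteness, here is a rough sketch of the kind of knobs I mean; the names and values below are purely illustrative, not tied to any particular Faster-RCNN implementation:

```python
# Illustrative only -- names and values are made up, not any library's actual config.
faster_rcnn_hparams = {
    "nms_threshold": 0.7,             # IoU threshold for non-maximum suppression
    "fpn_levels": 5,                  # number of feature pyramid levels
    "anchor_scales": [32, 64, 128, 256, 512],
    "anchor_ratios": [0.5, 1.0, 2.0],
    "rpn_fg_iou_threshold": 0.7,      # IoU above which an anchor counts as positive
    "rpn_bg_iou_threshold": 0.3,      # IoU below which an anchor counts as negative
}
```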

But I noticed the fastai community doesn't have much discussion on hyper-parameter tuning. From Rachel's An Opinionated Introduction to AutoML and Neural Architecture Search, it seems the community is more inclined to remove hyper-parameter searching altogether by providing smarter defaults (e.g. the learning rate finder).
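The learning rate finder is a good example of such a default: it replaces a manual sweep over one of the most sensitive hyper-parameters with an LR range test. A minimal sketch, assuming the fastai v1 API (the `data` object and the backbone are placeholders):

```python
from fastai.vision import *  # assuming the fastai v1 API

# `data` is any ImageDataBunch you've already built; resnet34 is just an example backbone.
learn = cnn_learner(data, models.resnet34, metrics=accuracy)

# LR range test: train briefly with an exponentially increasing learning rate,
# record the loss, and read a good learning rate off the plot.
learn.lr_find()
learn.recorder.plot()

# Pick max_lr from the plot instead of grid-searching it.
learn.fit_one_cycle(4, max_lr=1e-3)
```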

I wonder if we generally still need to search for hyper-parameters after applying those smart defaults. While Faster-RCNN is one example of a model with numerous hyper-parameters, generative models such as CycleGAN also have many (e.g. the weights on the different components of its loss), as sketched below.
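To make that concrete, here is a rough sketch of a CycleGAN-style composite generator loss; the two lambda weights are exactly the kind of hyper-parameters I mean, and the values shown are illustrative rather than canonical:

```python
import torch
import torch.nn.functional as F

# The weights below are hyper-parameters; the values are illustrative, not canonical.
lambda_cycle = 10.0    # weight on the cycle-consistency term
lambda_identity = 5.0  # weight on the identity-mapping term

def generator_loss(fake_score, real_a, real_b, rec_a, rec_b, idt_a, idt_b):
    adv = F.mse_loss(fake_score, torch.ones_like(fake_score))       # adversarial term
    cycle = F.l1_loss(rec_a, real_a) + F.l1_loss(rec_b, real_b)     # cycle consistency
    identity = F.l1_loss(idt_a, real_a) + F.l1_loss(idt_b, real_b)  # identity mapping
    return adv + lambda_cycle * cycle + lambda_identity * identity
```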

From your personal experience with those models, do you find hyper-parameter tuning important even after using the learning rate finder?


Hi Alex,

If you want a more principled search rather than a naive grid search over hyper-parameters, take a look at this work on Bayesian optimization with inequality constraints by Gardner et al.:

http://proceedings.mlr.press/v32/gardner14.pdf

The team created a toolbox for PyTorch, I believe. Here’s a link to the first author’s webpage:

https://jacobrgardner.github.io/
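To give a feel for the shape of this kind of search, here is a minimal sketch using scikit-optimize as a stand-in (not the authors' toolbox); `train_and_validate` is a placeholder for your own training routine that returns a validation loss:

```python
from skopt import gp_minimize
from skopt.space import Real, Integer

# Search space: each entry is one hyper-parameter and its range.
space = [
    Real(1e-5, 1e-1, prior="log-uniform", name="learning_rate"),
    Real(0.0, 0.9, name="dropout"),
    Integer(8, 128, name="batch_size"),
]

def objective(params):
    lr, dropout, bs = params
    # `train_and_validate` is a placeholder: train with these settings, return val loss.
    return train_and_validate(lr=lr, dropout=dropout, batch_size=bs)

# Gaussian-process-based Bayesian optimization over the space, 30 evaluations.
result = gp_minimize(objective, space, n_calls=30, random_state=0)
print(result.x, result.fun)  # best hyper-parameters and the corresponding loss
```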

Let me know if that was remotely useful. Hyper-parameter tuning the hard way is no fun.

Roger Grosse, a well-regarded expert in Bayesian optimization, dedicated a lecture of his course to this topic: https://www.cs.toronto.edu/~rgrosse/courses/csc321_2017/slides/lec21.pdf