I’m working on a paper with my coworker (hi @cnll0075!) and we’re designing a series of experiments to show the improvements that come from the new architecture we’ve worked on. Like @Woodstock, we’re planning to use the LR finder to pick the learning rate and weight decay rather than include them in the hyperparameter search. We’re planning to do the same for model size and embedding size, on the premise that model size is less a hyperparameter and more a measure of model capacity, and that the important thing is to compare two models of (approximately) the same capacity.
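For anyone unfamiliar, the LR range test we mean is Leslie Smith's method as popularized by fastai's `lr_find`: sweep the learning rate exponentially over a short run, watch the loss, and stop when it blows up. Here's a rough, illustrative sketch on a toy 1-D quadratic (not our actual training code; `lr_finder`, the divergence threshold, and the "steepest drop" suggestion heuristic are all simplifications of what fastai does):

```python
import numpy as np

def lr_finder(grad_fn, loss_fn, w0, lr_min=1e-5, lr_max=10.0, steps=100):
    """Exponentially sweep the LR over `steps` updates, recording the loss
    after each step; stop early once the loss diverges past the best seen."""
    lrs = np.geomspace(lr_min, lr_max, steps)
    w = float(w0)
    losses = []
    best = loss_fn(w)
    for lr in lrs:
        w = w - lr * grad_fn(w)       # one plain SGD step at this LR
        loss = loss_fn(w)
        losses.append(loss)
        if not np.isfinite(loss) or loss > 4 * best:
            break                      # loss is diverging: end the sweep
        best = min(best, loss)
    losses = np.array(losses)
    # crude suggestion heuristic: LR at the steepest per-step loss drop
    suggestion = lrs[np.argmin(np.diff(losses))]
    return suggestion, lrs[: len(losses)], losses

# toy problem: loss = (w - 3)^2, so SGD is stable only for lr < 1
suggestion, swept, losses = lr_finder(
    grad_fn=lambda w: 2 * (w - 3),
    loss_fn=lambda w: (w - 3) ** 2,
    w0=0.0,
)
```

On this toy problem the sweep ends early once the LR passes the stability limit, and the suggested LR lands somewhere in the decreasing region, which is exactly the behavior you eyeball on the real `lr_find` plot.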
I’m curious what the reaction of academics to this would be. I know hyperparameter search is often limited to LR and weight decay, and there are always limits to what you can explore if you’re not OpenAI or Google. We’re trying to keep the search space reasonable so that we don’t need to run thousands of experiments.