Yes, it is quite fascinating. I would assume that Jeremy collected tabular data from all his experiments, noting down the hyperparameter values chosen and the final metric/score achieved in each run. He would then have fit a random forest model on this data and used a partial dependence plot for the lower-layer learning rates to see which values give the best score.
That’s my best guess anyway! Maybe @muellerzr or @sgugger could provide more info and insight on this!
It would be amazing if we could see the data that Jeremy worked with!
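
To make my guess a bit more concrete, here is a rough sketch of what I imagine the workflow could look like. To be clear, this is not Jeremy's actual code or data: the experiment log is synthetic, the column names (`log_lr_lower`, `log_lr_upper`, `weight_decay`) and the score formula are made up for illustration, and I compute the partial dependence by hand with scikit-learn's `RandomForestRegressor` rather than with any particular plotting helper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical experiment log: one row per training run.
# This is synthetic data for illustration only, NOT the real experiments.
n_runs = 300
log_lr_lower = rng.uniform(-6, -2, n_runs)   # log10 of lower-layer learning rate
log_lr_upper = rng.uniform(-4, -1, n_runs)   # log10 of upper-layer learning rate
weight_decay = rng.uniform(0.0, 0.1, n_runs)

# Made-up score that peaks at log_lr_lower = -4, plus a little noise.
score = (0.9
         - 0.05 * (log_lr_lower + 4) ** 2
         - 0.02 * (log_lr_upper + 2) ** 2
         + rng.normal(0, 0.01, n_runs))

# Fit a random forest that predicts the score from the hyperparameters.
X = np.column_stack([log_lr_lower, log_lr_upper, weight_decay])
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, score)

def partial_dependence_1d(model, X, col, grid):
    """Average prediction when column `col` is forced to each grid value."""
    curve = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, col] = v          # overwrite one hyperparameter for every run
        curve.append(model.predict(X_mod).mean())
    return np.array(curve)

# Partial dependence of the score on the lower-layer learning rate.
grid = np.linspace(-6, -2, 21)
pd_curve = partial_dependence_1d(model, X, col=0, grid=grid)
best_log_lr = grid[np.argmax(pd_curve)]
print(f"best lower-layer lr is roughly 1e{best_log_lr:.1f}")
```

Plotting `pd_curve` against `grid` would give the partial dependence plot, and the peak of that curve is the learning rate I would expect someone to pick. With real data you would of course load the logged runs instead of generating them.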