Automated Learning Rate Suggester

Amazing work, I’ve been looking for this for a while! I will test how it reacts to different datasets when I can.

Also test it on models that have already been trained, right after you unfreeze them: those generally produce a wildly different curve shape.

That’s a good idea. The method generalizes because it keys on the sharp increase in loss as the learning rate approaches 1.0 and beyond, but I’d like to see how it works on models that produce wild shapes, to guide potential next steps and improvements.

@jeremyeast thank you! That’s great, keep us updated on how it works.

Hi @aychang,
Awesome tool! I am trying it on ResNet34/50 for images and it seems to work nicely.
However, I am wondering how to tune the function to get both learning rate values when you want to pass a slice to the LR argument.
I guess lr_to_use may be good enough for the second part of the slice, but what about the first part? It should be a value just before the gradient increases. In your example it should be something like 1e-2.

Any idea how to get it? Thanks!

Hi @Joan,
I’m glad it seems to be working OK on the ResNet models!

You could try increasing the lr_diff parameter as it would increase the width of the slide rule and should theoretically provide a learning rate closer to the point right before loss decreases. This is also giving me some ideas on some future improvements as well, so thanks and let me know how that works!
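
To make the lr_diff discussion concrete, here is a minimal sketch of how a sliding-interval suggester of this kind could work. It is not the function posted earlier in the thread: the names `suggest_lr` and `finite_diff`, the `loss_threshold` and `adjust_value` parameters with their defaults, and the synthetic loss curve are all assumptions made for illustration. The idea is to slide an interval of width lr_diff leftward from the end of the curve until the loss gradients at its two ends are nearly equal.

```python
from typing import List, Sequence


def finite_diff(values: Sequence[float]) -> List[float]:
    """Unit-spacing gradient: central differences inside, one-sided at the ends."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two loss values")
    grads = [values[1] - values[0]]
    for i in range(1, n - 1):
        grads.append((values[i + 1] - values[i - 1]) / 2.0)
    grads.append(values[-1] - values[-2])
    return grads


def suggest_lr(lrs: Sequence[float], losses: Sequence[float], lr_diff: int = 15,
               loss_threshold: float = 0.05, adjust_value: float = 1.0) -> float:
    """Hypothetical suggester: returns the lr at the left end of the last
    interval whose end-point gradients still differ by more than the threshold."""
    if len(lrs) != len(losses):
        raise ValueError("lrs and losses must have the same length")
    if not 0 < lr_diff < len(losses):
        raise ValueError("lr_diff must be positive and smaller than the curve length")
    grads = finite_diff(losses)
    r_idx = len(losses) - 1          # right end starts on the loss spike
    l_idx = r_idx - lr_diff          # left end sits lr_diff points earlier
    suggestion = None
    while l_idx >= 0 and abs(grads[r_idx] - grads[l_idx]) > loss_threshold:
        suggestion = lrs[l_idx]
        r_idx -= 1
        l_idx -= 1
    if suggestion is None:
        raise ValueError("no gradient change larger than loss_threshold was found")
    return suggestion * adjust_value


# Synthetic stand-in for an lr_find curve (an assumption, not real data):
# a slow decline in loss followed by a sharp spike over the last 20 points.
lrs = [10 ** (-6 + 7 * i / 99) for i in range(100)]
losses = [2.0 - 0.01 * i if i < 80 else 1.2 + 0.05 * (i - 80) ** 2 for i in range(100)]

high = suggest_lr(lrs, losses, lr_diff=15)   # narrower interval
low = suggest_lr(lrs, losses, lr_diff=30)    # wider interval
lr_range = slice(low, high)                  # one possible way to build a slice
print(low, high)
```

On this synthetic curve a wider interval returns a smaller learning rate, which is in line with the reports in this thread of raising lr_diff when the suggestion is too aggressive. Whether the wide-interval value makes a good first part for a slice is untested; treat it as a starting point to check against the plot.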

Hi @aychang,

I have been trying different lr_diff values, and the right one seems to be quite specific to each dataset; I cannot find a way to generalize nicely. 40-45 seems to be a good starting point, but I have to run more tests.

Regarding this, I am trying to get reproducible results using the function described here. However, when I run the code with num_workers = 0 when generating the DataBunch, I get an error:

Traceback (most recent call last):
  File "/users/genomics/jgibert/Scripts/Lymphoma_Fastai_Neptune.py", line 63, in <module>
    selected_lr = find_appropriate_lr(learn)
  File "/users/genomics/jgibert/Scripts/Lymphoma_Fastai_Neptune.py", line 40, in find_appropriate_lr
    model.lr_find()
  File "/soft/EB_repo/devel/programs/goolf/1.7.20/Python/3.6.2/lib/python3.6/site-packages/fastai/train.py", line 32, in lr_find
    learn.fit(epochs, start_lr, callbacks=[cb], wd=wd)
  File "/soft/EB_repo/devel/programs/goolf/1.7.20/Python/3.6.2/lib/python3.6/site-packages/fastai/basic_train.py", line 196, in fit
    fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
  File "/soft/EB_repo/devel/programs/goolf/1.7.20/Python/3.6.2/lib/python3.6/site-packages/fastai/basic_train.py", line 111, in fit
    finally: cb_handler.on_train_end(exception)
  File "/soft/EB_repo/devel/programs/goolf/1.7.20/Python/3.6.2/lib/python3.6/site-packages/fastai/callback.py", line 322, in on_train_end
    self('train_end', exception=exception)
  File "/soft/EB_repo/devel/programs/goolf/1.7.20/Python/3.6.2/lib/python3.6/site-packages/fastai/callback.py", line 250, in __call__
    for cb in self.callbacks: self._call_and_update(cb, cb_name, **kwargs)
  File "/soft/EB_repo/devel/programs/goolf/1.7.20/Python/3.6.2/lib/python3.6/site-packages/fastai/callback.py", line 240, in _call_and_update
    new = ifnone(getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs), dict())
  File "/soft/EB_repo/devel/programs/goolf/1.7.20/Python/3.6.2/lib/python3.6/site-packages/fastai/callbacks/lr_finder.py", line 40, in on_train_end
    self.learn.load('tmp', purge=False)
  File "/soft/EB_repo/devel/programs/goolf/1.7.20/Python/3.6.2/lib/python3.6/site-packages/fastai/basic_train.py", line 265, in load
    state = torch.load(source, map_location=device)
  File "/soft/EB_repo/devel/programs/goolf/1.7.20/Python/3.6.2/lib/python3.6/site-packages/torch/serialization.py", line 368, in load
    return _load(f, map_location, pickle_module)
  File "/soft/EB_repo/devel/programs/goolf/1.7.20/Python/3.6.2/lib/python3.6/site-packages/torch/serialization.py", line 549, in _load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: storage has wrong size: expected 4355518534081521830 got 2048

I am not quite sure why this is happening. I checked some posts and it seems to be related to serialization. Any idea what causes it?

Thanks!

An update on the error:

It seems that calling random_seed(42) before the DataBunch function (instead of before the learner function) does not raise any error. I am quite surprised by this behavior, but this single change seems to solve the problem.
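
For readers who do not have the linked function at hand, here is a minimal sketch of what a seeding helper like this could look like. It is an assumption, not the exact function referenced above: the `use_cuda` parameter and the optional NumPy and PyTorch imports are illustrative choices, and only the name `random_seed` comes from this thread.

```python
import os
import random


def random_seed(seed_value: int, use_cuda: bool = False) -> None:
    """Seed the random number generators that commonly affect a training run."""
    random.seed(seed_value)                         # Python's own RNG
    os.environ["PYTHONHASHSEED"] = str(seed_value)  # only affects processes started later
    try:
        import numpy as np
        np.random.seed(seed_value)                  # NumPy global RNG
    except ImportError:
        pass                                        # NumPy not installed, skip it
    try:
        import torch
        torch.manual_seed(seed_value)               # PyTorch CPU RNG
        if use_cuda and torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed_value)  # all GPU RNGs
            torch.backends.cudnn.deterministic = True
            torch.backends.cudnn.benchmark = False
    except ImportError:
        pass                                        # PyTorch not installed, skip it


# Seed first, then build the DataBunch, then the learner.
random_seed(42)
first_draw = random.random()
random_seed(42)
second_draw = random.random()
print(first_draw == second_draw)
```

Seeding before the data is built means any random split or shuffling done at that stage is also repeatable. Why the call order changes whether the storage-size error appears is not explained by this sketch.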

Hi Andrew, quick feedback to let you know I’ve been using the AutomatedLearnRateFinder and it’s been a little bit aggressive on very imbalanced datasets, so I had to significantly increase the lr_diff parameter.

@Joan
Hi Joan, thanks for the update and for bringing this up. I’ll try to replicate the issue; let me know if you make any headway on it as well.

@jeremyeast
Thanks for the quick feedback. And yes, the default value may produce a high learning rate depending on how quickly the loss gradients react to the increasing learning rate. I’ve been looking into an optimized lr_diff based on various model attributes such as weight dimension size. Feel free to post graphs or results of the lr you’re getting with the increased lr_diff parameter. Thanks again!

You are incredible Aychang, it works perfectly, thank you very much 🙂

That’s awesome to hear. If you have any feedback or run into any issues, please let me know.

Hi @aychang this looks promising!

I’ve gotten the following plots with the lr finder before. Do you think your suggester would be able to handle finding a good lr in these particular plots?

Fig1 (no clear point where loss shoots up):

Fig2 (also no clear point where loss shoots up):

Fig3 (Really variable):

Hi @adeperio, I’m glad you came across this!

I believe the suggester would be able to provide a good learning rate for the learners showing all three figures.

For Fig1 and Fig2, it’s true that there are no clear points where the loss shoots up, but if you displayed the plot with the x-axis (learning rate) extended to 1e+0 or even 1e+01, the loss should still shoot up, and that would also be reflected in the gradient plot.

As for the learner behind the Fig3 plot, which is the kind of plot we’d expect from later training or fine-tuning of a model, the suggester relies on the gradients of the losses with respect to the learning rates, so it should be reasonably robust against the erratic nature of Fig3.

These are only my thoughts however, so I’d be interested to actually see how the suggester works on these plots. Good luck and feel free to let us know how it works out when you get the chance!

Hi @aychang, I’ve been using your auto LR finder for a few days now and it seems to be working pretty well! I don’t rely on it completely (just for prudence), but it is definitely a great help. I don’t use it for unfrozen learning rate finder runs (I haven’t had time to test this much), but for frozen runs it performs pretty consistently.

I think it’s worth spending time tuning lr_diff. I’ve had to lower that value to 5 to get the results I need.

But anyway, nice work!

@adeperio ah, that’s interesting. With such a low lr_diff, I wonder if your optimal learning rate is really high or if your lr finder plot has more of a hairpin change at the end. It’s cool to see how the finder reacts to your adjustments as well.

I also agree, good prudence is good practice. I’m glad the finder is helpful and seems to be working well for you.

Thanks!

Hi @aychang

Yep, I’m just finishing up some experiments and I am noticing some LR plots that have more of a hairpin-style shape.

Kinda like this (using resnet34, 128px, with all dropout, wd, and augmentations off):

Yeah, I’m using it at the moment when running my experiments (i.e. when benchmarking certain hyperparameters and setups) so that I have a consistent LR finding procedure and can fire off a bunch of experiments in one go and come back to them later.

Once I have settled on a set of hyperparameters, I then do manual LR finding and compare that with the auto LR.

Do you think that could be a good approach with how to use the auto LR finder?

@adeperio I think you’ve outlined a perfect use case for this automated lr finder/suggester.

Using the finder to streamline firing off experiments and get empirical results from hyperparameter adjustments is a great automated way to optimize. From there, manually setting the LR for the hyperparameters you’ve settled on and comparing it to the auto LR is also a great approach, in my opinion.

Yeah, I think that seems like a good approach moving forward. I will keep using the finder; it’s performing well so far, and I compare it to a manual LR find once in a while as a check.

Awesome, thanks for the feedback, and let me know if anything comes up or if there’s anything I can help with.

Good luck!

@aychang - Just came across this thread and am wondering if the function posted back on April 20 is still the most current version?

Also, has this more automated learning rate finder been added to the fastai library?
