I already started. Thanks!

@jeremy: `n_skip` is crystal clear now. Furthermore, your advice to select a smaller batch size (16) worked. Indeed:

Note that one has to select values around 10^-2 or a bit smaller to obtain the best results. If you select 10^-1, the SGD **does not converge at all**.

Why’s that? After all, with 10^-1 we chose an LR at which the loss was still rapidly decreasing.
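Just to keep the scale of the effect in mind (this is not an answer, only a toy with nothing to do with fastai internals): even on the simplest possible loss, plain gradient descent diverges once the step size crosses a threshold set by the curvature, regardless of how well smaller steps behave.

```python
# Gradient descent on f(x) = x^2, whose gradient is 2x: x <- x - lr * 2x.
# The update multiplies x by (1 - 2*lr), so it converges only when |1 - 2*lr| < 1.
def run_gd(lr, steps=50, x0=1.0):
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x
    return abs(x)

print(run_gd(0.1))  # shrinks toward 0
print(run_gd(1.1))  # blows up
```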

### Coming to the main issue: Improving the LR finder.

I’m looking at how `lr_find` and `LR_Finder` are intermingled. `sgdr` is very interesting, but quite complex, with all those callbacks and classes passing themselves around. Let us leave `LR_Finder` alone for a moment, and focus on getting the finder to run more epochs in the simplest manner.

A rather dull method could be the following: add one more parameter and a for loop inside `lr_find`, for example:

```
def lr_find(self, start_lr=1e-5, end_lr=10, wds=None, linear=False, nepochs=1):
    self.save('tmp')
    for i in np.arange(nepochs):
        layer_opt = self.get_layer_opt(start_lr, wds)
        self.sched = LR_Finder(layer_opt, len(self.data.trn_dl), end_lr, linear=linear)
        self.fit_gen(self.model, self.data, layer_opt, 1)
    self.load('tmp')
```

This is obviously wrong, since it doesn’t run one longer sweep over more epochs: it just restarts the finder from the beginning every time.
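To see it numerically, here is a toy of what I believe the finder’s schedule to be — an exponential sweep lr_i = start_lr · (end_lr/start_lr)^(i/nb) over nb iterations (my reading of `LR_Finder`, which may be off). Rebuilding the scheduler on every pass just replays the identical sweep:

```python
import numpy as np

def finder_sweep(start_lr, end_lr, nb):
    # My reading of LR_Finder's schedule: exponential ramp over nb iterations.
    return start_lr * (end_lr / start_lr) ** (np.arange(nb) / nb)

nb = 100       # stands in for len(self.data.trn_dl)
nepochs = 3

# The for-loop version rebuilds LR_Finder on every pass, so each
# "epoch" is an identical sweep starting back at start_lr:
sweeps = [finder_sweep(1e-5, 10, nb) for _ in range(nepochs)]
assert all(np.allclose(s, sweeps[0]) for s in sweeps)
print(sweeps[1][0])  # 1e-05: epoch 2 restarts at start_lr
```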

So I decided to cheat. To what function did we pass the number of epochs before? To `fit()`. Thus, I went looking at how `fit()` does more epochs at once, to use it as a model.

It turns out that the “epochs” argument corresponds to the `n_cycle` parameter. The only place inside `fit()` where it gets used is when it calls `fit_gen()`. **Plus,** `fit_gen()` is called by `lr_find()` immediately after `LR_Finder` is set up. I thought I had nailed it, but I was wrong.

Now, `fit_gen()` is quite complex in itself (and, er… not much commented…), but I think you get it to do N epochs by specifying `n_cycle=N`. At least, that’s what we do when we call `fit()`.

So, I did this:

```
def lr_find(self, start_lr=1e-5, end_lr=10, wds=None, linear=False, nepochs=1):
    self.save('tmp')
    layer_opt = self.get_layer_opt(start_lr, wds)
    self.sched = LR_Finder(layer_opt, len(self.data.trn_dl), end_lr, linear=linear)
    self.fit_gen(self.model, self.data, layer_opt, nepochs)
    self.load('tmp')
```

I just passed `n_cycle`, which you had fixed at 1, as a new argument, `nepochs`.

But that darn contraption hangs immediately after completing the first epoch.

**I’m having difficulties making sense of that behaviour.** I mean, both `fit()` and `lr_find` now call `fit_gen()` with the same arguments, that is: `self.model, self.data, layer_opt, n_cycle`. Why does the latter hang whilst the former doesn’t? I’m flabbergasted.

I think I need a hint or two.
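Meanwhile, the behaviour I actually want is easy to state: a single sweep stretched over all the epochs. If `LR_Finder`’s second argument (`nb`) really is the number of iterations the whole schedule spans — that’s an assumption on my part — then the knob to turn might be `len(self.data.trn_dl) * nepochs` there, rather than `fit_gen`’s `n_cycle`. In toy form, with an exponential schedule lr_i = start_lr · (end_lr/start_lr)^(i/nb) standing in for the real one:

```python
import numpy as np

def finder_sweep(start_lr, end_lr, nb):
    # Assumed LR_Finder schedule: exponential ramp over nb iterations.
    return start_lr * (end_lr / start_lr) ** (np.arange(nb) / nb)

iters_per_epoch = 100   # stands in for len(self.data.trn_dl)
nepochs = 3

# One schedule spanning all epochs, i.e. LR_Finder(..., nb * nepochs, ...):
sweep = finder_sweep(1e-5, 10, iters_per_epoch * nepochs)

# The LR keeps climbing across the epoch boundary instead of resetting...
assert sweep[iters_per_epoch] > sweep[iters_per_epoch - 1]
# ...and end_lr is only approached at the very end of the last epoch.
assert sweep[-1] > 9
```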