I wonder if there is a difference between running `fit_one_cycle(5, lr)` twice and running `fit_one_cycle(10, lr)` once?

As the name suggests, `fit_one_cycle` runs a single cycle: a phase where the learning rate increases, followed by a phase where it decreases. Running the function twice therefore gives you two cycles. This amounts to a form of **Stochastic Gradient Descent with Warm Restarts** (SGDR), which was the technique used in fastai over the last few years. A single cycle appears to give better results, though, and is now the go-to approach for training a network.
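To make the difference concrete, here is a minimal sketch in plain Python of what the learning rate could look like for one cycle of 10 steps versus two cycles of 5 steps. It is not fastai's implementation: the helper names (`cosine_anneal`, `one_cycle`, `count_peaks`) and the constants `div`, `final_div` and `pct_start` are assumptions chosen for illustration, and it schedules per step rather than per batch.

```python
import math

def cosine_anneal(start, end, pct):
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2 * (math.cos(math.pi * pct) + 1)

def one_cycle(n_steps, lr_max, div=25.0, final_div=1e4, pct_start=0.3):
    """Learning rates for a single cycle: warm up to lr_max, then anneal down.

    All constants are illustrative assumptions, not fastai's defaults.
    """
    n_up = int(n_steps * pct_start)
    n_down = n_steps - n_up
    lr_start, lr_end = lr_max / div, lr_max / final_div
    up = [cosine_anneal(lr_start, lr_max, i / n_up) for i in range(n_up)]
    down = [cosine_anneal(lr_max, lr_end, i / max(n_down - 1, 1)) for i in range(n_down)]
    return up + down

def count_peaks(lrs):
    """Number of interior local maxima in a list of learning rates."""
    return sum(1 for i in range(1, len(lrs) - 1) if lrs[i - 1] < lrs[i] > lrs[i + 1])

# One call covering 10 steps versus two back-to-back calls of 5 steps each
single = one_cycle(10, lr_max=1e-3)
double = one_cycle(5, lr_max=1e-3) + one_cycle(5, lr_max=1e-3)

print("peaks in one 10-step cycle:", count_peaks(single))
print("peaks in two 5-step cycles:", count_peaks(double))
```

Both schedules have the same total length, but the second one jumps back up to a high learning rate in the middle of training, which is the "warm restart".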

There is a page about SGDR in the docs (https://docs.fast.ai/callbacks.general_sched.html), maybe you could try it and see if you get any difference.

Hi Nathan,

Suppose I train for three epochs: `learn.fit_one_cycle(3, max_lr=slice(1e-5, 1e-4))`.

After training I see that train loss > valid loss, and I want to train more after those 3 epochs. In that case, do I have to save the earlier model and train again, or will calling `learn.fit_one_cycle` again with some parameters pick up from where the previous `learn.fit` call left off?

Thanks

Hi,

If you call `learn.fit_one_cycle` several times, training won’t restart from scratch on each call; it continues from the current weights.

Hope it helps

Hi,

If I call `unfreeze` before `learn.fit_one_cycle`, will it still continue learning from where it left off? Or, since unfreezing makes the model train the initial layers, does it discard the previous learning and start from scratch?

Yes, it will continue from where it left off. The only thing that changes when **unfreezing** a model is that the bottom layers are now updated too, which lets your model fit your dataset even better and can lead to better performance.

There IS a difference between running `fit_one_cycle(5, lr)` twice and running `fit_one_cycle(10, lr)` once.

In my case, running `fit_one_cycle(10, lr)` once helped more. There is also a way to continue the same one-cycle schedule (for example when you have to reboot your PC or want to train for several nights in a row), described here (you just set `cyc_len` to the number of epochs in the current session, `start_epoch` to the epoch you restart from, and `tot_epochs` to the total number of epochs you want to train across all the sessions).

I’ve tried it and it worked very well for me (except for some visual bugs).
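As a rough illustration of the idea (not fastai's actual code), the sketch below computes the learning rate from the position in the overall schedule rather than from the position in the current session. The helpers `lr_at` and `session_lrs` are hypothetical, the constants are assumptions, and the parameter names simply mirror the description in this post. It uses one learning rate per epoch to stay short.

```python
import math

def lr_at(pct, lr_max=1e-2, div=25.0, final_div=1e4, pct_start=0.3):
    """LR at fraction pct (0..1) of one overall cycle. Constants are illustrative."""
    def cos_interp(a, b, t):
        return b + (a - b) / 2 * (math.cos(math.pi * t) + 1)
    if pct < pct_start:
        # warm-up phase: lr_max / div -> lr_max
        return cos_interp(lr_max / div, lr_max, pct / pct_start)
    # annealing phase: lr_max -> lr_max / final_div
    return cos_interp(lr_max, lr_max / final_div, (pct - pct_start) / (1 - pct_start))

def session_lrs(cyc_len, start_epoch, tot_epochs):
    """Per-epoch LRs for a session covering epochs start_epoch .. start_epoch + cyc_len - 1."""
    return [lr_at(epoch / tot_epochs) for epoch in range(start_epoch, start_epoch + cyc_len)]

# One uninterrupted 3-epoch run ...
full = session_lrs(cyc_len=3, start_epoch=0, tot_epochs=3)
# ... versus three 1-epoch sessions that each resume the same overall schedule
resumed = session_lrs(1, 0, 3) + session_lrs(1, 1, 3) + session_lrs(1, 2, 3)

print(full == resumed)
```

The point is only that a resumed session needs to know both where it starts and how long the whole schedule is; how the library interprets these arguments exactly is best checked in its source.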

Restarting does not seem to work for me. To test this, I first ran `learn.fit_one_cycle(3, max_lr=1e-2)` and then, to compare:

```
learn.fit_one_cycle(1, max_lr=1e-2, tot_epochs=3)
learn.fit_one_cycle(1, max_lr=1e-2, tot_epochs=3, start_epoch=1)
learn.fit_one_cycle(1, max_lr=1e-2, tot_epochs=3, start_epoch=2)
```

which does not train but produces the following warning instead:

```
/usr/local/lib/python3.6/dist-packages/fastprogress/fastprogress.py:102: UserWarning: Your generator is empty.
warn("Your generator is empty.")
```

Any help would be much appreciated.

I have an elaboration of this same question here:

I am using `fit_one_cycle` on my Inception learner, which has the following architecture:

```
InceptionTime(
  (inceptionblock): InceptionBlock(
    (inception): ModuleList(
      (0): InceptionModule(
        (convs): ModuleList(
          (0): Conv1d(1, 32, kernel_size=(39,), stride=(1,), padding=(19,), bias=False)
          (1): Conv1d(1, 32, kernel_size=(19,), stride=(1,), padding=(9,), bias=False)
          (2): Conv1d(1, 32, kernel_size=(9,), stride=(1,), padding=(4,), bias=False)
        )
        (maxconvpool): Sequential(
          (0): MaxPool1d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
          (1): Conv1d(1, 32, kernel_size=(1,), stride=(1,), bias=False)
        )
        (concat): Concat(dim=1)
        (bn): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (act): ReLU()
      )
      (1): InceptionModule(
        (bottleneck): Conv1d(128, 32, kernel_size=(1,), stride=(1,), bias=False)
        (convs): ModuleList(
          (0): Conv1d(32, 32, kernel_size=(39,), stride=(1,), padding=(19,), bias=False)
          (1): Conv1d(32, 32, kernel_size=(19,), stride=(1,), padding=(9,), bias=False)
          (2): Conv1d(32, 32, kernel_size=(9,), stride=(1,), padding=(4,), bias=False)
        )
        (maxconvpool): Sequential(
          (0): MaxPool1d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
          (1): Conv1d(128, 32, kernel_size=(1,), stride=(1,), bias=False)
        )
        (concat): Concat(dim=1)
        (bn): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (act): ReLU()
      )
      (2): InceptionModule(
        (bottleneck): Conv1d(128, 32, kernel_size=(1,), stride=(1,), bias=False)
        (convs): ModuleList(
          (0): Conv1d(32, 32, kernel_size=(39,), stride=(1,), padding=(19,), bias=False)
          (1): Conv1d(32, 32, kernel_size=(19,), stride=(1,), padding=(9,), bias=False)
          (2): Conv1d(32, 32, kernel_size=(9,), stride=(1,), padding=(4,), bias=False)
        )
        (maxconvpool): Sequential(
          (0): MaxPool1d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
          (1): Conv1d(128, 32, kernel_size=(1,), stride=(1,), bias=False)
        )
        (concat): Concat(dim=1)
        (bn): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (act): ReLU()
      )
      (3): InceptionModule(
        (bottleneck): Conv1d(128, 32, kernel_size=(1,), stride=(1,), bias=False)
        (convs): ModuleList(
          (0): Conv1d(32, 32, kernel_size=(39,), stride=(1,), padding=(19,), bias=False)
          (1): Conv1d(32, 32, kernel_size=(19,), stride=(1,), padding=(9,), bias=False)
          (2): Conv1d(32, 32, kernel_size=(9,), stride=(1,), padding=(4,), bias=False)
        )
        (maxconvpool): Sequential(
          (0): MaxPool1d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
          (1): Conv1d(128, 32, kernel_size=(1,), stride=(1,), bias=False)
        )
        (concat): Concat(dim=1)
        (bn): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (act): ReLU()
      )
      (4): InceptionModule(
        (bottleneck): Conv1d(128, 32, kernel_size=(1,), stride=(1,), bias=False)
        (convs): ModuleList(
          (0): Conv1d(32, 32, kernel_size=(39,), stride=(1,), padding=(19,), bias=False)
          (1): Conv1d(32, 32, kernel_size=(19,), stride=(1,), padding=(9,), bias=False)
          (2): Conv1d(32, 32, kernel_size=(9,), stride=(1,), padding=(4,), bias=False)
        )
        (maxconvpool): Sequential(
          (0): MaxPool1d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
          (1): Conv1d(128, 32, kernel_size=(1,), stride=(1,), bias=False)
        )
        (concat): Concat(dim=1)
        (bn): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (act): ReLU()
      )
      (5): InceptionModule(
        (bottleneck): Conv1d(128, 32, kernel_size=(1,), stride=(1,), bias=False)
        (convs): ModuleList(
          (0): Conv1d(32, 32, kernel_size=(39,), stride=(1,), padding=(19,), bias=False)
          (1): Conv1d(32, 32, kernel_size=(19,), stride=(1,), padding=(9,), bias=False)
          (2): Conv1d(32, 32, kernel_size=(9,), stride=(1,), padding=(4,), bias=False)
        )
        (maxconvpool): Sequential(
          (0): MaxPool1d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
          (1): Conv1d(128, 32, kernel_size=(1,), stride=(1,), bias=False)
        )
        (concat): Concat(dim=1)
        (bn): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (act): ReLU()
      )
    )
    (shortcut): ModuleList(
      (0): ConvBlock(
        (0): Conv1d(1, 128, kernel_size=(1,), stride=(1,), bias=False)
        (1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (add): Add
    (act): ReLU()
  )
  (gap): GAP1d(
    (gap): AdaptiveAvgPool1d(output_size=1)
    (flatten): Flatten(full=False)
  )
  (fc): Linear(in_features=128, out_features=1, bias=True)
)
```

I have been doing a manual exploration of different training strategies. First, I create the learner and save the initial weights like this:

```
model = InceptionTime(dls.vars, dls.c)
learn = Learner(dls, model, metrics=[mae, mse], cbs=WandbCallback())
learn.save('0epochs')
```

Then, if I run

```
# 50 epochs
learn = learn.load('0epochs')
learn.fit_one_cycle(50, lr_max=1e-4, div=50.,)
learn.lr_find(stop_div=False)
```

In the 50th epoch, I get a training error of 15855.080078 and an MSE of 16197.121094.

Then I do the exact same thing again.

```
# 50 epochs
learn = learn.load('0epochs')
learn.fit_one_cycle(50, lr_max=1e-4, div=50.,)
learn.lr_find(stop_div=False)
```

In the 50th epoch I get a training error of 15845.706055 and an MSE of 16162.488281.

These metrics are different. Additionally, I am finding that there is also a bit of difference in the lr_find charts. Not much, but it’s there.

I am also trying to run `fit_one_cycle` multiple times with the same parameters to get 50 epochs, for example:

```
learn.fit_one_cycle(25, lr_max=1e-4, div=50.,)
learn.fit_one_cycle(25, lr_max=1e-4, div=50.,)
```

In the 50th epoch I get a training error of 15855.791992 and an MSE of 16139.482422.

My question is:

What is causing the differences here, given that I am using the same initial weights for each approach? And, as more of an implementation question: is there a difference between calling `fit_one_cycle` 5 times with 10 epochs vs. one time with 50 epochs?

Hi Aaron,

The content of minibatches will differ between runs because they are drawn randomly from the dataset. That would account for the small discrepancy of loss and error.
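Here is a small plain-Python sketch of that effect. The helper `batches` is hypothetical and only mimics what a shuffling data loader does: without a fixed seed, each run may see the items in a different order, while a fixed seed reproduces the same batches. In real training there can be other sources of nondeterminism as well (some GPU operations, for example), so seeding the shuffle alone may not make two runs bit-identical.

```python
import random

def batches(n_items, batch_size, seed=None):
    """Shuffle item indices and split them into mini-batches."""
    rng = random.Random(seed)  # seed=None seeds from system entropy (or the clock)
    idx = list(range(n_items))
    rng.shuffle(idx)
    return [idx[i:i + batch_size] for i in range(0, n_items, batch_size)]

# Two unseeded "runs": same data, but usually a different batch composition
run_a = batches(12, 4)
run_b = batches(12, 4)
print("unseeded runs identical:", run_a == run_b)

# Two seeded "runs": the batch composition is reproducible
seeded_a = batches(12, 4, seed=42)
seeded_b = batches(12, 4, seed=42)
print("seeded runs identical:", seeded_a == seeded_b)
```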

As far as I recall, `fit_one_cycle` applies the learning rate schedule once across the number of epochs specified. So yes, there is a difference between one cycle of fifty epochs and five cycles of ten epochs each: the latter ramps the learning rate up and back down five times.

Good observation and questions!

HTH,