Basic questions about lesson 1

What I’m trying to do is create a CNN to distinguish between audio clips from Pearl Jam and The National. The first thing I did was download 23 songs by Pearl Jam and 24 by The National, extract a 10-second clip (starting at 00:30) from each one, and generate a spectrogram for each clip (standing on the shoulders of giants here). Then I placed 16 images per band in the training dataset and the rest in the validation dataset. Now, some things I didn’t expect or don’t understand.
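For anyone curious, the spectrogram step can be sketched with plain NumPy. This is a minimal magnitude-STFT of my own, assuming the clip is already loaded as a mono float array; a real pipeline would typically use librosa or similar:

```python
import numpy as np

def spectrogram(signal, n_fft=512, hop=256):
    """Magnitude spectrogram via a short-time Fourier transform.
    Returns an array of shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

# 10 seconds of a 440 Hz tone at 22050 Hz, as a stand-in for an audio clip
sr = 22050
t = np.arange(10 * sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (257, 860): frequency bins x time frames
```

The resulting 2-D array is what gets rendered to an image and fed to the CNN.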

First, running this code

learn = create_cnn(data, models.resnet34, metrics=error_rate)
learn.lr_find()
learn.recorder.plot()

Repeatedly yields different results despite having the same seed.

(attached: lr_find plots from two runs, lr-run1 and lr-run2)

Second, running

lr=1e-2
learn.fit_one_cycle(8, slice(lr))

Also yields different results when run repeatedly. I assume this is because I’m not starting off the model from scratch, but then is there any way to get rid of all the learning and start over without recreating the learn object?

there’s some PyTorch random-seed goodies under the hood, as well as NumPy’s. try adding the following where you’re setting the seed, and don’t forget to recreate your DataBunch when you recreate the model; there are seed dependencies in there too. good luck mate.

torch.manual_seed(42)                      # seed PyTorch's RNGs
torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False     # disable nondeterministic autotuning
np.random.seed(42)                         # seed NumPy's global RNG
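The reason every library needs its own seed call is that each one keeps its own RNG state. A NumPy-only toy sketch (not fastai code) of how resetting a seed makes draws repeatable:

```python
import numpy as np

def draw(seed):
    np.random.seed(seed)   # reset NumPy's global RNG state
    return np.random.rand(3)

# Same seed -> identical draws; different seed -> different draws.
# Seeding NumPy alone does nothing for PyTorch's separate RNG,
# which is why torch.manual_seed is also needed.
a, b, c = draw(42), draw(42), draw(0)
print(np.allclose(a, b), np.allclose(a, c))  # True False
```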


Thanks! That seems to work, although I find it strange that it wasn’t mentioned in the lecture. If I don’t use your code the results are not repeatable.

The other issue I’m having now is that I can’t figure out why I should choose a specific value for the first parameter in learn.fit_one_cycle(5, slice(1e-2)). Why 4 or 5 and not some other value? Also, the error rate in the last epoch for that is 0.25. Subsequently running

learn.unfreeze()
learn.fit_one_cycle(10, max_lr=slice(1e-3, 1e-2))

Gives me:

epoch train_loss valid_loss error_rate time
0 0.101125 0.654265 0.187500 00:00
1 0.202470 2.742486 0.500000 00:00
2 0.779277 1.481577 0.375000 00:00
3 0.940830 4.350643 0.375000 00:00
4 1.469711 69.141212 0.250000 00:00
5 1.325877 748.460144 0.500000 00:00
6 1.145573 933.605164 0.500000 00:00
7 1.075393 671.214417 0.500000 00:00
8 0.962771 347.121521 0.500000 00:00
9 0.878943 171.454285 0.500000 00:00

Is there any way to pick a good number of epochs? Why 10 and not 20 or 100?

you’re in luck: lesson 2 talks about picking the number of epochs at length! 🙂

i also find this cheat sheet from Stanford a handy reference at times:
https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks


A specific number of epochs is not that important when you are using fit_one_cycle, because of the shape of the learning-rate schedule (look here).
The rate decreases in the later epochs, so a few more epochs will not have a big impact and should not cause overfitting.
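That decreasing tail can be seen in a rough sketch of a one-cycle schedule. This is a toy approximation of my own, not fastai’s actual scheduler; the parameter names pct_start, div and final_div are stand-ins:

```python
import math

def one_cycle_lr(step, total_steps, lr_max, pct_start=0.3, div=25, final_div=1e4):
    """Toy one-cycle schedule: cosine warmup from lr_max/div up to lr_max,
    then cosine annealing down to lr_max/final_div."""
    warm = int(total_steps * pct_start)
    if step < warm:
        p = step / warm                          # warmup phase progress
        lo, hi = lr_max / div, lr_max
    else:
        p = (step - warm) / (total_steps - warm) # annealing phase progress
        lo, hi = lr_max, lr_max / final_div
    # cosine interpolation from lo to hi as p goes 0 -> 1
    return lo + (hi - lo) * (1 - math.cos(math.pi * p)) / 2

total = 100
lrs = [one_cycle_lr(s, total, 1e-2) for s in range(total + 1)]
print(max(lrs))  # the schedule peaks at lr_max
```

Because the lr is tiny by the final steps, the last few epochs change the weights very little, which is why the exact epoch count matters less than with a constant lr.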

Of course, if you choose a very high number (e.g. 100), the network will converge very slowly.
And with too low a number, the network will not be able to learn. Something between 5 and 20 should be OK.

If you think about it more, the right number of epochs depends on the size of the network, the batch size, regularization, the learning rate and other hyperparameters. I haven’t seen a specific prescription.

There are two handy callbacks in fastai, EarlyStopping and ReduceLROnPlateau, for safely increasing the number of epochs. But I think they work better with the plain fit function (with a constant lr).

I do something like this:

  1. fit_one_cycle with the model frozen
  2. fit_one_cycle with the model unfrozen
  3. fit with the last epoch’s lr (taken from learn.opt.lr), for twice as many epochs, with both callbacks

The first step has a huge impact on the loss.
The third step decreases the loss only slightly.
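For reference, the early-stopping logic amounts to something like this pure-Python sketch (my own minimal version, not fastai’s actual callback):

```python
class EarlyStopping:
    """Minimal early-stopping sketch: stop when the monitored loss hasn't
    improved by at least min_delta for `patience` consecutive epochs."""
    def __init__(self, patience=3, min_delta=0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.wait = float("inf"), 0

    def step(self, valid_loss):
        if valid_loss < self.best - self.min_delta:
            self.best, self.wait = valid_loss, 0
            return False                      # improved: keep training
        self.wait += 1
        return self.wait >= self.patience     # True -> stop training

stopper = EarlyStopping(patience=2)
losses = [0.9, 0.7, 0.71, 0.72, 0.5]          # toy validation losses
stopped_at = next(i for i, l in enumerate(losses) if stopper.step(l))
print(stopped_at)  # stops at epoch 3, never seeing the later 0.5
```

As the toy run shows, a small patience can stop training just before a real improvement, which is the usual trade-off when picking that parameter.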


I think that perhaps you need more data. IMO your model is overfitting to the few training instances you have quite quickly; that’s why you went from an 18% error rate all the way up to a 50% error rate, which is basically just flipping a coin to choose between the two artists.


Forgive me if I’m wrong, but isn’t fit_one_cycle supposed to never overfit?

Never overfit? I believe that you have the wrong idea.

Here’s what I know; please do correct me if I’m wrong.

From Jeremy, what I got was that we use fit_one_cycle so that we do not have to tune the hyperparameters.

At least that is how I understood it. If that is not the case, then what is the benefit of fit_one_cycle? Sorry if this question is too stupid.

Sylvain wrote an article on it, perhaps you’ll find this enlightening.


Gee, thanks. I got it completely wrong.