What I’m trying to do is create a CNN to distinguish between audio clips from Pearl Jam and The National. The first thing I did was download 23 songs by Pearl Jam and 24 by The National, extract a 10-second clip (starting at 00:30) from each one, and generate a spectrogram for each clip (standing on the shoulders of giants here). Then I placed 16 images per band in the training dataset and the rest in the validation dataset. Now, some things I didn’t expect or don’t understand.
First, running this code
learn = create_cnn(data, models.resnet34, metrics=error_rate)
repeatedly yields different results despite having the same seed.
It also yields different results when run repeatedly. I assume this is because I’m not starting the model off from scratch, but then is there any way to get rid of all the learning and start over without recreating the learner?
there are some PyTorch random seed goodies under the hood, as well as NumPy ones. try adding the following where you’re setting the seed… and don’t forget to recreate your DataBunch when you recreate the model; there are seed dependencies in there too. good luck mate.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
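Putting all of that together, a seed-setting helper might look something like this (a sketch, not fastai’s own code; the NumPy and torch branches are guarded so it still runs where those libraries aren’t installed):

```python
import random

def set_seed(seed=42):
    """Seed every RNG we know about so runs are repeatable.

    The numpy/torch branches are optional: they only run when
    those libraries are available in the environment.
    """
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass

# same seed, same draw
set_seed(42)
a = random.random()
set_seed(42)
b = random.random()
print(a == b)  # True
```

Call it before creating the DataBunch *and* before creating the learner, since both consume random state.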
Thanks! That seems to work, although I find it strange that it wasn’t mentioned in the lecture. If I don’t use your code, the results are not repeatable.
The other issue I’m having now is that I can’t figure out why I should choose a specific value for the first parameter in
learn.fit_one_cycle(5, slice(1e-2)). Why 4 or 5 and not some other value? Also, the error rate in the last epoch is 0.25. Subsequently running
learn.fit_one_cycle(10, max_lr=slice(1e-3, 1e-2))
Is there any way to pick a good number of epochs? Why 10 and not 20 or 100?
you’re in luck. lesson 2 talks about picking the number of epochs at length!
i also find this cheat sheet from stanford a handy reference at times
A specific number of epochs is not that important when you are using
fit_one_cycle, because of the shape of the learning rate schedule (look here).
The learning rate decreases in the later epochs, so a few more epochs will not have a big impact and should not cause overfitting.
Of course, if you choose a very high number (e.g. 100), the network will converge very slowly.
And with too low a number, the network will not be able to learn. Something between 5 and 20 should be OK.
If you think about it more, the right number of epochs depends on the size of the network, the batch size, regularization, the learning rate and other hyperparameters. I haven’t seen a specific prescription.
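To see why extra epochs at the tail are cheap, here’s a rough sketch of the one-cycle schedule’s shape: ramp up to the max learning rate, then anneal back down. The constants (pct_start, div_factor) are illustrative only, not fastai’s exact defaults:

```python
import math

def one_cycle_lr(step, total_steps, max_lr=1e-2, pct_start=0.3, div_factor=25.0):
    """Rough one-cycle schedule: cosine ramp up to max_lr for the first
    pct_start fraction of training, then cosine anneal back toward zero."""
    warm = int(total_steps * pct_start)
    start_lr = max_lr / div_factor
    if step < warm:
        # warm-up phase: start_lr -> max_lr
        t = step / max(warm, 1)
        return start_lr + (max_lr - start_lr) * (1 - math.cos(math.pi * t)) / 2
    # annealing phase: max_lr -> ~0
    t = (step - warm) / max(total_steps - warm, 1)
    return max_lr * (1 + math.cos(math.pi * t)) / 2

# the lr near the end of training is tiny, so late epochs barely move the weights
for s in (0, 30, 60, 99):
    print(s, round(one_cycle_lr(s, 100), 6))
```

The takeaway: by the last epochs the learning rate is near zero, so a few extra epochs change the weights very little, which is why the exact epoch count matters less here than with a constant lr.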
There are two handy callbacks in fastai, EarlyStopping and ReduceLROnPlateau, for safely increasing the number of epochs. But I think they work better with the
fit function (with a constant lr).
I do something like this:
fit with the previous epoch’s lr (taken from
learn.opt.lr), with 2 times more epochs, and both callbacks.
The first step has a huge impact on the loss.
The third step decreases the loss only slightly.
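The logic behind those two callbacks is simple enough to sketch in a few lines of plain Python (a toy version, hypothetical names, no fastai required): stop when validation loss stops improving, and cut the lr on a plateau.

```python
class EarlyStopper:
    """Toy version of the logic behind EarlyStopping / ReduceLROnPlateau:
    track the best validation loss, halve the lr when an epoch fails to
    improve it, and stop after `patience` bad epochs in a row."""

    def __init__(self, patience=3, lr=1e-2):
        self.patience = patience
        self.lr = lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return False to stop training."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            self.lr /= 2  # reduce lr on plateau
        return self.bad_epochs < self.patience

stopper = EarlyStopper(patience=2)
losses = [0.9, 0.7, 0.71, 0.72, 0.73]  # improves, then plateaus
for epoch, loss in enumerate(losses):
    if not stopper.step(loss):
        print(f"stopping at epoch {epoch}")  # stops at epoch 3
        break
```

With this safety net you can set the epoch count generously and let the callback cut training short once the validation loss flattens out.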
I think that perhaps you need more data. IMO your model is overfitting to the few training instances you have quite quickly; that’s why you went from an 18% error rate all the way up to a 50% error rate, which is basically just flipping a coin to choose between the two artists.
Forgive me if I’m wrong but isn’t fit_one_cycle supposed to never overfit?
Never overfit? I believe that you have the wrong idea.
Here’s what I know; please do correct me if I’m wrong.
From what I got from Jeremy, we use fit_one_cycle so that we do not have to tune the hyperparameters.
At least that is how I understood it. If that is not the case, then what is the benefit of fit_one_cycle? Sorry if this question is too stupid.
Sylvain wrote an article on it, perhaps you’ll find this enlightening.
Gee, thanks. I got it completely wrong.