Basic questions about lesson 1

What I’m trying to do is create a CNN to distinguish between audio clips from Pearl Jam and The National. The first thing I did was download 23 songs by Pearl Jam and 24 by The National, extract a 10-second clip (starting at 00:30) from each one, and generate a spectrogram for each clip (standing on the shoulders of giants here). Then I placed 16 images per band in the training dataset and the rest in the validation dataset. Now, some things I didn’t expect or don’t understand.
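For anyone curious, the spectrogram step can be sketched with plain NumPy. This is a minimal magnitude-STFT of my own, assuming the clip is already loaded as a mono float array; a real pipeline would typically use librosa or similar:

```python
import numpy as np

def spectrogram(signal, n_fft=512, hop=256):
    """Magnitude spectrogram via a short-time Fourier transform.
    Returns an array of shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

# 10 seconds of a 440 Hz tone at 22050 Hz, as a stand-in for an audio clip
sr = 22050
t = np.arange(10 * sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (257, 860): frequency bins x time frames
```

The resulting 2-D array is what gets rendered to an image and fed to the CNN.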

First, running this code

learn = create_cnn(data, models.resnet34, metrics=error_rate)
learn.lr_find()
learn.recorder.plot()

Repeatedly yields different results despite having the same seed.

(attached: lr_find plots from two runs, lr-run1 and lr-run2)

Second, running

lr=1e-2
learn.fit_one_cycle(8, slice(lr))

Also yields different results when run repeatedly. I assume this is because I’m not starting off the model from scratch, but then is there any way to get rid of all the learning and start over without recreating the learn object?

there’s some PyTorch random-seed goodies under the hood, as well as NumPy’s. try adding the following where you’re setting the seed, and don’t forget to recreate your DataBunch when you recreate the model; there are seed dependencies in there too. good luck mate.

torch.manual_seed(42)                      # seed PyTorch's RNGs
torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False     # disable nondeterministic autotuning
np.random.seed(42)                         # seed NumPy's global RNG
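The reason every library needs its own seed call is that each one keeps its own RNG state. A NumPy-only toy sketch (not fastai code) of how resetting a seed makes draws repeatable:

```python
import numpy as np

def draw(seed):
    np.random.seed(seed)   # reset NumPy's global RNG state
    return np.random.rand(3)

# Same seed -> identical draws; different seed -> different draws.
# Seeding NumPy alone does nothing for PyTorch's separate RNG,
# which is why torch.manual_seed is also needed.
a, b, c = draw(42), draw(42), draw(0)
print(np.allclose(a, b), np.allclose(a, c))  # True False
```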


Thanks! That seems to work, although I find it strange that it wasn’t mentioned in the lecture. If I don’t use your code the results are not repeatable.

The other issue I’m having now is that I can’t figure out why I should choose a specific value for the first parameter in learn.fit_one_cycle(5, slice(1e-2)). Why 4 or 5 and not some other value? Also, the error rate in the last epoch for that is 0.25. Subsequently running

learn.unfreeze()
learn.fit_one_cycle(10, max_lr=slice(1e-3, 1e-2))

Gives me:

epoch train_loss valid_loss error_rate time
0 0.101125 0.654265 0.187500 00:00
1 0.202470 2.742486 0.500000 00:00
2 0.779277 1.481577 0.375000 00:00
3 0.940830 4.350643 0.375000 00:00
4 1.469711 69.141212 0.250000 00:00
5 1.325877 748.460144 0.500000 00:00
6 1.145573 933.605164 0.500000 00:00
7 1.075393 671.214417 0.500000 00:00
8 0.962771 347.121521 0.500000 00:00
9 0.878943 171.454285 0.500000 00:00

Is there any way to pick a good number of epochs? Why 10 and not 20 or 100?

you’re in luck: lesson 2 talks about picking the number of epochs at length! 🙂

i also find this cheat sheet from Stanford a handy reference at times:
https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks


A specific number of epochs is not that important when you are using fit_one_cycle, because of the shape of the learning-rate schedule (look here).
The rate decreases in the later epochs, so a few more epochs will not have a big impact and should not cause overfitting.
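That decreasing tail can be seen in a rough sketch of a one-cycle schedule. This is a toy approximation of my own, not fastai’s actual scheduler; the parameter names pct_start, div and final_div are stand-ins:

```python
import math

def one_cycle_lr(step, total_steps, lr_max, pct_start=0.3, div=25, final_div=1e4):
    """Toy one-cycle schedule: cosine warmup from lr_max/div up to lr_max,
    then cosine annealing down to lr_max/final_div."""
    warm = int(total_steps * pct_start)
    if step < warm:
        p = step / warm                          # warmup phase progress
        lo, hi = lr_max / div, lr_max
    else:
        p = (step - warm) / (total_steps - warm) # annealing phase progress
        lo, hi = lr_max, lr_max / final_div
    # cosine interpolation from lo to hi as p goes 0 -> 1
    return lo + (hi - lo) * (1 - math.cos(math.pi * p)) / 2

total = 100
lrs = [one_cycle_lr(s, total, 1e-2) for s in range(total + 1)]
print(max(lrs))  # the schedule peaks at lr_max
```

Because the lr is tiny by the final steps, the last few epochs change the weights very little, which is why the exact epoch count matters less than with a constant lr.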

Of course, if you choose a very high number (e.g. 100), the network will converge very slowly.
And with too low a number, the network will not be able to learn. Something between 5 and 20 should be OK.

If you think about it more, the right number of epochs depends on the size of the network, the batch size, regularization, the learning rate and other hyperparameters. I haven’t seen a specific prescription.

There are two handy callbacks in fastai, EarlyStopping and ReduceLROnPlateau, for safely increasing the number of epochs. But I think they work better with the plain fit function (with a constant lr).

I do something like this:

  1. fit_one_cycle with the model frozen
  2. fit_one_cycle with the model unfrozen
  3. fit with the last epoch’s lr (taken from learn.opt.lr), for twice as many epochs, with both callbacks

The first step has a huge impact on the loss.
The third step decreases the loss only slightly.
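For reference, the early-stopping logic amounts to something like this pure-Python sketch (my own minimal version, not fastai’s actual callback):

```python
class EarlyStopping:
    """Minimal early-stopping sketch: stop when the monitored loss hasn't
    improved by at least min_delta for `patience` consecutive epochs."""
    def __init__(self, patience=3, min_delta=0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.wait = float("inf"), 0

    def step(self, valid_loss):
        if valid_loss < self.best - self.min_delta:
            self.best, self.wait = valid_loss, 0
            return False                      # improved: keep training
        self.wait += 1
        return self.wait >= self.patience     # True -> stop training

stopper = EarlyStopping(patience=2)
losses = [0.9, 0.7, 0.71, 0.72, 0.5]          # toy validation losses
stopped_at = next(i for i, l in enumerate(losses) if stopper.step(l))
print(stopped_at)  # stops at epoch 3, never seeing the later 0.5
```

As the toy run shows, a small patience can stop training just before a real improvement, which is the usual trade-off when picking that parameter.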


I think that perhaps you need more data. IMO your model is overfitting to the few training instances you have quite quickly; that’s why you went from an 18% error rate all the way up to a 50% error rate, which is basically just flipping a coin to choose between the two artists.


Forgive me if I’m wrong, but isn’t fit_one_cycle supposed to never overfit?

Never overfit? I believe that you have the wrong idea.

Here’s what I know; please do correct me if I’m wrong.

From Jeremy, what I got was that we use fit_one_cycle so that we do not have to tune the hyperparameters.

At least that is how I understood it. If that is not the case, then what is the benefit of fit_one_cycle? Sorry if this question is too stupid.

Sylvain wrote an article on it, perhaps you’ll find this enlightening.


Gee, thanks. I got it completely wrong.