Cnn_learner is not learning (losses aren't changing)

Hi all. I'm trying to run my own version of the first exercise and am running into the following problem.
I can load my data into an ImageDataLoaders object, but when I run cnn_learner on that data the losses don't change and training takes 1 second. Any thoughts? I've included the code below to help with diagnosing the problem.

```python
path = '…/input/parrotfish/parrotfish'
fish = ImageDataLoaders.from_folder(path, valid_pct=0.2, seed=3, item_tfms=Resize(224))
fish.valid_ds.items[:3]
fish.train_ds.items[:3]
len(fish.valid_ds), len(fish.train_ds)
```

(This returns lengths of 13 and 54, so the images are recognized.)

```python
fish.valid.show_batch(max_n=3, nrows=1)
```

Just checking to see if the images are showing up. So far so good.

```python
learn = cnn_learner(fish, resnet34, metrics=error_rate)
learn.fine_tune(1)
```

And this happens:

```
epoch  train_loss  valid_loss  error_rate  time
0      nan         0.000000    0.000000    00:01
epoch  train_loss  valid_loss  error_rate  time
0      nan         0.000000    0.000000    00:01
```

Sorry for the formatting. Any thoughts out there? Thanks.

Hmm, I am not really sure, but one idea worth checking out would be printing the results of

```python
len(fish.train), len(fish.valid)
```

That will show the number of batches in the train and valid DataLoaders. I'm interested in that because the default batch size is 64, and there are fewer than 64 images in your train and valid datasets, so maybe it's not forming batches correctly. Another thing you could try would be passing a smaller batch size to ImageDataLoaders, like this:

```python
ImageDataLoaders.from_folder(path, bs=4, valid_pct=0.2, seed=3, item_tfms=Resize(224))
```
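To see why the batch size matters here: if I recall correctly, fastai's training DataLoader drops its last incomplete batch by default, so with only 54 training images and the default bs=64 there may be zero training batches at all, which would explain a nan train loss and a one-second epoch. A quick back-of-the-envelope sketch (plain Python, no fastai needed):

```python
# Sketch of batch counting, assuming the training loader drops the
# last incomplete batch (fastai's default for the train DataLoader).
n_train = 54  # lengths reported above

def n_batches(n_items, bs, drop_last=True):
    """Number of batches a DataLoader would form."""
    full, rem = divmod(n_items, bs)
    return full if drop_last else full + bool(rem)

# Default bs=64: zero full batches -> no training steps run.
print(n_batches(n_train, bs=64))  # 0
# bs=4: 13 full batches, so training actually happens.
print(n_batches(n_train, bs=4))   # 13
```

So `len(fish.train)` returning 0 would be the smoking gun for this particular failure mode.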

Also, just so you know, you can format code in posts by putting three backticks (```) on their own lines before and after the code block.

Thanks. I figured it out. My training labels weren’t loading properly so there was literally nothing to learn. Thanks in any case.
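For anyone who lands here later: `from_folder` infers each image's label from the name of its parent directory, so if the images aren't arranged in one subfolder per class, there are no usable labels to learn. A minimal sketch of that convention (the tiny folder tree here is hypothetical, not the original dataset):

```python
# Miniature illustration of folder-based labelling, as from_folder does it:
# each image's label is the name of its parent directory.
from pathlib import Path
import tempfile

root = Path(tempfile.mkdtemp())
for cls in ("parrotfish", "other"):
    (root / cls).mkdir()
    (root / cls / f"{cls}_0.jpg").touch()  # empty stand-in files

# Label each file by its parent folder name.
items = sorted(root.glob("*/*.jpg"))
labels = [p.parent.name for p in items]
print(labels)  # ['other', 'parrotfish']
```

If all the images sit directly under one folder instead of class subfolders, every item gets the same label and the model has nothing to learn, which matches the symptom above.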
