Resnet 18 batch size issue

tpbadger · October 6, 2020, 5:04pm

Hi all,

I'm trying to build a model that recognizes the faces of my family (3 people). I currently have a dataset of 33 faces, which splits into a training set of 28 images. When I train the model using cnn_learner and fine_tune, the train loss is NaN. I'm a bit confused about what is going on here. Is my dataset just too small? I have changed the batch size to 4 and that makes no difference.

I think it's worth mentioning that I'm new to all this, so you might have to explain what I'm doing wrong the way you'd talk to a small child!

Thanks all in advance!

Tom

GoofyMango · October 7, 2020, 3:44am

Hi Tom,

It looks to me like your model might not be training at all! That would explain why train_loss is always NaN, and why the validation_loss and error_rate aren’t changing at all.

Normally, as your model trains, its validation loss and error rate will go down as it gets better at classifying faces. Since validation loss is not changing at all, that leads me to believe the model is not changing its predictions at all. The model wouldn’t be changing its predictions if it wasn’t training. This, combined with the fact that your training loss is NaN, makes me think that there is no training data in your dls.
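To see why a NaN training loss points to an empty training set, here is a plain-Python sketch of a smoothed, bias-corrected loss average. It is loosely modeled on how I understand fastai's training-loss column to be computed. The function `smoothed_train_loss` is a hypothetical stand-in, not fastai's actual code. With at least one batch it gives an ordinary number. With zero batches, the bias correction works out to zero divided by zero, so the only sensible answer is NaN.

```python
import math

def smoothed_train_loss(batch_losses, beta=0.98):
    """Exponentially smoothed, bias-corrected average of per-batch losses.

    Hypothetical simplification for illustration; not fastai's implementation.
    """
    val, count = 0.0, 0
    for loss in batch_losses:
        count += 1
        # exponential moving average, starting from 0
        val = val * beta + loss * (1 - beta)
    if count == 0:
        # No batches seen: val is 0 and the correction 1 - beta**0 is also 0,
        # so the result would be 0/0, which is undefined -> report NaN.
        return float("nan")
    # divide out the bias from starting the average at 0
    return val / (1 - beta ** count)

print(smoothed_train_loss([0.9, 0.7, 0.6]))  # an ordinary number
print(smoothed_train_loss([]))               # nan
```
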

One thing you could do is try printing the results of

len(dls.train_ds), len(dls.valid_ds)

That should tell you how many images are in your training and validation sets. If it returns 0 for the training set, then you've found your problem!
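Besides the dataset lengths, it's worth checking the number of *batches* with `len(dls.train)`. As far as I know, fastai's training DataLoader shuffles and drops the last incomplete batch by default, while the validation one keeps it. So with 28 training images and the default batch size of 64, the training set has zero full batches even though `len(dls.train_ds)` is 28. The plain-Python sketch below only does that arithmetic. `num_batches` is a hypothetical helper, not a fastai function.

```python
import math

def num_batches(n_items, bs, drop_last):
    """Number of batches a loader yields for n_items at batch size bs."""
    if drop_last:
        # the incomplete final batch is discarded
        return n_items // bs
    # the incomplete final batch is kept
    return math.ceil(n_items / bs)

# 28 training images at the default bs=64, last batch dropped
print(num_batches(28, 64, drop_last=True))   # 0 -> nothing to train on
print(num_batches(28, 4, drop_last=True))    # 7
# validation keeps its partial batch
print(num_batches(5, 64, drop_last=False))   # 1
```

If the batch size change actually reached the loaders, 28 images at `bs=4` give 7 batches. So it is worth confirming that `bs` is passed where the DataLoaders are built, e.g. `dblock.dataloaders(path, bs=4)`.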

Hope that helps, let me know if you have any other questions!

akashgshastri · October 7, 2020, 5:12am

Did you make a DataBlock? Try this; it'll give you a rundown of what your data looks like.
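fastai's `DataBlock` has a `summary` method (e.g. `dblock.summary(path)`) that walks through how items are collected, labeled and batched. As a lighter first check, here is a stdlib-only sketch that counts images per label. It assumes the common layout of one folder per person, so the label is the parent folder name. The paths and the `label_counts` helper are hypothetical. In fastai, the list of paths could come from `get_image_files(path)`.

```python
from collections import Counter
from pathlib import Path

def label_counts(image_paths):
    """Count images per label, taking each label from the parent folder name."""
    return Counter(Path(p).parent.name for p in image_paths)

# hypothetical paths, one folder per person
paths = [
    "faces/person_a/001.jpg",
    "faces/person_a/002.jpg",
    "faces/person_b/001.jpg",
]
counts = label_counts(paths)
print(counts)  # Counter({'person_a': 2, 'person_b': 1})
```

With only a few dozen images, a label count like this quickly shows whether one person has almost no examples or whether some files ended up outside the expected folders.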

robmel · July 19, 2021, 8:19pm

I actually have the same issue and changing the batch size did not help. Running it on Colab.

dhruv.metha · July 22, 2021, 9:28am

Hey @robmel, can you share the colab link?