Smaller Dataset, Larger Learning Rate

For Lesson 1 I uploaded my own images of lions and bears: 12 training images and 8 validation images per class. I trained for 3 epochs and kept the learning rate at 0.01. The predictions were only slightly better than random guessing. When reviewing the results I noticed that attributes of the images, like color and teeth, might have confused the model. For example, the dark fur on a lion's belly could have been mistaken for the dark fur of a bear. This was just a guess.

When I increased the learning rate to 0.1, accuracy improved to 87%. Increasing it further to 0.15 worked best, reaching 100%. So it seems larger steps worked better for this smaller dataset.
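For anyone wanting to try the same sweep, here is a minimal sketch, assuming the fastai 0.7-era API from the course notebooks; the path, image size, and folder layout are placeholders, not the poster's actual code:

```python
# Sketch of a learning-rate sweep with the fastai 0.7-era course API.
# PATH is a hypothetical folder with train/ and valid/ subdirectories,
# one subfolder per class (lions, bears).
from fastai.conv_learner import *

PATH = 'data/lionsbears/'  # placeholder path
sz = 224
tfms = tfms_from_model(resnet34, sz)
data = ImageClassifierData.from_paths(PATH, tfms=tfms)

for lr in (0.01, 0.1, 0.15):
    # Fresh learner per run so earlier training doesn't carry over
    learn = ConvLearner.pretrained(resnet34, data, precompute=True)
    learn.fit(lr, 3)  # 3 epochs at this rate; compare validation accuracy
```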

I’m wondering why?

I suspect the mechanism is different. With 24 training images, a batch size of 1, and 3 epochs, the model gets only 72 weight updates in total, so it has very few opportunities to learn. Hence the more learning it can do at every opportunity it gets, the better off it is going to be in the end.

This sounds like a fun project. Did you get 100% on the validation set or the test set? Would you be willing to share the pictures? I wonder how similar the images within each class are to one another.

To explore the relationship between the learning rate and how the model learns, moving to a bigger dataset might be a good idea. If you’d rather stick to smaller datasets (I’m certainly a big fan of smaller datasets and models), I’m getting a lot of mileage out of CIFAR-10. Jeremy outlines how it can be used with the fastai library in lecture 7. Might be worth taking a look when you have a sec.

Why did you start with a learning rate of 0.01?

Did you run m.lr_find()?
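For reference, the usual sequence from the Lesson 1 notebook, assuming m is a ConvLearner as in the course:

```python
# LR finder, fastai 0.7-era API; m is a ConvLearner.
m.lr_find()        # increase the LR each mini-batch, recording the loss
m.sched.plot_lr()  # learning rate vs. iteration
m.sched.plot()     # loss vs. learning rate; pick a rate where loss is still falling
```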

I did, but the graphs were empty, so I relied on trial and error.

FYI: Empty plots may be caused by a batch size that is too small.

I encountered this once when doing some image classification work on my laptop (using a 960M card) with a small batch size because of my GPU's limited memory. I moved to AWS, ran again with a reasonable batch size, and m.lr_find() worked again.

I assumed it had something to do with the size of the dataset (40 images).

Try creating 50 copies of your 40 images (you’ll have 2000 images).

Then try a small batch size (like 8) and see if lr_find() works … follow that test with a batch size of 64 and see if that makes a difference.
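Presumably the issue is that lr_find() steps the rate once per mini-batch, so 40 images yields only a handful of points to plot (and the plotting code skips a few at each end); duplication gives it more batches to work with. Something like this would do the duplication; the paths and folder layout are assumptions to adapt to your setup:

```python
# Hypothetical sketch: make 49 extra copies of each image so the
# 40-image dataset grows to 2,000 and lr_find() sees enough
# mini-batches to plot. Paths and extensions are placeholders.
import shutil
from pathlib import Path

train_dir = Path('data/lionsbears/train')
for cls_dir in train_dir.iterdir():            # e.g. train/lions, train/bears
    for img in list(cls_dir.glob('*.jpg')):    # list() so new copies aren't re-globbed
        for i in range(1, 50):                 # 49 copies + the original = 50
            shutil.copy(img, cls_dir / f'{img.stem}_copy{i}{img.suffix}')
```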

Wouldn’t this cause overfitting?

Not necessarily, assuming you are shuffling and also incorporating augmentation into your model training.
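A minimal sketch of what that looks like in the fastai 0.7-era API; the path, size, and batch values are assumptions. Note that precompute must be False here, since with precomputed activations the augmentations are never applied, and the training loader shuffles by default, so duplicates of the same image get spread across batches:

```python
# Augmentation sketch, fastai 0.7-era API: standard side-on flips
# plus a little zoom, so duplicated images don't all look identical.
from fastai.conv_learner import *

PATH, sz, bs = 'data/lionsbears/', 224, 8  # placeholder values
tfms = tfms_from_model(resnet34, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
data = ImageClassifierData.from_paths(PATH, tfms=tfms, bs=bs)
learn = ConvLearner.pretrained(resnet34, data, precompute=False)  # False so augmentations apply
```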

Do you have any other recommendations for situations where getting more data isn’t an option?