Choosing the right backbone size: resnet18, resnet34, or resnet50?

I am currently working with a dataset that takes 4-5 hours per epoch to train, and I am constrained on the number of GPUs (4 right now). I wanted to know if there is a systematic way to evaluate which backbone would be the best option for the problem without training for too many epochs.
Is looking at which one gives the best drop in validation loss in the first few epochs a good starting point?

You could also try training on a smaller subset of the dataset for initial testing purposes. You can scale up after figuring out which one to go with.
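For example (a rough sketch, assuming a fastai image-classification setup with images labelled by their parent folder; the dataset path, the 20% fraction, and the epoch count are just placeholders), you could sample a fraction of the items and compare a few backbones on that subset:

```python
# Rough sketch: build dataloaders from a random 20% sample of the images so
# each backbone comparison runs quickly. Path and fraction are placeholders.
import random
from fastai.vision.all import *

path = Path('data/my_dataset')   # hypothetical dataset location

def get_subset(source, frac=0.2, seed=42):
    items = get_image_files(source)
    random.seed(seed)
    return random.sample(list(items), int(len(items) * frac))

dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_subset,                        # only a fraction of the data
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=Resize(224),
)
dls = dblock.dataloaders(path, bs=64)

# Train each candidate backbone for a few epochs and compare validation metrics
for arch in (resnet18, resnet34, resnet50):
    learn = vision_learner(dls, arch, metrics=error_rate)
    learn.fine_tune(3)
```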


Thank you @ForBo7, I'll try that then. I had assumed that a larger dataset meant the network could learn more parameters, but I'll have to test and see which one seems promising on a smaller dataset.


You could also train on all your data but use a smaller model initially. If you're happy with its performance, you can switch to a larger model in that family. This notebook shows the accuracy vs speed trade-off of various timm models.
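To get a quick sense of how much heavier each member of a family is before committing, you can compare parameter counts with timm (a rough sketch; the model names are standard timm identifiers, and `num_classes=10` is just a placeholder):

```python
# Rough sketch: parameter counts as a proxy for compute cost across a model
# family, using standard timm model names. num_classes is a placeholder.
import timm

for name in ('resnet18', 'resnet34', 'resnet50'):
    model = timm.create_model(name, pretrained=False, num_classes=10)
    n_params = sum(p.numel() for p in model.parameters())
    print(f'{name}: {n_params / 1e6:.1f}M parameters')

# Once the small model is tuned, swapping in the larger sibling is a one-line change
model = timm.create_model('resnet50', pretrained=True, num_classes=10)
```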

As far as I know, Jeremy’s approach to this is:

In the beginning you want to iterate fast, so start with the small versions of the models you want to try, e.g. for resnet, use resnet18. On that model, try out what works well and what doesn't: learning rates, TTA, transformations, resizing, progressive resizing, whatever. Also compare the performance of that model with other "small" models such as convnext_small, vit_small, etc.

Once you have found a model (or an ensemble of models) that works well, simply run the exact same thing but with the larger version of the same model (e.g. resnet50 instead of resnet18, convnext_large instead of convnext_small, etc.).
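Something like this (a minimal sketch, assuming you already have fastai `dls` dataloaders built; the architecture names are timm identifiers and the epoch count is arbitrary):

```python
# Minimal sketch of "same recipe, different architecture": tune on small
# variants first, then rerun unchanged with the bigger sibling. Assumes `dls`
# (fastai DataLoaders) already exists; model names are timm identifiers.
from fastai.vision.all import *

def run_experiment(arch, dls, epochs=5):
    learn = vision_learner(dls, arch, metrics=error_rate)
    learn.fine_tune(epochs)
    preds, targs = learn.tta()       # test-time augmentation on the validation set
    return error_rate(preds, targs).item()

# Iterate quickly on the small versions...
for arch in ('resnet18', 'convnext_small', 'vit_small_patch16_224'):
    print(arch, run_experiment(arch, dls))

# ...then run the exact same thing with the larger version of the winner
print('resnet50', run_experiment('resnet50', dls))
```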

More data doesn't necessarily mean more parameters. Larger models have more parameters, and because they have more parameters, they require more computation. So, as others are also saying here, try smaller models first. But if that's still taking too long, use a smaller portion of the dataset. And if you're working with images, before taking a smaller sample of the dataset, try resizing the images so they're smaller. Smaller images mean fewer inputs.
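For instance (a rough sketch, assuming a folder of images labelled by parent directory; the path and sizes are placeholders):

```python
# Rough sketch: smaller input images mean far less compute per batch. Halving
# each side roughly quarters the pixels the network sees. Path is a placeholder.
from fastai.vision.all import *

path = Path('data/my_dataset')
dls_small = ImageDataLoaders.from_folder(
    path, valid_pct=0.2, seed=42,
    item_tfms=Resize(128),   # e.g. 128px instead of 224px for quick experiments
    bs=128,
)
```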

Thank you @sahilharidas, @ForBo7. @lucasvw I like the idea of tinkering with the small model in a family of models and then scaling up to the larger one. I'll give that a shot. For now I'll use the smallest model and a smaller subset to validate, and then move on to larger models and the complete dataset.
