How can I check in reasonable time whether my model has improved?

I was wondering: if you want to fine-tune a model to get better results, for example for Kaggle, what do you do to run the model quickly and check whether there is any improvement? I know we can divide the dataset, but would the improvement carry over to the full dataset, say from a 200-picture sample to 10,000 pictures? Running the model on the full dataset every time we make a very minor change seems like a waste of time and GPU.

Does anyone have good heuristics for this?

Running on a sample is a nice way to go about this. How big the sample should be depends, so it is best to check empirically. You pick different sample sizes, run a couple of models on each and on the full validation set, and compare. Once your results on the sample start being meaningful to the extent that you need them, that is your sample size. It all depends on how precise you want your results to be: whether they need to be within 1% or 0.01% of what they would be on the entire validation set.
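The check above can be sketched numerically. This is a minimal illustration with synthetic data, not real model outputs: `full_val` is a hypothetical stand-in for per-image correctness (1 = correct prediction) on a 10,000-image validation set, and we compare the accuracy estimate from subsamples of various sizes against the full-set accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in: per-image correctness of a model on a 10,000-image
# validation set (True = correct). In practice these come from your model.
full_val = rng.random(10_000) < 0.9
full_acc = full_val.mean()

# Try growing sample sizes and record how far the sample accuracy lands
# from the full-set accuracy. The sample size where the gap is within
# your tolerance (1%? 0.01%?) is the one to use.
diffs = {}
for n in (100, 200, 1000, 5000):
    sample_acc = rng.choice(full_val, size=n, replace=False).mean()
    diffs[n] = abs(sample_acc - full_acc)
    print(n, round(diffs[n], 4))
```

The same loop works with your actual per-image results in place of the synthetic array; running it a few times with different seeds gives a feel for how noisy each sample size is.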

In general, cats and dogs is not a very big dataset and I think inference should not take that long. One trick I use, if I have enough GPU RAM, is a test dataloader with a much bigger batch size. This can help cut down the time it takes to validate results on the entire set.
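A quick back-of-the-envelope for why this helps: inference needs no gradients or optimizer state, so a noticeably larger batch often fits in the same GPU RAM, and fewer batches means less fixed per-batch overhead. The 4x multiplier below is just an illustrative assumption; in a framework like PyTorch this would correspond to something like `DataLoader(val_ds, batch_size=val_bs, shuffle=False)`.

```python
import math

n_images = 10_000
train_bs = 64
val_bs = train_bs * 4  # assumed multiplier: no gradients stored at inference,
                       # so a larger batch often fits in the same GPU RAM

train_batches = math.ceil(n_images / train_bs)
val_batches = math.ceil(n_images / val_bs)
print(train_batches, val_batches)  # 157 40
```

Roughly 4x fewer batch launches per validation pass, which adds up when you validate after every small tweak.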

Ah, if you change the model even a little bit, it still might need to learn quite a lot. The precompute trick is really nice if you don’t need to train the lower, convolutional parts of the model.
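The idea behind the precompute trick can be sketched as follows. This is a toy NumPy stand-in, not the fastai implementation: a fixed random projection plays the role of the frozen convolutional body, its activations are computed once and cached, and every later experiment on the classifier head reuses the cache instead of re-running the body.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pretrained conv body: fixed weights we never train.
W_body = rng.standard_normal((1000, 64))
images = rng.standard_normal((200, 1000))  # 200 "images" as flat vectors

# Precompute: run the frozen body once and cache its activations.
features = np.maximum(images @ W_body, 0)  # ReLU activations, shape (200, 64)

# Every later head experiment reuses `features` instead of re-running the
# expensive body, so iterating on the head is cheap.
head = rng.standard_normal((64, 2))  # a fresh 2-class head to experiment with
logits = features @ head
print(logits.shape)  # (200, 2)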


So the more precise you want the results to be, the bigger the sample? Do you have a specific starting point for the sample size, like 100 files or 1% of the full dataset?

Yes, you are correct!

This will depend on the dataset. dogsvscats is not very complex: the object we are after occupies most of the image and there are only 2 classes. My guess is that even 200 images should be okay.

Apologies if I misread your original question, now that I think about it! I thought you might be asking about the quickest way we can get away with to tell whether our model is improving or not. That can be done via what I outline above: taking a really small sample of the validation set. It works for toy problems (which I absolutely adore) and unusual scenarios. In general, however, if you care about the generalizability of your training results, you would want to follow the train-val-test split. In that case, for dogsvscats, you might want to set aside 10%-20% of the images from the train set and use them as your validation set. You would train on 80% of the data and evaluate performance during training on the unseen 20% you set aside.
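The 80/20 split described above can be sketched like this. The dataset size and seed are illustrative assumptions; with real data you would split file paths or indices the same way (fastai and scikit-learn both provide helpers for this, but the core idea is just a shuffled partition).

```python
import numpy as np

rng = np.random.default_rng(42)
n_images = 10_000               # illustrative dataset size
idx = rng.permutation(n_images) # shuffle before splitting so the val set
                                # is not biased by file ordering

val_frac = 0.2                  # hold out 20% for validation
n_val = int(n_images * val_frac)
val_idx, train_idx = idx[:n_val], idx[n_val:]
print(len(train_idx), len(val_idx))  # 8000 2000
```

The key property is that the two index sets are disjoint: the model never sees a validation image during training, so the validation score is an honest estimate of performance on unseen data.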

There is a superb article on this by fastdotai founding researcher Rachel, which you can find here. It is extremely comprehensive and covers crucial aspects for practitioners that usually get overlooked in the literature! Well worth a couple of reads :wink:


I used cats and dogs just as a general example. I meant general improvements of the model, not generalization per se (as far as I understand). Like: if I change this parameter from x to y, does it cause a meaningful change in results? Thank you for this quite detailed explanation :slight_smile: I will surely read Rachel’s post.
