Best practice for inference (production) - normalize and resize?

I’m curious what the best practices are for using models in production.

Specifically, I had assumed that at production time (inference only) we should:

a - resize incoming images to the same size the model was trained on.
b - normalize incoming images (the same way as during training).
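A minimal sketch of those two steps, in that order. It assumes NumPy, a 224x224 training size and the commonly used ImageNet channel statistics (all assumptions; substitute the values from your own training pipeline). The nearest-neighbour resize is only a dependency-free stand-in for a proper image-library resize. Depending on the library, its own single-image prediction helper may already apply these steps internally, so check before doing them twice.

```python
import numpy as np

# Assumed training-time settings (hypothetical values): the image size the
# model was trained on and the per-channel mean/std used for normalization.
# These ImageNet statistics are common defaults for pretrained models.
TRAIN_SIZE = (224, 224)  # (height, width)
TRAIN_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
TRAIN_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def resize_nearest(img: np.ndarray, size: tuple[int, int]) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W, C) array; a stand-in for a real
    image library's resize, which would normally use bilinear filtering."""
    h, w = img.shape[:2]
    out_h, out_w = size
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return img[rows[:, None], cols[None, :]]


def preprocess(img_uint8: np.ndarray) -> np.ndarray:
    """Resize first, then scale to [0, 1] and normalize with training stats."""
    small = resize_nearest(img_uint8, TRAIN_SIZE)
    x = small.astype(np.float32) / 255.0
    x = (x - TRAIN_MEAN) / TRAIN_STD  # broadcasts over the channel axis
    return x.transpose(2, 0, 1)  # (C, H, W), the layout PyTorch models expect
```

For example, a 291x291 uint8 image (about 30% larger than 224 on each side) comes out as a float32 array of shape (3, 224, 224), ready to be stacked into a batch.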

However, yesterday, while hand-feeding some images to a production model, I forgot to do either of the above, and the results were both accurate and high-confidence. The images were about 30% larger than the ones it had been trained on.

Realizing my mistake, I went back and set them up as a test batch, which automatically resized and normalized them. The predictions were the same, but interestingly, in some cases the confidence dropped compared with the larger-than-trained-size images: for the typical image it went up a tiny bit, but for the image it was least confident about, confidence dropped by 10% at the trained size.

I also notice that in lesson 2 of last year’s course, where the bear predictor is used, it looks like an arbitrary bear image is just passed straight in, without normalizing or resizing.

So what is the best setup for running models in production: always resize and normalize, or can we just use images equal to or larger than the training size and skip normalization?

I have some models going into production and would really like to confirm the best practices here.

Normalizing is pretty much a must if the model was trained on normalized data.

Resizing, on the other hand, isn’t always necessary, and because larger images carry more detail, not resizing might even help. In production, though, resizing can make the computation cheaper.
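One likely reason larger images still work: many CNN classifiers (for example the ResNets used in the course) end with a global (adaptive) average pooling layer that averages each feature map over its spatial extent, so the classifier head always receives a fixed-length vector. The NumPy sketch below uses random stand-in feature maps, a 512-channel final block and a hypothetical 3-class head (all illustrative assumptions). For a standard ResNet, a 224 px input gives a 7x7 final feature map and a ~291 px input gives roughly 10x10. The pooled features are not identical for the two sizes, which may be one reason confidence shifts slightly.

```python
import numpy as np


def global_avg_pool(features: np.ndarray) -> np.ndarray:
    """Average each channel of a (C, H, W) feature map over H and W,
    giving a length-C vector whatever the spatial size."""
    return features.mean(axis=(1, 2))


rng = np.random.default_rng(0)
channels = 512  # illustrative; the final block width of e.g. a ResNet-34
feats_train_size = rng.standard_normal((channels, 7, 7))  # stand-in for a 224 px input
feats_larger = rng.standard_normal((channels, 10, 10))    # stand-in for a ~291 px input

# Both collapse to the same shape, so the same linear head can be applied.
head = rng.standard_normal((3, channels))  # hypothetical 3-class head (e.g. three bear types)
logits_a = head @ global_avg_pool(feats_train_size)
logits_b = head @ global_avg_pool(feats_larger)
print(logits_a.shape, logits_b.shape)  # (3,) (3,)

# Convolution cost grows with the number of pixels: ~30% larger per side
# means roughly 1.69x the work per image.
print(round((291 / 224) ** 2, 2))  # 1.69
```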

The order should be swapped if you are handling lots of images: resize first, so there is less computation to do when normalizing.
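A quick count of the work involved, assuming a hypothetical 1000x1500 RGB source image and a 224x224 training size: normalization costs one subtract and one divide per element, so doing it after the resize touches about 30x fewer elements.

```python
def elements_normalized(h: int, w: int, channels: int = 3) -> int:
    """Number of per-element subtract/divide operations normalization needs."""
    return h * w * channels


src = (1000, 1500)  # hypothetical source image (height, width)
dst = (224, 224)    # assumed training size

normalize_then_resize = elements_normalized(*src)
resize_then_normalize = elements_normalized(*dst)
print(normalize_then_resize, resize_then_normalize)  # 4500000 150528
```

The swap should not change the result in any meaningful way: normalization is a per-channel affine map, and interpolation-based resizing computes weighted averages whose weights sum to 1, so the two operations commute up to rounding and clipping.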


Thanks @Hadus!

Good point re: order. I’ve edited my post to put resizing as the first step 🙂


It is interesting that your model was able to classify them well without normalization too. Maybe that just means the non-normalized data has a mean and std very close to those of the normalized data (or that the values vary a lot).

Could you tell us the mean and std of the non-normalized test data that you predicted with?
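A sketch of how those numbers could be computed, assuming NumPy and the test images stacked into an (N, H, W, C) uint8 array. The training statistics below are the common ImageNet defaults, an assumption to be replaced with your own. The second helper shows what the batch statistics would become after normalizing with the training mean/std: values near 0 and 1 suggest the data matches what the model saw in training.

```python
import numpy as np

# Assumed training statistics (ImageNet defaults); replace with your own.
TRAIN_MEAN = np.array([0.485, 0.456, 0.406])
TRAIN_STD = np.array([0.229, 0.224, 0.225])


def channel_stats(batch_uint8: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-channel mean and std of an (N, H, W, C) uint8 batch on the [0, 1] scale."""
    x = batch_uint8.astype(np.float64) / 255.0
    return x.mean(axis=(0, 1, 2)), x.std(axis=(0, 1, 2))


def normalized_stats(batch_uint8: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Mean and std the batch would have after (x - mean) / std with training stats.
    Normalization is affine per channel, so these follow directly from the raw stats."""
    mean, std = channel_stats(batch_uint8)
    return (mean - TRAIN_MEAN) / TRAIN_STD, std / TRAIN_STD
```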