Difference between "open_image" and "plt.imread" methods

I used a pretrained Resnet34 model to classify images of 10 species of monkeys that I obtained from a Kaggle dataset.
Even though I achieved an accuracy of 99.6% on the validation set (272 images), I still got an incorrect label for every single image that I predicted individually.

print(df[df.Label==data.classes[np.argmax(preds)]]['Common Name'])

I initially used the plt.imread() method to read the image as a numpy.array. This was then transformed using val_tfms (as it normally should be) and fed into the model for prediction. When I replaced that with fast ai’s open_image() function instead…


I got every label right. So what’s the difference between the two?

The open_image() function opens an image using OpenCV, given the file path, and returns the image in RGB format as a numpy array of floats NORMALIZED to the range 0.0 - 1.0.

plt.imread() DOES NOT normalize the image.
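A minimal sketch of the difference, assuming the JPEG case, where plt.imread() returns uint8 pixel values in 0-255 (matplotlib returns floats in 0-1 only for PNG files). The array below is synthetic, and `scale_to_unit` is a hypothetical helper rather than a fastai function; it only mimics the scaling that open_image() is described as doing above.

```python
import numpy as np

def scale_to_unit(img_uint8):
    """Scale 8-bit pixel values to float32 in the range [0.0, 1.0]."""
    return img_uint8.astype(np.float32) / 255.0

# Synthetic 4x4 RGB image standing in for a JPEG read as uint8.
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
raw[0, 0] = (0, 128, 255)  # pin the extremes so the value range is easy to see

scaled = scale_to_unit(raw)

print(raw.dtype, raw.min(), raw.max())           # what the model saw via plt.imread()
print(scaled.dtype, scaled.min(), scaled.max())  # what it expects to see
```

A model trained on the second kind of input sees values roughly 255 times larger than expected when given the first, which would be consistent with the wrong predictions described above.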


Ahh I see, so the model was trained with normalized images. Is this a necessary preprocessing step for the specific model (Resnet34) I’m using, or just to train the last few layers? Am I to understand that image normalization is used to train all or most of the pre-trained models in Fast AI?
Please ask me to be clear in my questions if I’m not.

Good to know that I answered your question.

It’s not specific to any model, or to a case like yours where you fine-tune the last few layers. It’s a common technique (or, if you will, good practice). We do this before we hand the images over to the image transformation/augmentation pipeline (tfms).

It depends on how the original pre-trained model handled the images in its dataset.

I think you are confusing single-image normalization with dataset normalization. I explained single-image normalization earlier (it is what happens when you invoke open_image).

Dataset normalization is where you use the statistics of the training image dataset that come with the model (e.g. ImageNet-pretrained models, Inception models, etc.). The transforms are constructed according to those statistics. All of this usually happens behind the scenes in the fastai lib. To illustrate, let’s take an example. This is how things look in code:

# Statistics pertaining to image data from ImageNet: mean and standard deviation of the images of each color channel
imagenet_stats = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
imagenet_stats_arr = np.array(imagenet_stats[0]) if len(imagenet_stats)==1 else [np.array(o) for o in imagenet_stats]

# Datasets normalization.
tfms_from_model(function_model=vgg16, norm_y=True, ...)
# The above function will internally call `tfms_from_stats()`
tfms_from_stats(imagenet_stats, norm_y=True)
# Then, the above function will construct a Normalize transform
tfm_norm = Normalize(*imagenet_stats, tfm_y=tfm_y if norm_y else TfmType.NO)
# The Normalize class normalizes an image to zero mean and unit standard deviation, given the mean and std of the original image
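
The call chain above is abbreviated, so here is a self-contained sketch of the arithmetic that dataset normalization performs, using only NumPy. The helper names `normalize` and `denormalize` are hypothetical and written for illustration; they are not copied from the fastai source. The statistics are the ImageNet values quoted above and assume pixel values already scaled to the range 0-1.

```python
import numpy as np

# Per-channel ImageNet mean and std, as quoted above (for pixels in [0, 1]).
imagenet_mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
imagenet_std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize(img, mean, std):
    """Subtract the per-channel mean and divide by the per-channel std.

    img has shape (H, W, 3); broadcasting applies the stats to every pixel.
    """
    return (img - mean) / std

def denormalize(img, mean, std):
    """Invert normalize(), e.g. to display a normalized image again."""
    return img * std + mean

# An image whose every pixel equals the dataset mean maps to all zeros.
mean_img = np.ones((2, 2, 3), dtype=np.float32) * imagenet_mean
out = normalize(mean_img, imagenet_mean, imagenet_std)
restored = denormalize(out, imagenet_mean, imagenet_std)
```

Because the statistics are expressed on the 0-1 scale, feeding in raw 0-255 values would push the result far away from the zero-mean, unit-std range the pre-trained weights were fitted to.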

I get it now, thanks!

But I think the only question remaining is what drove the developers to write the functions this way. Why is it necessary to normalize the read image to the range 0.0 - 1.0 (single-image normalization) when calling open_image, and then re-normalize it with the given statistics of the specific model using val_tfms? Why carry it out as two separate operations? You said fast ai does all of this under the hood, so why not just read an image from file and normalize it according to the training dataset using just one function? (In this case val_tfms would be appropriate enough.) Excuse my verbosity!
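
On the arithmetic alone, the two operations can indeed be folded into one, as the sketch below checks with NumPy. The functions `two_step` and `one_step` are hypothetical names for illustration and the input is a synthetic uint8 array. This only shows that the results match; it does not explain the library's design. One plausible reason for keeping the steps apart (an assumption, not confirmed in this thread) is that the scaling happens before the transformation/augmentation pipeline, as mentioned earlier, and that statistics such as imagenet_stats are expressed on the 0-1 scale.

```python
import numpy as np

# ImageNet per-channel statistics, defined for pixel values in [0, 1].
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

def two_step(img_uint8):
    """Scale to [0, 1] first, then normalize with the dataset statistics."""
    scaled = img_uint8 / 255.0
    return (scaled - mean) / std

def one_step(img_uint8):
    """Same result in one operation, using statistics rescaled to 0-255."""
    return (img_uint8 - 255.0 * mean) / (255.0 * std)

# Synthetic 8x8 RGB image standing in for raw uint8 pixels.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)

a = two_step(img)
b = one_step(img)
print(np.allclose(a, b))
```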

Here is a good explanation of what motivated this, without going too far into the theory, from this StackOverflow answer:

It’s simply a case of getting all your data on the same scale: if the scales for different features are wildly different, this can have a knock-on effect on your ability to learn (depending on what methods you’re using to do it). Ensuring standardised feature values implicitly weights all features equally in their representation.

No, I think you misunderstand. My question was why there were two normalization functions developed when only one (val_tfms) would have sufficed. Not expecting an answer to this, but anything would do from you :smiley:

This is a nice summary.

This is a little dated. open_image now uses PIL and converts the PIL image to a tensor for the GPU and PyTorch.

Also, to nitpick: I think "scaled from 0 to 1" is a better description than "normalised".
